Tether, the company behind the USDt stablecoin, has launched QVAC, an open-source, local-first AI SDK that runs LLM inference, fine-tuning, speech, translation, and image generation entirely on-device across Linux, macOS, Windows, Android, and iOS, with no cloud dependency.

The platform has three components. Fabric LLM is an inference engine built on the Vulkan graphics API, which makes it GPU-vendor-agnostic; Tether claims it is the first framework to support LoRA fine-tuning of large language models directly on mobile devices, a capability previously assumed to require data-center compute. Genesis is a synthetic pre-training dataset of 148 billion tokens covering STEM and logic domains, released so developers can train independent models without relying on proprietary data moats. The third layer, a Wallet Development Kit (WDK), lets AI agents transact autonomously in Bitcoin and USDt without payment intermediaries, an explicit bridge between edge AI and crypto payment rails.

The SDK is JavaScript- and TypeScript-native, installable with a single npm command (npm install @qvac/sdk), and runs on Node.js, the Bare runtime, and Expo for mobile. The unified API covers LLM completion, text embeddings, neural machine translation via Bergamot, automatic speech recognition via Whisper.cpp or NVIDIA Parakeet, text-to-speech via ONNX Runtime, OCR, text-to-image via a stable-diffusion backend, multimodal inference, and retrieval-augmented generation, all from the same interface with no code changes across platforms. The SDK also exposes an OpenAI-compatible HTTP API, so existing toolchains targeting the OpenAI spec can point at a local QVAC server without modification.
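The launch materials do not document the SDK's exact method surface, so the first half of the sketch below is hypothetical: the QVAC import, constructor, and the complete/embed method names are assumptions, shown only to illustrate the unified interface described above. The second half is grounded: the openai npm package is the official client, and re-pointing it at any OpenAI-spec server via baseURL is its documented behavior. Only the localhost port, /v1 path, and model name are assumed.

```ts
// Sketch only. The @qvac/sdk names below are hypothetical (the launch does
// not document the SDK's method surface); the OpenAI client usage is the
// real openai package, with only the local port and path assumed.
import { QVAC } from "@qvac/sdk"; // hypothetical export name
import OpenAI from "openai";

// Hypothetical unified-API calls: the same interface on desktop and mobile.
const qvac = new QVAC({ model: "local-model.gguf" }); // constructor assumed
const reply = await qvac.complete("Summarize this clause in one sentence.");
const vectors = await qvac.embed(["edge AI", "sovereign inference"]);

// Grounded half: any OpenAI-spec toolchain can be re-pointed via baseURL.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // assumed local QVAC endpoint
  apiKey: "unused-locally",            // the client requires a key; a local server can ignore it
});
const res = await client.chat.completions.create({
  model: "local", // model identifier is server-defined; name assumed
  messages: [{ role: "user", content: "Hello from the edge." }],
});
console.log(res.choices[0].message.content);
```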

The most operationally significant addition for enterprise architects is the P2P inference delegation layer, built on the Holepunch networking stack. Nodes can offload inference to peers through NAT-traversing blind relays, a BitTorrent-style model for compute sharing inside a private fleet with no centralized inference endpoint. This directly addresses cloud concentration risk: an organization whose AI workflows depend on a single API provider is running a single-point-of-failure architecture. By distributing inference across the edge, QVAC makes those workflows resilient to outages, rate limits, and vendor lock-in.
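QVAC's delegation protocol itself has not been published, but the Holepunch stack underneath it is public. The sketch below uses Hyperswarm, Holepunch's real peer-discovery library, to show the rendezvous-and-connect pattern such a layer sits on. The fleet topic string, the newline-delimited JSON request framing, and the runLocalInference stand-in are all assumptions, not QVAC's actual wire protocol.

```ts
// Sketch of the peer-discovery pattern a delegation layer sits on.
// Hyperswarm is a real Holepunch library; the inference-request framing
// below (newline-delimited JSON) is our assumption, not QVAC's protocol.
import Hyperswarm from "hyperswarm"; // ships without bundled types; a local type shim is assumed
import { createHash } from "node:crypto";

const swarm = new Hyperswarm();

// Peers rendezvous on a 32-byte topic; here, a hash of a fleet-private name.
const topic = createHash("sha256").update("acme-private-inference-fleet").digest();

swarm.on("connection", (socket: any) => {
  // Each connection is an end-to-end encrypted duplex stream to a peer,
  // established through Holepunch's NAT traversal, with no central endpoint.
  socket.on("data", async (chunk: Buffer) => {
    const req = JSON.parse(chunk.toString());                // assumed framing
    const completion = await runLocalInference(req.prompt);  // placeholder
    socket.write(JSON.stringify({ completion }) + "\n");
  });
});

// Announce as a worker (server) and also look for peers to delegate to (client).
const discovery = swarm.join(topic, { server: true, client: true });
await discovery.flushed(); // topic fully announced to the DHT

async function runLocalInference(prompt: string): Promise<string> {
  // Stand-in for a Fabric LLM call; the real SDK entry point is not shown here.
  return `echo: ${prompt}`;
}
```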

The Genesis dataset release has separate implications for compliance-sensitive verticals. Enterprises in financial services, healthcare, and defense that cannot send training data to third-party cloud providers have historically been blocked from fine-tuning frontier models. A 148-billion-token open synthetic dataset for STEM and logic, combined with on-device LoRA fine-tuning via Fabric LLM, collapses that barrier without requiring a data processing agreement with any vendor.
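Neither the Fabric LLM fine-tuning API nor a Genesis loading format is documented in the launch, so the sketch below is purely illustrative: every identifier in it is an assumption. What it is meant to convey is the workflow the paragraph describes, open synthetic data plus on-device LoRA, and why that workflow is plausible on phones at all.

```ts
// Purely illustrative: every name below is hypothetical, not a documented
// @qvac/sdk API. It sketches the workflow described above, nothing more.
import { FabricLLM } from "@qvac/sdk"; // hypothetical export

const model = await FabricLLM.load("base-model.gguf"); // hypothetical loader

// LoRA trains small low-rank adapter matrices instead of the full weights,
// which is why fine-tuning is plausible on phone-class GPUs in the first place.
const adapter = await model.finetune({
  dataset: "genesis/stem-logic", // hypothetical reference to a Genesis subset
  method: "lora",
  rank: 8,                       // a typical LoRA rank; small adapter footprint
  epochs: 1,
});

await adapter.save("compliance-tuned.lora"); // hypothetical persistence call
```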

The WDK payment layer is the most speculative pillar. Autonomous agent payment in Bitcoin and USDt is architecturally novel, but enterprise procurement, treasury controls, and regulatory frameworks in most jurisdictions are not yet designed for AI agents initiating crypto transactions. Early adopters will need clear governance policies before enabling that layer in production.
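The launch does not describe WDK's programming surface, so everything in the sketch below is hypothetical, including the commented-out transfer call. What it illustrates is the governance control the paragraph calls for: a policy gate that sits between an agent's intent to pay and any actual transfer.

```ts
// Hypothetical throughout: neither the WDK API nor these policy fields are
// documented. The pattern shown is the governance gate the text argues for:
// no agent-initiated transfer executes without passing an explicit policy.
interface SpendPolicy {
  maxPerTxUsd: number;
  maxDailyUsd: number;
  allowedRecipients: Set<string>;
}

const policy: SpendPolicy = {
  maxPerTxUsd: 50,
  maxDailyUsd: 200,
  allowedRecipients: new Set(["merchant-wallet-address"]), // placeholder address
};

let spentTodayUsd = 0;

async function agentPay(recipient: string, amountUsd: number): Promise<void> {
  if (amountUsd > policy.maxPerTxUsd) throw new Error("per-transaction limit exceeded");
  if (spentTodayUsd + amountUsd > policy.maxDailyUsd) throw new Error("daily limit exceeded");
  if (!policy.allowedRecipients.has(recipient)) throw new Error("recipient not allow-listed");

  // Stand-in for whatever transfer call WDK actually exposes:
  // await wdk.transfer({ asset: "USDt", to: recipient, amountUsd });
  spentTodayUsd += amountUsd;
}
```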

Open questions remain about Fabric LLM's mobile fine-tuning performance ceilings: specifically, what model sizes are tractable on consumer Android and iOS hardware, and how training throughput compares with desktop baselines. Tether has pointed to ongoing research published on Hugging Face for further technical detail, but quantitative benchmarks have not yet accompanied the launch.

QVAC ships as 100% open source. For CIOs evaluating sovereign AI infrastructure, the combination of a vendor-neutral runtime, an OpenAI-compatible API shim, and on-device fine-tuning removes three of the four standard objections to edge AI deployment. The fourth, mobile fine-tuning at scale, is now Tether's claim to prove.
