Tether, the company behind the USDt stablecoin, has launched QVAC, an open-source, local-first AI SDK that runs LLM inference, fine-tuning, speech, translation, and image generation entirely on-device across Linux, macOS, Windows, Android, and iOS, with no cloud dependency.

The platform has three components. Fabric LLM is an inference engine built on the Vulkan graphics API, which makes it GPU-vendor-agnostic; Tether claims it is the first framework to support LoRA fine-tuning of large language models directly on mobile devices, a capability previously assumed to require data-center compute. Genesis is a synthetic pre-training dataset of 148 billion tokens covering STEM and logic domains, released so developers can train independent models without relying on proprietary data moats. The third layer, a Wallet Development Kit (WDK), lets AI agents transact autonomously in Bitcoin and USDt without payment intermediaries, an explicit bridge between edge AI and crypto payment rails.

The SDK is JavaScript- and TypeScript-native, installable with a single npm command (npm install @qvac/sdk), and runs on Node.js, the Bare runtime, and Expo for mobile. The unified API covers LLM completion, text embeddings, neural machine translation via Bergamot, automatic speech recognition via Whisper.cpp or NVIDIA Parakeet, text-to-speech via ONNX Runtime, OCR, text-to-image via a stable-diffusion backend, multimodal inference, and retrieval-augmented generation, all from the same interface with no code changes across platforms. The SDK also exposes an OpenAI-compatible HTTP API, so existing toolchains targeting the OpenAI spec can point at a local QVAC server without modification.
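The launch materials do not document the SDK's exact method surface, so the first half of the sketch below is hypothetical: the QVAC import, constructor, and the complete/embed method names are assumptions, shown only to illustrate the unified interface described above. The second half is grounded: the openai npm package is the official client, and re-pointing it at any OpenAI-spec server via baseURL is its documented behavior. Only the localhost port, /v1 path, and model name are assumed.

```ts
// Sketch only. The @qvac/sdk names below are hypothetical (the launch does
// not document the SDK's method surface); the OpenAI client usage is the
// real openai package, with only the local port and path assumed.
import { QVAC } from "@qvac/sdk"; // hypothetical export name
import OpenAI from "openai";

// Hypothetical unified-API calls: the same interface on desktop and mobile.
const qvac = new QVAC({ model: "local-model.gguf" }); // constructor assumed
const reply = await qvac.complete("Summarize this clause in one sentence.");
const vectors = await qvac.embed(["edge AI", "sovereign inference"]);

// Grounded half: any OpenAI-spec toolchain can be re-pointed via baseURL.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // assumed local QVAC endpoint
  apiKey: "unused-locally",            // the client requires a key; a local server can ignore it
});
const res = await client.chat.completions.create({
  model: "local", // model identifier is server-defined; name assumed
  messages: [{ role: "user", content: "Hello from the edge." }],
});
console.log(res.choices[0].message.content);
```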

The most operationally significant addition for enterprise architects is the P2P inference delegation layer, built on the Holepunch networking stack. Nodes can offload inference to peers through NAT-traversing blind relays, a BitTorrent-style model for compute sharing inside a private fleet with no centralized inference endpoint. This directly addresses cloud concentration risk: an organization whose AI workflows depend on a single API provider is running a single-point-of-failure architecture. By distributing inference across the edge, QVAC makes those workflows resilient to outages, rate limits, and vendor lock-in.
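QVAC's delegation protocol itself has not been published, but the Holepunch stack underneath it is public. The sketch below uses Hyperswarm, Holepunch's real peer-discovery library, to show the rendezvous-and-connect pattern such a layer sits on. The fleet topic string, the newline-delimited JSON request framing, and the runLocalInference stand-in are all assumptions, not QVAC's actual wire protocol.

```ts
// Sketch of the peer-discovery pattern a delegation layer sits on.
// Hyperswarm is a real Holepunch library; the inference-request framing
// below (newline-delimited JSON) is our assumption, not QVAC's protocol.
import Hyperswarm from "hyperswarm"; // ships without bundled types; a local type shim is assumed
import { createHash } from "node:crypto";

const swarm = new Hyperswarm();

// Peers rendezvous on a 32-byte topic; here, a hash of a fleet-private name.
const topic = createHash("sha256").update("acme-private-inference-fleet").digest();

swarm.on("connection", (socket: any) => {
  // Each connection is an end-to-end encrypted duplex stream to a peer,
  // established through Holepunch's NAT traversal, with no central endpoint.
  socket.on("data", async (chunk: Buffer) => {
    const req = JSON.parse(chunk.toString());                // assumed framing
    const completion = await runLocalInference(req.prompt);  // placeholder
    socket.write(JSON.stringify({ completion }) + "\n");
  });
});

// Announce as a worker (server) and also look for peers to delegate to (client).
const discovery = swarm.join(topic, { server: true, client: true });
await discovery.flushed(); // topic fully announced to the DHT

async function runLocalInference(prompt: string): Promise<string> {
  // Stand-in for a Fabric LLM call; the real SDK entry point is not shown here.
  return `echo: ${prompt}`;
}
```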

The Genesis dataset release has separate implications for compliance-sensitive verticals. Enterprises in financial services, healthcare, and defense that cannot send training data to third-party cloud providers have historically been blocked from fine-tuning frontier models. A 148-billion-token open synthetic dataset for STEM and logic, combined with on-device LoRA fine-tuning via Fabric LLM, collapses that barrier without requiring a data processing agreement with any vendor.
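Neither the Fabric LLM fine-tuning API nor a Genesis loading format is documented in the launch, so the sketch below is purely illustrative: every identifier in it is an assumption. What it is meant to convey is the workflow the paragraph describes, open synthetic data plus on-device LoRA, and why that workflow is plausible on phones at all.

```ts
// Purely illustrative: every name below is hypothetical, not a documented
// @qvac/sdk API. It sketches the workflow described above, nothing more.
import { FabricLLM } from "@qvac/sdk"; // hypothetical export

const model = await FabricLLM.load("base-model.gguf"); // hypothetical loader

// LoRA trains small low-rank adapter matrices instead of the full weights,
// which is why fine-tuning is plausible on phone-class GPUs in the first place.
const adapter = await model.finetune({
  dataset: "genesis/stem-logic", // hypothetical reference to a Genesis subset
  method: "lora",
  rank: 8,                       // a typical LoRA rank; small adapter footprint
  epochs: 1,
});

await adapter.save("compliance-tuned.lora"); // hypothetical persistence call
```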

The WDK payment layer is the most speculative pillar. Autonomous agent payment in Bitcoin and USDt is architecturally novel, but enterprise procurement, treasury controls, and regulatory frameworks in most jurisdictions are not yet designed for AI agents initiating crypto transactions. Early adopters will need clear governance policies before enabling that layer in production.
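The launch does not describe WDK's programming surface, so everything in the sketch below is hypothetical, including the commented-out transfer call. What it illustrates is the governance control the paragraph calls for: a policy gate that sits between an agent's intent to pay and any actual transfer.

```ts
// Hypothetical throughout: neither the WDK API nor these policy fields are
// documented. The pattern shown is the governance gate the text argues for:
// no agent-initiated transfer executes without passing an explicit policy.
interface SpendPolicy {
  maxPerTxUsd: number;
  maxDailyUsd: number;
  allowedRecipients: Set<string>;
}

const policy: SpendPolicy = {
  maxPerTxUsd: 50,
  maxDailyUsd: 200,
  allowedRecipients: new Set(["merchant-wallet-address"]), // placeholder address
};

let spentTodayUsd = 0;

async function agentPay(recipient: string, amountUsd: number): Promise<void> {
  if (amountUsd > policy.maxPerTxUsd) throw new Error("per-transaction limit exceeded");
  if (spentTodayUsd + amountUsd > policy.maxDailyUsd) throw new Error("daily limit exceeded");
  if (!policy.allowedRecipients.has(recipient)) throw new Error("recipient not allow-listed");

  // Stand-in for whatever transfer call WDK actually exposes:
  // await wdk.transfer({ asset: "USDt", to: recipient, amountUsd });
  spentTodayUsd += amountUsd;
}
```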

Open questions remain about Fabric LLM's mobile fine-tuning performance ceilings: specifically, what model sizes are tractable on consumer Android and iOS hardware, and how training throughput compares with desktop baselines. Tether has pointed to ongoing research published on Hugging Face for further technical detail, but quantitative benchmarks have not yet accompanied the launch.

QVAC ships as 100% open source. For CIOs evaluating sovereign AI infrastructure, the combination of a vendor-neutral runtime, an OpenAI-compatible API shim, and on-device fine-tuning removes three of the four standard objections to edge AI deployment. The fourth, mobile fine-tuning at scale, is now Tether's claim to prove.
