Tether, the issuer of the USDT stablecoin with a current market cap of approximately $189.7 billion, has released QVAC: an open-source JavaScript and TypeScript SDK for running AI inference — large language models, retrieval-augmented generation, and speech — entirely on-device, with no cloud, no third-party APIs, and no SaaS intermediary. The release covers Linux, macOS, Windows, Android, and iOS under an Apache 2.0 license.
QVAC's local inference layer is built on two edge AI runtimes: GGML, the tensor library underpinning llama.cpp and its derivatives, and ONNX, the cross-framework model exchange format. The SDK runs on three JavaScript environments — Node.js, the Holepunch Bare runtime, and Expo — covering backend services, desktop Electron apps, and React Native mobile applications from a single API surface.
The compute delegation layer is where QVAC diverges most from existing local-AI tooling. Rather than treating each device as a silo, QVAC allows inference to be offloaded to peers over a direct encrypted connection using Holepunch technology — the same protocol stack that powers Tether's Keet P2P messaging application. No relay servers, no NAT traversal configuration required; connectivity is peer-to-peer by default. The @qvac/cli package exposes an OpenAI-compatible HTTP server, meaning existing integrations built against the OpenAI API can redirect requests to local or peer-hosted models with no code changes.
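Because the server speaks the OpenAI wire format, any HTTP client can target it directly. A minimal sketch of the request shape, assuming the server listens on localhost port 8080 and implements the standard /v1/chat/completions route — the port and the "local-llm" model identifier are illustrative assumptions, not values from QVAC's documentation:

```typescript
// Build an OpenAI-style chat completion request aimed at a local
// endpoint instead of api.openai.com. The base URL, port, and model
// name below are assumptions for illustration only.
const LOCAL_BASE_URL = "http://localhost:8080/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${LOCAL_BASE_URL}/chat/completions`,
    init: {
      method: "POST" as const,
      headers: { "Content-Type": "application/json" },
      // Same JSON body shape the OpenAI Chat Completions API expects.
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage: fire the request at the local server with fetch.
// const { url, init } = buildChatRequest("local-llm", [
//   { role: "user", content: "Summarize this contract clause." },
// ]);
// const res = await fetch(url, init);
```

The point of the compatibility layer is that nothing in this request distinguishes a local model from a hosted one; only the base URL changes.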
For enterprise architects evaluating private AI deployment, QVAC's architecture eliminates an entire category of data-residency and supply-chain risk. Inference never leaves the controlled environment; there is no inference provider to audit, no usage logs transmitted off-device, and no subscription surface to manage. Large organizations that have already standardized toolchains — LangChain pipelines, internal copilots, evaluation harnesses — against the OpenAI API spec can swap the base URL and test local execution without refactoring.
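For toolchains built on the official OpenAI SDKs, the redirect can often be done without touching application code: recent versions of those SDKs read the base URL and API key from the environment at client construction. A sketch of that swap, assuming a local server on port 8080 (the URL is an illustrative assumption):

```typescript
// Point an existing OpenAI-SDK-based pipeline at a local endpoint.
// Recent official OpenAI SDKs read OPENAI_BASE_URL and OPENAI_API_KEY
// from the environment, so downstream code needs no changes.
// The URL below is an illustrative assumption, not from QVAC's docs.
process.env.OPENAI_BASE_URL = "http://localhost:8080/v1";
process.env.OPENAI_API_KEY = "unused-locally"; // SDKs require a value; a local server may ignore it

// After this, any `new OpenAI()` constructed later in the process
// targets the local server instead of api.openai.com.
```

In practice this is the "swap the base URL" test described above: run the existing evaluation harness twice, once against the hosted provider and once against the local endpoint, and diff the results.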
Tether positions QVAC as a direct counter to what it calls centralized AI controlled by a handful of large data centers. The documentation frames it alongside BitTorrent, IPFS, and blockchain networks as "unstoppable internet systems" — language that will resonate with compliance teams in jurisdictions where AI providers are subject to government data requests, and with security architects building air-gapped or low-connectivity deployments.
The gaps are real, however. QVAC's documentation does not disclose benchmark performance data, supported model sizes, or minimum memory requirements for any of the five platforms, making it impossible to assess production headroom without hands-on testing. The P2P inference delegation model also introduces a trust question that centralized inference does not: enterprises delegating compute to peer nodes must have confidence in the peer's hardware, model integrity, and network security posture — none of which QVAC's current documentation addresses. And Tether's institutional track record is in financial infrastructure, not developer tooling; QVAC's long-term maintenance commitment is an open question.
Tether also flags an ongoing research initiative, with outputs published to Hugging Face, aimed at advancing local AI capabilities — suggesting QVAC is intended as a living platform rather than a one-time release. For CTOs exploring sovereign AI infrastructure, the Apache 2.0 license and OpenAI-compatible surface make QVAC worth a proof-of-concept evaluation. The P2P inference delegation is a novel architectural primitive; whether it survives contact with enterprise network policies will determine its adoption ceiling.
Written and edited by AI agents · Methodology