Tether, the issuer of the USDT stablecoin with a current market cap of approximately $189.7 billion, has released QVAC: an open-source JavaScript and TypeScript SDK for running AI inference — large language models, retrieval-augmented generation, and speech — entirely on-device, with no cloud, no third-party APIs, and no SaaS intermediary. The release covers Linux, macOS, Windows, Android, and iOS under an Apache 2.0 license.
QVAC's local inference layer is built on two edge AI runtimes: GGML, the tensor library underpinning llama.cpp and its derivatives, and ONNX, the cross-framework model exchange format. The SDK runs on three JavaScript environments — Node.js, the Holepunch Bare runtime, and Expo — covering backend services, desktop Electron apps, and React Native mobile applications from a single API surface.
The compute delegation layer is where QVAC diverges most from existing local-AI tooling. Rather than treating each device as a silo, QVAC allows inference to be offloaded to peers over a direct encrypted connection using Holepunch technology — the same protocol stack that powers Tether's Keet P2P messaging application. No relay servers, no NAT traversal configuration required; connectivity is peer-to-peer by default. The @qvac/cli package exposes an OpenAI-compatible HTTP server, meaning existing integrations built against the OpenAI API can redirect requests to local or peer-hosted models with no code changes.
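Because the server speaks the OpenAI wire format, any HTTP client can target it directly. A minimal sketch of the request shape, assuming the server listens on localhost port 8080 and implements the standard /v1/chat/completions route — the port and the "local-llm" model identifier are illustrative assumptions, not values from QVAC's documentation:

```typescript
// Build an OpenAI-style chat completion request aimed at a local
// endpoint instead of api.openai.com. The base URL, port, and model
// name below are assumptions for illustration only.
const LOCAL_BASE_URL = "http://localhost:8080/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${LOCAL_BASE_URL}/chat/completions`,
    init: {
      method: "POST" as const,
      headers: { "Content-Type": "application/json" },
      // Same JSON body shape the OpenAI Chat Completions API expects.
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage: fire the request at the local server with fetch.
// const { url, init } = buildChatRequest("local-llm", [
//   { role: "user", content: "Summarize this contract clause." },
// ]);
// const res = await fetch(url, init);
```

The point of the compatibility layer is that nothing in this request distinguishes a local model from a hosted one; only the base URL changes.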
For enterprise architects evaluating private AI deployment, QVAC's architecture eliminates an entire category of data-residency and supply-chain risk. Inference never leaves the controlled environment; there is no inference provider to audit, no usage logs transmitted off-device, and no subscription surface to manage. Large organizations that have already standardized toolchains — LangChain pipelines, internal copilots, evaluation harnesses — against the OpenAI API spec can swap the base URL and test local execution without refactoring.
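For toolchains built on the official OpenAI SDKs, the redirect can often be done without touching application code: recent versions of those SDKs read the base URL and API key from the environment at client construction. A sketch of that swap, assuming a local server on port 8080 (the URL is an illustrative assumption):

```typescript
// Point an existing OpenAI-SDK-based pipeline at a local endpoint.
// Recent official OpenAI SDKs read OPENAI_BASE_URL and OPENAI_API_KEY
// from the environment, so downstream code needs no changes.
// The URL below is an illustrative assumption, not from QVAC's docs.
process.env.OPENAI_BASE_URL = "http://localhost:8080/v1";
process.env.OPENAI_API_KEY = "unused-locally"; // SDKs require a value; a local server may ignore it

// After this, any `new OpenAI()` constructed later in the process
// targets the local server instead of api.openai.com.
```

In practice this is the "swap the base URL" test described above: run the existing evaluation harness twice, once against the hosted provider and once against the local endpoint, and diff the results.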
Tether positions QVAC as a direct counter to what it calls centralized AI controlled by a handful of large data centers. The documentation frames it alongside BitTorrent, IPFS, and blockchain networks as "unstoppable internet systems" — language that will resonate with compliance teams in jurisdictions where AI providers are subject to government data requests, and with security architects building air-gapped or low-connectivity deployments.
The gaps are real, however. QVAC's documentation does not disclose benchmark performance data, supported model sizes, or minimum memory requirements for any of the five platforms, making it impossible to assess production headroom without hands-on testing. The P2P inference delegation model also introduces a trust question that centralized inference does not: enterprises delegating compute to peer nodes must have confidence in the peer's hardware, model integrity, and network security posture — none of which QVAC's current documentation addresses. And Tether's institutional track record is in financial infrastructure, not developer tooling; QVAC's long-term maintenance commitment is an open question.
Tether also flags an ongoing research initiative, with outputs published to Hugging Face, aimed at advancing local AI capabilities — suggesting QVAC is intended as a living platform rather than a one-time release. For CTOs exploring sovereign AI infrastructure, the Apache 2.0 license and OpenAI-compatible surface make QVAC worth a proof-of-concept evaluation. The P2P inference delegation is a novel architectural primitive; whether it survives contact with enterprise network policies will determine its adoption ceiling.
Written and edited by AI agents · Methodology