Mistral has not released a 30B-parameter mixture-of-depths (MoD) code model, and no announcement, API catalog entry, or weight repository confirms that one exists. The speculation is noteworthy because such a model would fill a logical gap in Mistral's production-hardened code stack, which includes Codestral for autocomplete, Devstral 2 for agents, Devstral Small 2 for local inference, and the generalist Mistral Large 3.

The current lineup is architecturally diverse. Codestral is a 22B dense model optimized for fill-in-the-middle (FIM) completion, featuring a 256K context window in its current 25.XX API release, priced at $0.30 per million input tokens and $0.90 per million output tokens on Mistral's dedicated endpoint. It achieves 95.3% pass@1 on FIM benchmarks and is cost-effective for per-keystroke calls. Devstral 2 is a 123B dense transformer designed for agentic coding, scoring 72.2% on SWE-Bench Verified, with API pricing at $0.40 per million input tokens and $2.00 per million output tokens. Devstral Small 2 is a 24B Apache 2.0 model that runs on a single RTX 4090 or a MacBook with an M-series Pro/Max chip for air-gapped work. Mistral Large 3 is a sparse mixture-of-experts (MoE) model with 41B active parameters drawn from 675B total, also Apache 2.0, trained on approximately 3,000 H200 GPUs.
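To make the price difference concrete, the per-request cost at these rates can be worked out directly. The sketch below is a minimal illustration using only the two price pairs quoted above; the dictionary keys are labels chosen for this example rather than API model identifiers, and the token counts in the demo are hypothetical workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens, as quoted in the text above."""
    input_per_m: float
    output_per_m: float


# Labels are illustrative, not API model names. Prices are the ones quoted
# in the article and should be checked against the current price list.
PRICES = {
    "codestral": Price(0.30, 0.90),
    "devstral-2": Price(0.40, 2.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million rates."""
    p = PRICES[model]
    total = input_tokens * p.input_per_m + output_tokens * p.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workloads: a short completion and a long agentic turn.
    workloads = {"autocomplete": (2_000, 50), "agent turn": (30_000, 4_000)}
    for label, (n_in, n_out) in workloads.items():
        for model in PRICES:
            cost = request_cost(model, n_in, n_out)
            print(f"{label:>12}  {model:<11} ${cost:.6f}")
```

Because the output rate differs more than the input rate between the two models, the gap widens as responses get longer.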

A hypothetical 30B mixture-of-depths model would sit between Codestral and Large 3, relying on per-token depth routing rather than sparse expert selection. Whereas MoE dispatches each token to a subset of feed-forward experts within a layer, mixture-of-depths varies how many layers a token passes through: a router decides, block by block, whether a token is processed or bypasses the block on the residual path, so easy tokens consume less compute. This lowers the average forward-pass cost below that of a dense 30B model while preserving full-depth capacity for difficult tokens. However, it adds operational complexity: because each token in a batch may traverse a different set of layers, dynamic depth routing complicates static batching, KV-cache sizing, and throughput benchmarking on standard vLLM or Triton stacks. Memory-bandwidth savings materialize only if the inference engine can skip layers per token without padding the entire batch to full depth.
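One way to picture per-token depth routing is a block that processes only its highest-scoring tokens and lets the rest pass through unchanged on the residual path. The toy sketch below is loosely modeled on the top-k block routing in published mixture-of-depths work. It is not a description of any Mistral model: `route_block`, the hand-written router scores, and the stand-in `block` function are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def route_block(
    tokens: Sequence[Vector],
    router_scores: Sequence[float],
    block: Callable[[Vector], Vector],
    capacity: int,
) -> Tuple[List[Vector], List[int]]:
    """Apply `block` to the `capacity` highest-scoring tokens only.

    Returns the new token states and the sorted indices that were processed.
    In a real model the scores would come from a learned router; here they
    are passed in by hand.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    chosen = set(ranked[:capacity])
    out: List[Vector] = []
    for i, x in enumerate(tokens):
        if i in chosen:
            # Residual update x + f(x), as in a standard transformer block.
            out.append([a + b for a, b in zip(x, block(x))])
        else:
            # Skipped token: carried forward unchanged, no block compute spent.
            out.append(list(x))
    return out, sorted(chosen)


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [3.0], [4.0]]
    scores = [0.1, 0.9, 0.4, 0.8]            # hypothetical router outputs

    def halve(x: Vector) -> Vector:
        return [0.5 * v for v in x]          # stand-in for attention + MLP

    new_tokens, processed = route_block(tokens, scores, halve, capacity=2)
    print(new_tokens, processed)
```

With a fixed `capacity`, the number of tokens each block processes is known in advance, but which tokens have state at which layer still varies. That variation is the source of the KV-cache and batching concerns described above.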

FIG. 02 Mistral's code model stack: confirmed dense and sparse tiers with hypothetical 30B mixture-of-depths in the gap.

Until Mistral publishes weights, an endpoint, or evaluations for such a model, the 30B MoD claim remains speculative. The existing family already illustrates the trade-offs architects face. Codestral excels in latency and price but lacks the reasoning depth for multi-file refactoring. Devstral 2 handles that work at roughly 1.3× the input price and 2.2× the output price quoted above, so its premium grows with output length. Devstral Small 2 offers offline inference with 24B-scale accuracy. No confirmed MoD option yet provides variable compute cost at code-model quality.

Adopt the tiering strategy now, not the unconfirmed routing mechanism: use a cheap 22B endpoint for autocomplete, route complex agentic tasks to a 123B API, and maintain a 24B Apache 2.0 checkpoint on local hardware for pre-commit or offline generation. If a MoD model emerges, the key question will be whether its dynamic compute savings offset the overhead of custom CUDA kernels and uneven batch execution.
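The tiering rule above can be expressed as a small dispatch function. This is a sketch under stated assumptions: the `Task` fields, the task-kind strings, and the single-file threshold are hypothetical choices for illustration, and the tier values are descriptive labels rather than real endpoint names.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTOCOMPLETE = "22B hosted endpoint"
    AGENTIC = "123B hosted API"
    LOCAL = "24B local checkpoint"


@dataclass(frozen=True)
class Task:
    kind: str               # e.g. "completion", "refactor", "pre-commit"
    files_touched: int = 1
    offline: bool = False


def pick_tier(task: Task) -> Tier:
    """Map a task to a model tier following the strategy in the text."""
    # Offline or pre-commit work stays on the local checkpoint.
    if task.offline or task.kind == "pre-commit":
        return Tier.LOCAL
    # Single-file completions go to the cheap autocomplete endpoint.
    if task.kind == "completion" and task.files_touched <= 1:
        return Tier.AUTOCOMPLETE
    # Everything else is treated as complex agentic work.
    return Tier.AGENTIC


if __name__ == "__main__":
    for t in (
        Task("completion"),
        Task("refactor", files_touched=5),
        Task("completion", offline=True),
    ):
        print(t, "->", pick_tier(t).value)
```

Keeping the rule in one function means that a new tier, such as a future MoD model, could be added by changing a single branch rather than every call site.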