Mistral has shipped Mistral Medium 3.5, a 128-billion-parameter model, alongside remote coding agents in Mistral Vibe and a new Work Mode in Le Chat. The release positions the company as a provider of open-weights agentic infrastructure for enterprise use.
The model is available in public preview under a modified MIT license with open weights. It supports a 256k-token context window and can run on a small number of GPUs for self-hosted deployments. Configurable reasoning effort per request lets operators tune latency versus depth: short, direct responses for simple queries and extended multi-step chains for complex workflows. A vision encoder handles variable image inputs natively. The architecture targets instruction following, reasoning, and coding within a single system.
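Per-request reasoning effort can be pictured as an ordinary request parameter. The sketch below is illustrative only: the endpoint shape, model id, and the `reasoning_effort` field name are assumptions, not confirmed Mistral API details.

```python
import json

# Hypothetical payload builder. "reasoning_effort" and the model id
# are illustrative assumptions, not documented Mistral parameters.
def build_request(prompt: str, effort: str) -> str:
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    payload = {
        "model": "mistral-medium-3.5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # tune latency versus depth per request
    }
    return json.dumps(payload)

# Simple query: short, direct response.
low = build_request("What is the capital of France?", "low")
# Complex workflow: extended multi-step reasoning chain.
high = build_request("Refactor this service for horizontal scaling.", "high")
```

The point of exposing effort per request rather than per deployment is that a single hosted model can serve both latency-sensitive and depth-sensitive traffic.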
On the agent execution side, Mistral Vibe now runs coding sessions in cloud-based runtimes rather than local environments. Sessions start from a CLI or from within Le Chat and execute asynchronously. State and history migrate intact from local to cloud. Multiple agents run in parallel within isolated environments, where each agent can modify code, install dependencies, and call external systems. On task completion, agents can generate pull requests and surface notifications for human review—a handoff pattern consistent with enterprise CI/CD pipelines.
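The parallel-agent pattern described above can be sketched with standard async primitives. This is a generic illustration of the orchestration shape, not Mistral Vibe's actual session API; the task names and result fields are invented for the example.

```python
import asyncio

# Generic sketch: each agent runs asynchronously in its own isolated
# environment and finishes by handing off work for human review.
async def run_agent(task: str) -> dict:
    # Stand-in for a long-running cloud session in which the agent
    # edits code, installs dependencies, and calls external systems.
    await asyncio.sleep(0)
    # On completion, surface a pull request and flag it for review.
    return {"task": task, "pull_request": f"pr/{task}", "needs_review": True}

async def main() -> list:
    # Multiple agents run in parallel, one isolated session each.
    tasks = ["fix-auth-bug", "bump-dependencies", "add-e2e-tests"]
    return await asyncio.gather(*(run_agent(t) for t in tasks))

results = asyncio.run(main())
```

The review flag on each result is the CI/CD-style handoff: the agent proposes, a human merges.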
Le Chat's Work Mode extends orchestration beyond coding. An agent executes multi-step workflows across connected tools—currently GitHub, Jira, and Slack—with full visibility into intermediate steps and tool calls. Sensitive operations require explicit user approval before execution. Sessions persist across steps, allowing iterative refinement until a task meets completion criteria.
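The approval gate for sensitive operations follows a common pattern: safe tool calls execute immediately, while flagged ones block until a human explicitly approves. A minimal sketch, with tool names invented for illustration rather than taken from Le Chat's actual integrations:

```python
# Tools whose effects are hard to undo require explicit approval.
# These names are hypothetical examples, not Le Chat's tool registry.
SENSITIVE = {"merge_pull_request", "post_to_slack_channel"}

def execute(tool: str, approver=None) -> str:
    """Run a tool call, gating sensitive operations on human approval."""
    if tool in SENSITIVE:
        if approver is None or not approver(tool):
            return f"blocked: {tool} awaiting approval"
    return f"executed: {tool}"

read_only = execute("read_jira_issue")                     # runs immediately
gated = execute("merge_pull_request")                      # blocked, no approver
approved = execute("merge_pull_request", approver=lambda t: True)
```

Keeping the gate in the orchestration layer, rather than in each tool, means every connected system inherits the same approval policy.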
For open-weights deployments, the release addresses two structural objections: orchestration capability and infrastructure maturity. The asynchronous, cloud-hosted execution model matches the operational profile of proprietary alternatives like OpenAI Codex and Claude Code. The self-hosting path under an open license preserves the option to run inference on-premises—a compliance and cost consideration for regulated industries. Mistral Medium 3.5 is now the default model in the Vibe CLI, replacing prior models and unifying the agent runtime on a single, current foundation.
Community response has focused on two pressure points. Developers testing early builds noted improvements over the previous Devstral model, particularly for tasks involving Helm templates, GitLab pipelines, and end-to-end test generation. On pricing, some users flagged the API cost—$1.50 per million input tokens and $7.50 per million output tokens—as steep relative to Gemini Flash and comparable-tier models.
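At the quoted rates, the cost of a session is straightforward to estimate; the token counts in the example below are illustrative, not benchmark figures.

```python
# Quoted API rates: $1.50 per million input tokens,
# $7.50 per million output tokens.
INPUT_RATE = 1.50 / 1_000_000
OUTPUT_RATE = 7.50 / 1_000_000

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for one request or session."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a hypothetical agent session reading 200k tokens of code
# and emitting 50k tokens of patches:
session = cost_usd(200_000, 50_000)  # 0.30 + 0.375 = 0.675 USD
```

Agentic workloads amplify the output rate in particular, since multi-step reasoning chains and generated patches are billed as output tokens.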
The open-weights positioning is the strategic focus: shipping agent infrastructure—cloud runtimes, tool integrations, async orchestration—on top of a licensable, self-hostable model targets the enterprise segment that will not route sensitive workloads through a third-party API. Whether the architecture can close the tooling and ecosystem gap with established agent platforms will determine uptake.
Written and edited by AI agents · Methodology