RESEARCH · BY AI | EXPERT SCOUT · Wednesday, June 10, 2026 · 4 MIN READ
FPCG steers reasoning models at test time without retraining
A new activation-probe method enables test-time steering of reasoning models by predicting, rather than detecting, future reasoning failures. ML platform engineers can implement it without retraining.
Kortukov et al. have introduced Future Probe Controlled Generation (FPCG), a method for guiding large reasoning models at test time without retraining. FPCG trains lightweight activation probes on intermediate chain-of-thought hidden states to anticipate reasoning paths that may fail; the probes forecast future behavioral outcomes with 64% to 91% accuracy. At each step, the method samples multiple candidate next sentences and selects the one with the lowest predicted future-misbehavior score, which avoids the pitfalls of conventional activation steering and limits degradation of output quality.
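As a rough illustration of what such a probe could look like, the sketch below fits a logistic-regression probe that maps a hidden-state vector to the probability that the trace later misbehaves. Everything here is an assumption for illustration, not the authors' implementation: the names `train_probe` and `probe_score` are hypothetical, the probe is assumed to be linear, and the "hidden states" are synthetic vectors rather than activations from a real model. The key point is the labeling: each label describes the outcome of the completed trace, not the behavior of the current sentence.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(probe, h):
    # Predicted probability of *future* misbehavior for one hidden state.
    w, b = probe
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)

def train_probe(hidden_states, future_labels, lr=0.1, epochs=200):
    # Full-batch gradient descent on the logistic loss.
    # future_labels[i] is 1.0 if the trace that passed through
    # hidden_states[i] later misbehaved, else 0.0 (assumed labeling).
    dim, n = len(hidden_states[0]), len(hidden_states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(hidden_states, future_labels):
            err = probe_score((w, b), h) - y
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic stand-in for intermediate activations (illustrative only):
# traces that fail later are shifted along one fixed direction.
rng = random.Random(0)
DIM = 8
direction = [1.0 if i % 2 == 0 else -1.0 for i in range(DIM)]
states, labels = [], []
for _ in range(200):
    y = 1.0 if rng.random() < 0.5 else 0.0
    sign = 1.0 if y == 1.0 else -1.0
    states.append([rng.gauss(0.0, 1.0) + sign * d for d in direction])
    labels.append(y)

probe = train_probe(states, labels)
correct = sum((probe_score(probe, h) > 0.5) == (y == 1.0)
              for h, y in zip(states, labels))
accuracy = correct / len(states)
print(f"train accuracy on synthetic data: {accuracy:.2f}")
```

With a real model, the synthetic vectors would be replaced by residual-stream activations captured at sentence boundaries, and accuracy would be measured on held-out traces rather than on the training set.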
FPCG's central idea is the distinction between detection features and prediction features. Previous methods intervened on internal features that reflect current behavior. FPCG instead trains probes that read the residual stream at intermediate reasoning steps and predict the likelihood of future behaviors such as confabulation or logical failure. At inference time, FPCG generates N candidate continuations for a reasoning step, runs the lightweight probe on each candidate's hidden states, and commits to the continuation with the lowest predicted failure probability. No weight updates or retraining are required.
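The inference-time loop described above can be sketched as a best-of-N selection at each reasoning step. This is a minimal sketch under stated assumptions, not the paper's code: `select_next_sentence` and `controlled_generation` are hypothetical names, and the sampler, hidden-state extractor, and probe are passed in as plain callables. The toy stand-ins at the bottom exist only to make the example runnable; a real setup would call the model to sample sentences and read its activations.

```python
from typing import Callable, List, Sequence, Tuple

def select_next_sentence(
    prefix: str,
    sample_candidates: Callable[[str, int], List[str]],
    hidden_state_of: Callable[[str], Sequence[float]],
    probe: Callable[[Sequence[float]], float],
    n_candidates: int = 4,
) -> Tuple[str, float]:
    # Sample N candidate next sentences, score each with the probe,
    # and keep the one with the lowest predicted failure probability.
    candidates = sample_candidates(prefix, n_candidates)
    scored = [(probe(hidden_state_of(prefix + c)), c) for c in candidates]
    best_score, best = min(scored, key=lambda pair: pair[0])
    return best, best_score

def controlled_generation(prompt, sample_candidates, hidden_state_of,
                          probe, n_candidates=4, max_steps=8):
    # Build the reasoning trace one selected sentence at a time.
    # The model weights are never modified; only sampling is steered.
    trace = prompt
    for _ in range(max_steps):
        sentence, _ = select_next_sentence(
            trace, sample_candidates, hidden_state_of, probe, n_candidates)
        trace += sentence
    return trace

# --- Toy stand-ins (assumptions, for demonstration only) ---
def toy_sampler(prefix: str, n: int) -> List[str]:
    pool = [" I will guess the value.", " Let me verify each step.",
            " Probably 7, moving on.", " Check the units first."]
    return pool[:n]

def toy_hidden_state(text: str) -> List[float]:
    # Pretend "activations": counts of two risky phrasings in the text.
    return [float(text.count("guess")), float(text.count("Probably"))]

def toy_probe(h: Sequence[float]) -> float:
    # Pretend probe: risk grows with either feature, capped at 1.0.
    return min(1.0, 0.1 + 0.4 * h[0] + 0.4 * h[1])

trace = controlled_generation("Q: What is 3 + 4?", toy_sampler,
                              toy_hidden_state, toy_probe,
                              n_candidates=4, max_steps=2)
print(trace)
```

Passing the probe in as a scoring function keeps it in the role of a discriminator over sampled text, which is the design choice that separates this approach from pushing activations directly.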
FPCG addresses limitations of prior probing and steering methods. Linear probes on the last token before chain-of-thought predict the final answer with 0.9 AUC on most tasks, which indicates that instruction-tuned models often determine their answer before generating CoT. The CREST paper demonstrated that suppressing non-linear reasoning heads mid-trace improves accuracy by up to 17.5% and reduces token usage by 37.6%, but such direct interventions risk fragility. FPCG avoids pushing activations directly and instead uses the probe as a discriminator in a sampling loop.
FPCG incurs inference-time overhead because it generates and scores multiple candidate sentences per reasoning step, so latency scales with the length of the reasoning trace. Probes must be trained on intermediate activations from the target model class (o1-class or R1-class systems running extended chain-of-thought) and cannot be transferred blindly across architectures. The activation steering field guide notes that vector steering fails for complex reasoning, because multi-step sequential computation cannot be reliably directed from a single layer; FPCG operates at the text level but does not address underlying model capability gaps. If a model cannot solve a math problem, no sampling strategy built around probe scores will produce the correct derivation. The stochastic nature of reasoning behaviors also means that prediction probes trained on one task distribution may degrade when the reasoning topology changes, as evidenced by Zhuang et al.'s finding that 93.3% of 541 keyword-detected CoT boundaries are behaviorally unstable under re-generation from the same prefix.
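To make the inference-time overhead concrete, here is a back-of-the-envelope cost model. It is a simplification under stated assumptions: every candidate sentence is decoded in full, candidates are not batched, and prefix caching is ignored. The function name `fpcg_overhead` and the example numbers are hypothetical and are not taken from the paper.

```python
def fpcg_overhead(num_steps, n_candidates, tokens_per_sentence):
    # Tokens decoded by ordinary generation: one sentence per step.
    baseline = num_steps * tokens_per_sentence
    # Tokens decoded with candidate sampling: N sentences per step.
    sampled = num_steps * n_candidates * tokens_per_sentence
    return {
        "baseline_tokens": baseline,
        "sampled_tokens": sampled,
        "probe_calls": num_steps * n_candidates,  # one probe call per candidate
        "decode_multiplier": sampled / baseline,
    }

# Illustrative numbers only: a 40-sentence trace, 4 candidates per step,
# about 25 tokens per sentence.
estimate = fpcg_overhead(num_steps=40, n_candidates=4, tokens_per_sentence=25)
print(estimate)
```

Under these assumptions the decoded-token count grows linearly in both the number of reasoning steps and the number of candidates, which is why long traces are the expensive case.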