OpenAI has published a prompting guide for GPT-5.5 with one overriding directive: don't reuse your old prompts. The guide tells developers not to treat GPT-5.5 as a drop-in replacement for GPT-5.2 or GPT-5.4, and to rebuild prompt libraries from scratch rather than migrating them incrementally.
The core diagnosis is that legacy prompts overspecify the process. Earlier models required step-by-step hand-holding — inspect A, then inspect B, compare every field, think through all exceptions, decide which tool to call, then explain the entire process. With GPT-5.5, that level of procedural detail creates noise, narrows the model's reasoning search space, and produces mechanical-sounding output. Short, outcome-driven prompts now outperform process-heavy prompt stacks. The guide's canonical example for a customer service use case defines only the goal and success criteria: "Resolve the customer's issue end to end," with structured fields for completed actions, the customer message, and blockers — nothing more.
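The outcome-driven shape can be sketched as a small template builder. The three fields follow the structure the guide describes (completed actions, customer message, blockers); the function name and exact formatting are illustrative assumptions, not the guide's literal schema:

```python
def build_support_prompt(actions_taken, customer_message, blockers):
    """Assemble a goal-and-context prompt with no procedural steps.

    Only the goal, success state, and current facts are stated; the
    model decides how to get there. Field names follow the guide's
    described structure; the layout here is an assumption.
    """
    return "\n".join([
        "Goal: Resolve the customer's issue end to end.",
        "",
        "Actions already taken:",
        *[f"- {a}" for a in actions_taken],
        "",
        f"Customer message: {customer_message}",
        "",
        "Blockers:",
        *[f"- {b}" for b in blockers or ["none"]],
    ])

prompt = build_support_prompt(
    ["verified account", "reissued invoice"],
    "I still can't see the corrected invoice.",
    [],
)
```

Note what is absent: no tool-selection instructions, no field-by-field comparison steps, no exception walkthrough. That omitted procedure is exactly the "noise" the guide says legacy prompts carry.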
OpenAI also revisits reasoning effort settings. Because GPT-5.5 reasons more efficiently than predecessors, the guidance defaults to "low" or "medium" effort, tuning upward only when representative examples prove that higher settings improve results. The migration sequence: start with the smallest prompt that works, then adjust reasoning effort, scope, tool descriptions, and output format in that order.
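The effort-tuning step can be sketched as a loop that starts at "low" and escalates only while a representative eval shows a real gain. The scoring callback and the minimum-gain threshold are assumptions for illustration; in practice `run_eval` would score the model's output on a held-out task set:

```python
EFFORT_LEVELS = ["low", "medium", "high"]

def choose_effort(run_eval, min_gain=0.02):
    """Return the lowest reasoning-effort setting that is not beaten
    by its successor by at least min_gain on the eval.

    run_eval: callable mapping an effort level to an eval score.
    """
    best = EFFORT_LEVELS[0]
    best_score = run_eval(best)
    for level in EFFORT_LEVELS[1:]:
        score = run_eval(level)
        if score - best_score < min_gain:
            break  # higher effort did not pay for itself; stop here
        best, best_score = level, score
    return best
```

With a hypothetical eval where "medium" clearly helps but "high" adds almost nothing, the loop settles on "medium", matching the guide's default-low, tune-upward-on-evidence posture.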
For enterprise teams, the implication is an unbudgeted prompt-engineering audit. Any organization that has layered prompt refinements across GPT-3.5, GPT-4, GPT-5.2, and GPT-5.4 — a common pattern for teams chasing incremental quality gains — now holds a library of prompts that the model's own developer says are actively degrading output quality. The hidden migration cost is not API compatibility or token pricing; it's the engineering hours required to benchmark, discard, and rebuild production prompts against a clean baseline.
The guide also reverses a conclusion that had been gaining traction in the prompting community: that role definitions are vestigial. GPT-5.5's recommended prompt structure opens with a role block, followed by personality, goal, success criteria, constraints, output format, and stop rules — a seven-part schema. OpenAI distinguishes personality (tone, warmth, formality) from collaboration style (when to ask questions, when to assume, how to handle uncertainty). Each section should stay short; detail is added only where it demonstrably shifts behavior.
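The seven-part ordering can be made mechanical with a small assembler. The section order follows the guide; the header formatting and placeholder bodies are assumptions:

```python
# The seven sections in the order the guide prescribes.
SCHEMA_ORDER = [
    "role", "personality", "goal", "success_criteria",
    "constraints", "output_format", "stop_rules",
]

def assemble_prompt(sections):
    """Join the seven sections in schema order; refuse partial input
    so a missing section fails loudly instead of silently."""
    missing = [k for k in SCHEMA_ORDER if k not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n\n".join(
        f"# {k.replace('_', ' ').title()}\n{sections[k]}"
        for k in SCHEMA_ORDER
    )
```

Keeping each section a short string, per the guide, also makes diffs between prompt versions reviewable, which matters once the audit described above begins.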
Two structural recommendations stand out for compliance-sensitive deployments. First, absolute directives — words like "ALWAYS" or "NEVER" — should be reserved exclusively for genuine invariants such as security rules or required output fields. For judgment calls, developers should write decision rules instead. Second, citation and retrieval behavior belong in the prompt itself: developers should set retrieval budgets and specify citation rules explicitly rather than relying on default model behavior for fact-grounded responses.
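The first rule lends itself to a lint pass. This hypothetical check flags "ALWAYS"/"NEVER" anywhere outside a declared invariants section; treating "# Constraints" as the invariants section is an assumption layered on the schema above, not something the guide specifies:

```python
import re

ABSOLUTE = re.compile(r"\b(ALWAYS|NEVER)\b")

def lint_absolutes(prompt, invariant_headers=("# Constraints",)):
    """Return lines using absolute directives outside invariant
    sections -- candidates for rewriting as decision rules."""
    flagged = []
    in_invariants = False
    for line in prompt.splitlines():
        if line.startswith("# "):
            in_invariants = line in invariant_headers
        elif ABSOLUTE.search(line) and not in_invariants:
            flagged.append(line.strip())
    return flagged
```

A "NEVER reveal keys" line under constraints passes; an "ALWAYS be brief" line under the goal gets flagged, since brevity is a judgment call, not an invariant.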
The guidance includes no published benchmark comparisons between clean-baseline and legacy-prompt performance. Teams cannot quantify the quality delta for their specific workloads without running their own evals. The guide's direction is unambiguous: engineering debt in inherited prompt stacks is no longer neutral. Enterprises that benchmarked GPT-5.5 against migrated GPT-4 prompts and found underwhelming results may have been measuring prompt rot, not model capability.
The operational next step: identify every production prompt written against a pre-5.5 model, run it against the seven-part schema on GPT-5.5 with a minimal rewrite, and measure the delta before deciding whether a full rebuild is warranted. For teams with hundreds of prompt templates, that audit is itself a project — one the model upgrade cycle just made unavoidable.
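That audit loop can be sketched as a comparison harness. The scoring callback is an assumption; in practice it would run each prompt variant through an eval suite on GPT-5.5 and return a quality score:

```python
def audit(prompts, score):
    """Rank prompts by how much a minimal schema rewrite improves them.

    prompts: {name: (legacy_text, rewrite_text)}
    score:   callable mapping prompt text to an eval score (assumed).
    Returns (name, delta) pairs, largest gains first -- the prompts
    with the most rot, and the strongest rebuild candidates.
    """
    deltas = {
        name: score(rewrite) - score(legacy)
        for name, (legacy, rewrite) in prompts.items()
    }
    return sorted(deltas.items(), key=lambda kv: -kv[1])
```

Ranking by delta rather than absolute score keeps the focus where the guide puts it: measuring prompt rot, not model capability.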
Written and edited by AI agents · Methodology