With GPT-5.5 now available in the API, OpenAI has published a prompting guide for developers, and Simon Willison’s post picks out the parts that matter most. The headline recommendation is blunt: don’t treat GPT-5.5 as a drop-in replacement for earlier models. OpenAI’s guide opens with a direct warning that the post highlights:

“To get the most out of GPT-5.5, treat it as a new model family to tune for, not a drop-in replacement for gpt-5.2 or gpt-5.4. Begin migration with a fresh baseline instead of carrying over every instruction from an older prompt stack. Start with the smallest prompt that preserves the product contract, then tune reasoning effort, verbosity, tool descriptions, and output format against representative examples.”

This is an uncommon stance for a model vendor. Most upgrade guides suggest that prompts will work “better or as well” on newer models and that migration is low-friction. Here OpenAI is explicitly recommending that developers treat accumulated prompt engineering as potentially counterproductive.

Why a fresh baseline

The reasoning behind starting from scratch, according to the guide as summarized in Willison’s post, is that GPT-5.5 behaves differently enough from earlier versions that prompts optimized for gpt-5.2 or gpt-5.4 may not translate cleanly. Dimensions to re-tune include reasoning effort, verbosity, tool descriptions, and output format — all things that tend to accumulate as implicit assumptions in mature prompt stacks.

Willison notes that OpenAI recommends beginning with “the smallest prompt that preserves the product contract.” The implication is that extra instructions designed to correct or guide earlier model behavior may be unnecessary or actively unhelpful with GPT-5.5, and that carrying them forward adds noise rather than signal.
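As a concrete illustration, here is a minimal sketch of what "smallest prompt that preserves the product contract" might look like as a request builder. The parameter names (`reasoning.effort`, `text.verbosity`) follow the OpenAI Responses API as documented for GPT-5-era models, and the instruction string, model name, and ticket text are invented for the example; treat all of them as assumptions, not confirmed GPT-5.5 specifics.

```python
# Hypothetical sketch: a minimal baseline request that keeps only the
# product contract (role + output format) and exposes the knobs the
# guide says to re-tune, instead of carrying over a legacy prompt stack.

def build_baseline_request(task: str, effort: str = "medium",
                           verbosity: str = "low") -> dict:
    """Return kwargs for client.responses.create() with a minimal prompt."""
    return {
        "model": "gpt-5.5",
        # Smallest prompt that preserves the contract: no corrective
        # instructions inherited from gpt-5.2 / gpt-5.4 prompt stacks.
        "instructions": ("You are a support assistant. Answer in JSON "
                         "with keys 'answer' and 'confidence'."),
        "input": task,
        "reasoning": {"effort": effort},   # re-tune per task class
        "text": {"verbosity": verbosity},  # re-tune against examples
    }

request = build_baseline_request("Summarize this support ticket")
```

The point of the sketch is the shape, not the values: effort and verbosity start at defaults and get tuned against representative examples, rather than being pinned by instructions written for an older model's failure modes.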

This framing has practical consequences for teams that have invested in prompt engineering. Treating those prompts as a starting point means inheriting behavior constraints that the new model may not need, while treating them as a baseline to replace means accepting migration cost upfront rather than debugging unexplained regressions later.

Handling long-running tasks

The guide includes a specific recommendation for applications where the model spends substantial time working before returning a visible response. Willison highlights this as a “neat trick”:

“Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.”

The problem this solves is latency perception in agentic applications. When a model is executing a multi-step task — calling tools, checking results, iterating — there can be a long gap between user input and visible output. Without an acknowledgment step, the user cannot distinguish between a model that is working and one that has stalled or crashed. Willison notes he has already observed this pattern in the Codex app, and that “it does make longer running tasks feel less like the model has crashed.”
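The pattern above can be sketched as a simple agent loop that yields one short user-visible update before any tool call. Everything here is illustrative: the function names, the step callables, and the message wording are invented stand-ins for whatever tool-calling machinery an application actually uses.

```python
# Hypothetical sketch of the "acknowledge first" pattern: send a one- or
# two-sentence user-visible update before any tool calls, so a long
# multi-step task does not look stalled or crashed.

from typing import Callable, Iterator

def run_multi_step_task(task: str,
                        steps: list[Callable[[], str]]) -> Iterator[str]:
    """Yield user-visible updates while executing a multi-step task."""
    # Acknowledgment goes out before the first (possibly slow) tool call.
    yield f"Got it. Starting with step 1 of {len(steps)} for: {task}"
    results = []
    for step in steps:
        results.append(step())  # placeholder for a tool call
    yield f"Done. Completed {len(results)} steps."

updates = list(run_multi_step_task(
    "migrate prompts",
    [lambda: "audited prompt", lambda: "rewrote baseline"],
))
```

The acknowledgment costs one extra message, but it converts an opaque multi-second gap into visible progress, which is exactly the latency-perception fix the guide describes.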

The recommendation is low-cost to implement and has a direct impact on perceived reliability. It’s the kind of operational detail that tends to get discovered through production usage rather than pre-release testing, which makes it notable that OpenAI included it in the official guide.

Automated migration via Codex

For developers who want to migrate existing codebases to GPT-5.5 systematically, OpenAI has embedded guidance in a Codex skill. According to Willison’s post, running the following in Codex triggers an automated migration:

$openai-docs migrate this project to gpt-5.5

The upgrade guide the Codex agent follows is publicly available on GitHub and includes instructions for light prompt rewriting to better fit GPT-5.5’s behavior. The existence of a published, version-pinned migration guide that a coding agent can follow is worth noting: it suggests OpenAI expects migration to be a real engineering task, not a string substitution.

A separate guide — Using GPT-5.5 — is also available, and the main prompting guide for API usage is at developers.openai.com.

What this means in practice

Willison describes the fresh-baseline recommendation as “interesting,” which understates its significance for engineering teams with production deployments on earlier GPT versions. The conventional migration path is test-then-port: run existing evals on the new model, patch where things break, ship. OpenAI’s guidance here suggests a different approach: audit what the prompt is actually doing, strip it to the minimum, and rebuild for the new model’s behavior profile.

The practical effect is that GPT-5.5 migration is more work upfront but should produce cleaner, better-calibrated systems on the other side. Whether most teams will do that work, or will instead port prompts and adjust reactively, is an open question.