System prompts that survive long sessions

Every team writes a careful system prompt and forgets it. The model follows it for the first few turns and then starts ignoring it — not because the prompt is bad, but because conversation history outweighs system instructions in the model’s attention. By turn fifteen, you’re a different system than the one your tests covered.

What drift actually looks like

The format slips first: a model instructed to always respond in JSON starts adding apologetic preambles. Tone slips next: a “professional, concise” persona becomes chatty. Refusal policies erode: the model’s stance on edge cases gets softer the longer the user pushes. None of this is dramatic in any single turn — it’s the cumulative drift that breaks production behavior.
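Format drift of this kind is the easiest to catch mechanically, because "always respond in JSON" is a yes/no property you can assert on every turn. A minimal sketch (the helper name is ours, not from any library) that flags a reply the moment an apologetic preamble sneaks in:

```python
import json

def is_bare_json(reply: str) -> bool:
    """True only if the entire reply parses as JSON -- no preamble, no trailing chat."""
    try:
        json.loads(reply)
        return True
    except json.JSONDecodeError:
        return False

# A drifted reply fails even though it "contains" valid JSON:
ok = is_bare_json('{"status": "ok"}')            # True
drifted = is_bare_json('Sure! Here you go:\n{"status": "ok"}')  # False
```

Run a check like this over every turn of your long-conversation test suite and log the first turn number where it fails; that turn number is your drift metric.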

Anchoring patterns that work

Re-inject critical constraints in the system prompt or as a leading user message every N turns. Keep the system prompt short — long prompts get summarized in the model’s internal representation, and summarized constraints lose their teeth. Test with conversations that are 20-plus turns long, not 3-turn happy paths. The drift you don’t measure is the drift that ships.

The system prompt is not where you put everything you want the model to remember. It’s where you put what must survive the conversation.
