Few-shot examples: choose them like you choose unit tests

A prompt with five well-chosen examples beats the same prompt with fifty mediocre ones, almost every time. The mistake most teams make is treating examples like decoration: a few obvious cases pasted at the top of the prompt. Examples are the closest thing you have to test cases, and they should be curated with the same care.

What a good example actually does

It pins down a decision the model would otherwise hedge on. If your task involves rare-but-important categories, every category you leave out of the examples is one the model will silently merge into something more common. If the task is style-sensitive, every example sets the rhythm of the answer. The model is pattern-matching on your examples; if they don't capture the patterns you care about, you're working against yourself.
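
Here is a minimal sketch of what that pinning looks like in practice, assuming a hypothetical ticket-classification task with a rare "security" category the model tends to fold into "bug". The task, labels, and tickets are illustrations, not a real dataset; the point is that each edge-case example exists to block one specific bad merge.

```python
# Hypothetical curated example pool for a ticket classifier.
EXAMPLES = [
    # Happy-path anchor: one common category, so the output format is clear.
    {"ticket": "App crashes when I tap the export button.",
     "label": "bug"},
    # Edge case: reads like a bug report but is a security issue.
    # Without this example, the model tends to merge it into "bug".
    {"ticket": "I can see another user's invoices by editing the URL.",
     "label": "security"},
    # Edge case: a feature request phrased as a complaint.
    {"ticket": "It's ridiculous that there's no dark mode.",
     "label": "feature_request"},
]

def build_prompt(ticket: str) -> str:
    """Render the curated examples ahead of the new ticket."""
    shots = "\n".join(
        f"Ticket: {ex['ticket']}\nLabel: {ex['label']}" for ex in EXAMPLES
    )
    return f"{shots}\nTicket: {ticket}\nLabel:"

print(build_prompt("Login page says my password expired but it didn't."))
```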

Choosing them like unit tests

Cover edge cases first, not happy paths. The cases your examples don't cover are the ones the model will guess on. Track example performance on a holdout set: when an example stops paying for its weight in tokens, replace it. Treat the example pool as a living dataset, not a static prompt fragment.
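
One way to make "track example performance" concrete is leave-one-out ablation: drop each example, re-score the holdout set, and keep only the examples whose absence hurts. A minimal sketch, assuming an `evaluate(examples, holdout)` function you supply that returns holdout accuracy; that function, the holdout set, and the `min_gain` threshold are all placeholders to fill in.

```python
def example_contributions(examples, holdout, evaluate):
    """Score each example by how much holdout accuracy drops without it."""
    baseline = evaluate(examples, holdout)
    return {
        i: baseline - evaluate(examples[:i] + examples[i + 1:], holdout)
        for i in range(len(examples))
    }

def prune_examples(examples, holdout, evaluate, min_gain=0.0):
    """Keep only the examples that still pay for their weight in tokens."""
    scores = example_contributions(examples, holdout, evaluate)
    return [ex for i, ex in enumerate(examples) if scores[i] > min_gain]
```

Leave-one-out costs one full holdout evaluation per example, so this belongs in a scheduled job that runs when the pool changes, not on the request path.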

The teams that reliably ship few-shot prompts have an example-curation pipeline. The teams that struggle treat examples as folklore.
