Few-shot examples: choose them like you choose unit tests

A prompt with five well-chosen examples beats the same prompt with fifty mediocre ones, almost every time. The mistake most teams make is treating examples like decoration: a few obvious cases pasted at the top of the prompt. Examples are the closest thing you have to test cases, and they should be curated with the same care.

What a good example actually does

It pins down a decision the model would otherwise hedge on. If your task involves rare-but-important categories, every category you leave out of the examples is one the model will silently merge into something more common. If the task is style-sensitive, every example sets the rhythm of the answer. The model is pattern-matching on your examples; if they don't capture the patterns you care about, you're working against yourself.
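
Here is a minimal sketch of what that pinning looks like in practice, assuming a hypothetical ticket-classification task with a rare "security" category the model tends to fold into "bug". The task, labels, and tickets are illustrations, not a real dataset; the point is that each edge-case example exists to block one specific bad merge.

```python
# Hypothetical curated example pool for a ticket classifier.
EXAMPLES = [
    # Happy-path anchor: one common category, so the output format is clear.
    {"ticket": "App crashes when I tap the export button.",
     "label": "bug"},
    # Edge case: reads like a bug report but is a security issue.
    # Without this example, the model tends to merge it into "bug".
    {"ticket": "I can see another user's invoices by editing the URL.",
     "label": "security"},
    # Edge case: a feature request phrased as a complaint.
    {"ticket": "It's ridiculous that there's no dark mode.",
     "label": "feature_request"},
]

def build_prompt(ticket: str) -> str:
    """Render the curated examples ahead of the new ticket."""
    shots = "\n".join(
        f"Ticket: {ex['ticket']}\nLabel: {ex['label']}" for ex in EXAMPLES
    )
    return f"{shots}\nTicket: {ticket}\nLabel:"

print(build_prompt("Login page says my password expired but it didn't."))
```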

Choosing them like unit tests

Cover edge cases first, not happy paths. The cases your examples don't cover are the ones the model will guess on. Track example performance on a holdout set: when an example stops paying for its weight in tokens, replace it. Treat the example pool as a living dataset, not a static prompt fragment.
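
One way to make "track example performance" concrete is leave-one-out ablation: drop each example, re-score the holdout set, and keep only the examples whose absence hurts. A minimal sketch, assuming an `evaluate(examples, holdout)` function you supply that returns holdout accuracy; that function, the holdout set, and the `min_gain` threshold are all placeholders to fill in.

```python
def example_contributions(examples, holdout, evaluate):
    """Score each example by how much holdout accuracy drops without it."""
    baseline = evaluate(examples, holdout)
    return {
        i: baseline - evaluate(examples[:i] + examples[i + 1:], holdout)
        for i in range(len(examples))
    }

def prune_examples(examples, holdout, evaluate, min_gain=0.0):
    """Keep only the examples that still pay for their weight in tokens."""
    scores = example_contributions(examples, holdout, evaluate)
    return [ex for i, ex in enumerate(examples) if scores[i] > min_gain]
```

Leave-one-out costs one full holdout evaluation per example, so this belongs in a scheduled job that runs when the pool changes, not on the request path.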

The teams that reliably ship few-shot prompts have an example-curation pipeline. The teams that struggle treat examples as folklore.
