Temperature and top-p: tuning when the answer matters more than novelty

Temperature and top-p are the two sampling parameters every team adjusts and almost none tune system ...

Function calling: when the schema is more important than the prompt

Function calling looks like a free win the first time it works. The model picks the right tool, fill ...

Self-consistency sampling: cheap reliability when you need the right answer

Self-consistency sampling sounds like the kind of thing a researcher proposes and a production engin ...

Forcing structured output without breaking the model

JSON mode and schema constraints look like a free win until they aren't. The first time the model pr ...

Tool use patterns that survive context decay

Tool use looks easy in a one-shot example and hard once the conversation grows past a few thousand t ...

Defending against prompt injection without breaking your prompt

Prompt injection is the SQL injection of LLM applications, and every team learns about it the same w ...

Context window management when 128k still isn't enough

Larger context windows were supposed to make context engineering obsolete. They didn't. The needle-i ...

Few-shot examples: choose them like you choose unit tests

A prompt with five well-chosen examples beats the same model with fifty mediocre ones, almost every ...

Mastering prompt engineering for production use

Lorem ipsum dolor sit amet consectetur adipisicing elit. Production prompts have very different fail ...

Chain-of-thought prompting that holds up under pressure

Chain-of-thought prompting is the easiest reasoning trick to ship and the hardest to keep working. T ...