Tool selection: when the model should pick, and when you should

Tool-using agents look powerful in demos because the model is choosing what to do next. They look fragile in production because the model is choosing what to do next. The space of available tools grows linearly with features and quadratically with edge cases; past about a dozen tools, the model starts conflating their roles and picking based on surface similarity in tool names.

What goes wrong as tool count grows

Beyond ten or fifteen tools, the descriptions blur together in the model’s representation. The model picks a search tool when a database lookup was correct, because both have “lookup” in their descriptions. It picks the simpler tool when the complex one was needed, because the simpler one matched the user’s phrasing. None of this shows up in single-call testing; it surfaces when one tool quietly handles a request another tool was supposed to handle, and the answer is technically valid but operationally wrong.

Architectural answers, not prompt answers

Group tools by purpose and route each request to a sub-agent that only sees the relevant subset. Surface fewer tools to the top-level model than you actually expose internally: five visible tools with clear purposes outperform twenty undifferentiated ones. For destructive or expensive tools, require an explicit name match, not a model-chosen one, as in the sketch below.
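A minimal sketch of the routing pattern, assuming a hand-rolled registry rather than any particular framework; the group names, tool names, and handlers here are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., str]
    destructive: bool = False

# Tools grouped by purpose. The top-level model never sees this flat
# structure; it only sees one coarse entry per group via visible_tools().
TOOL_GROUPS: dict[str, list[Tool]] = {
    "search": [
        Tool("web_search", "Full-text search over public pages", lambda q: f"web hits for {q}"),
        Tool("doc_search", "Semantic search over internal docs", lambda q: f"doc hits for {q}"),
    ],
    "database": [
        Tool("customer_lookup", "Fetch a customer record by ID", lambda i: f"customer {i}"),
        Tool("order_lookup", "Fetch an order record by ID", lambda i: f"order {i}"),
    ],
    "admin": [
        Tool("delete_record", "Permanently delete a record", lambda i: f"deleted {i}", True),
    ],
}

def visible_tools() -> list[dict]:
    """The top-level model's view: one entry per group, not per tool."""
    return [
        {"name": group, "description": "; ".join(t.description for t in tools)}
        for group, tools in TOOL_GROUPS.items()
    ]

def route(group: str) -> list[Tool]:
    """Hand the sub-agent only the tools in the chosen group."""
    return TOOL_GROUPS[group]

def call_tool(tools: list[Tool], name: str, *args, confirmed_name: str | None = None) -> str:
    """Run a tool from the sub-agent's subset. Destructive tools require
    the caller to repeat the exact tool name, so a fuzzy model choice
    can never reach them by accident."""
    tool = next((t for t in tools if t.name == name), None)
    if tool is None:
        raise KeyError(f"unknown tool: {name}")
    if tool.destructive and confirmed_name != tool.name:
        raise PermissionError(f"{tool.name} requires an explicit name match")
    return tool.handler(*args)
```

The point of the shape: the top-level model chooses among three coarse entries, each sub-agent chooses among two or three tools, and delete_record is unreachable unless the caller spells out its exact name, so no single decision is ever made against the full list.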

The number of tools an agent should choose from is much smaller than the number of tools you’d like to give it. Past a threshold, every additional tool makes every other choice worse.
