Evaluation
Evaluating agents when there's no single right answer
- William Jacob
- Evaluation, Agents
- 05 May, 2026
Evaluating a single prompt is hard. Evaluating an agent that runs ten tool calls before answering is ...