Reliability

Retry, backoff, and the ghosts in your latency graph

Retry logic for LLM calls is one of those things that feels obvious until it nearly takes down a ser ...
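As a rough illustration of the kind of retry logic in question, here is a minimal sketch of capped exponential backoff with full jitter. Everything named here is an assumption for illustration: `TransientError`, `call_with_retry`, `backoff_delays`, and the default `base`, `cap`, and `max_attempts` values are hypothetical and not taken from any particular client library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. timeouts, 429s)."""


def backoff_delays(max_attempts, base=0.5, cap=8.0, rng=random.random):
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling) so clients do not retry in lockstep.
        yield rng() * ceiling


def call_with_retry(fn, max_attempts=4, base=0.5, cap=8.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(); retry only on TransientError, making at most max_attempts calls."""
    delays = backoff_delays(max_attempts, base, cap, rng)
    while True:
        try:
            return fn()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)
```

The hard cap on attempts and the jitter are the two parts that matter most here: without a cap, retries can multiply load on an upstream that is already struggling, and without jitter, many clients tend to retry at the same instant. The `sleep` and `rng` parameters are injectable only so the behavior can be tested without waiting.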

Rate limits that protect users, not just upstream

Rate limiting in an LLM app is solving three problems at once, and most implementations only solve on ...
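One way to read "protect users, not just upstream" is a limit applied per user rather than one global limit in front of the provider. The sketch below shows that single idea with a token bucket per user; it does not claim to cover all three problems mentioned above. `TokenBucket`, `PerUserLimiter`, and the `rate` and `capacity` parameters are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so a new user can burst
        self.updated = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        # Credit tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """Keeps one independent bucket per user so one heavy user cannot starve the rest."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}

    def allow(self, user_id, cost=1.0):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow(cost)
```

The `cost` argument is there because LLM requests vary widely in size; a caller could pass an estimated token count instead of 1 so that a limit reflects actual usage rather than request count. This version is in-memory and single-process, so a multi-instance deployment would need shared state instead.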