Karpathy's Fix for Claude Code: Four Principles That Kill Bloated AI Code

Andrej Karpathy put his finger on it: LLMs make wrong assumptions and run with them. They overcomplicate code, bloat abstractions, and touch things they shouldn’t. The worst part? They do it politely, confidently, and at scale.

A single CLAUDE.md file — now available as a Claude Code plugin — addresses these exact failure modes. Four principles, derived directly from Karpathy’s observations, that change how Claude Code behaves from the ground up.

The Four Principles

Each principle maps to a specific failure pattern Karpathy identified:

Think Before Coding — LLMs pick an interpretation silently and go. This principle forces explicit reasoning: state assumptions, present alternatives, push back when a simpler approach exists. If Claude is confused, it says so instead of guessing.

“They don’t manage their confusion, don’t seek clarifications, don’t surface inconsistencies.”

Simplicity First — The default behavior is to build 1000 lines when 100 would do. This principle kills the tendency: no features beyond what was asked, no abstractions for single-use code, no “flexibility” that nobody requested. The test: would a senior engineer call this overcomplicated?

“They really like to overcomplicate code and APIs, bloat abstractions.”
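To make the contrast concrete, here is a minimal sketch built around a hypothetical request: “apply a 10% discount to an order total.” The task, the class names, and both functions are illustrative assumptions, not taken from the CLAUDE.md file. The first half shows the kind of single-use abstraction the principle discourages; the second half shows what was actually asked for.

```python
from abc import ABC, abstractmethod


# Overbuilt: a strategy hierarchy and a registry for a single discount rule.
class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, total: float) -> float:
        ...


class PercentageDiscount(DiscountStrategy):
    def __init__(self, percent: float) -> None:
        self.percent = percent

    def apply(self, total: float) -> float:
        return total * (1 - self.percent / 100)


# A registry with exactly one entry: "flexibility" nobody requested.
STRATEGIES = {"ten_percent": PercentageDiscount(10)}


def discounted_total_overbuilt(total: float, strategy: str = "ten_percent") -> float:
    return STRATEGIES[strategy].apply(total)


# Simple: the one behavior the (hypothetical) request asked for.
def discounted_total(total: float) -> float:
    return total * 0.9
```

Both functions compute the same result. The second one is the version a reviewer can read in a few seconds, and it can still grow into the first if a second discount rule ever shows up.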

Surgical Changes — LLMs change or remove code they don’t fully understand as side effects. This principle restricts edits to only what the task requires. No “improving” adjacent code. No refactoring things that aren’t broken. Every changed line must trace directly to the user’s request.

“They still sometimes change/remove comments and code they don’t sufficiently understand as side effects.”

Goal-Driven Execution — Karpathy noted LLMs are exceptionally good at looping until they meet specific goals. This principle transforms imperative instructions into verifiable success criteria. Instead of “add validation,” it becomes “write tests for invalid inputs, then make them pass.”

“Don’t tell it what to do, give it success criteria and watch it go.”
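The “write tests for invalid inputs, then make them pass” framing might look something like the sketch below. The function parse_port and its rules are hypothetical, chosen only to illustrate the idea; the point is that the tests spell out the success criteria before the implementation is judged done.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port from user input."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise ValueError(f"not an integer: {value!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTests(unittest.TestCase):
    """Success criteria: every invalid input raises ValueError."""

    def test_rejects_non_numeric_input(self):
        for bad in ["", "abc", "80.5"]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)

    def test_rejects_out_of_range_ports(self):
        for bad in ["0", "-1", "65536"]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)

    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)
```

Saved as a module, this can be run with python -m unittest. The tests give Claude something concrete to loop against, which is the behavior the principle is trying to encourage.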

When It Makes the Difference

Code reviews — The surgical changes principle produces PRs that contain only the requested changes. No drive-by refactoring, no mystery diffs.

New feature work — Goal-driven execution turns vague requests into testable outcomes. Claude self-verifies and iterates without hand-holding.

Debugging — Think Before Coding catches wrong assumptions before code gets written. Claude surfaces what it doesn’t understand instead of guessing wrong.

Legacy code — Surgical changes prevent the “I’ll just clean this up” trap. Claude touches only the bug, not the surrounding 500 lines of questionable style.

The Tradeoff You Should Know

These guidelines bias toward caution over speed. Trivial fixes — typo corrections, obvious one-liners — don’t need the full cycle. The principles are designed to reduce costly mistakes on non-trivial work, not to slow down simple tasks.

Applied indiscriminately, the guidelines produce a cautious assistant that asks too many questions. They work best when you know the difference between “add a field” and “redesign the auth flow.”

Installation

Option A: Claude Code Plugin (recommended)

From within Claude Code:

/plugin marketplace add forrestchang/andrej-karpathy-skills

/plugin install andrej-karpathy-skills@karpathy-skills

This installs the guidelines as a plugin, available across all projects.

Option B: CLAUDE.md (per-project)

For a new project:

curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

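For an existing project, append the file’s contents to your current CLAUDE.md instead of overwriting it; running the curl -o command above would replace your existing file.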

What Changes in Claude’s Behavior

Before the guidelines, ask Claude to “fix the validation bug” and it might rewrite the entire validation module, add a configuration system, and “fix” your error handling style.

After the guidelines, it reproduces the bug in a test, applies the minimal fix, and lists exactly which lines it changed.

You can observe the difference in the diffs. Clean, minimal, traceable. Every change maps to a requirement.

End-to-End: From Imperative to Declarative

A typical interaction without the guidelines:

You: Add rate limiting to the API
Claude: [Adds rate-limit middleware, a Redis-backed store, config
        system with env vars, and rewrites the error handler]

Same request with the guidelines active:

You: Add rate limiting to the API
Claude: I'll implement this. Goal: requests over 100/min return
        429. Let me add the middleware and a config flag.
        [Minimal diff, three files changed, focused]

The difference is not subtle. The first version is likely to need a rollback. The second can go straight to review.
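For a sense of scale, the goal “requests over 100/min return 429” can be met with very little code. The sketch below is an assumption-laden illustration, not what Claude or the plugin actually generates: it is an in-memory, framework-agnostic sliding-window limiter, and the names RateLimiter and status_for are hypothetical. The checks at the end encode the success criterion directly.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per client."""

    def __init__(
        self,
        limit: int = 100,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def status_for(self, client_id: str) -> int:
        """Return the HTTP status for this request: 200 if allowed, else 429."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200


# Success criterion as a check, using a fake clock instead of real time.
fake_now = [0.0]
limiter = RateLimiter(clock=lambda: fake_now[0])
statuses = [limiter.status_for("client-a") for _ in range(101)]
assert statuses[:100] == [200] * 100  # first 100 requests in the minute pass
assert statuses[100] == 429           # request 101 is rejected
fake_now[0] = 60.0                    # one full window later
assert limiter.status_for("client-a") == 200
```

Keeping the store in memory is a deliberate simplification here. A Redis-backed store only becomes necessary once multiple processes have to share the counts, and that is a requirement the original request never stated.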

Karpathy’s critique wasn’t new — every heavy user of AI coding tools has felt these pain points. What’s new is having a systematic fix that fits in a single file.

If your Claude Code sessions end with “why did it also change that,” you already know which problem this solves. The fix takes thirty seconds to install.

GitHub: https://github.com/forrestchang/andrej-karpathy-skills

Karpathy’s original post: https://x.com/karpathy/status/2015883857489522876