Claude Code 2.1.136: When an AI Agent's Safety Gate Switches from Trust to Verify
- John Doe
- Security, Agent
- 9 May 2026
You let Claude Code run a long task in auto mode. When you come back, it has written your AWS credentials into a log file. Or worse: it clicked “allow” on a prompt you never saw, then pushed a security test script into production.
This is not hypothetical. Anthropic’s own data shows Claude Code users manually approve 93% of permission prompts. People have developed click fatigue. When a tool asks you for approval dozens of times per day, you stop reading. That is why two security changes in 2.1.136 matter: security-test labels no longer auto-approve or auto-deny, requiring explicit human review; and the new hard_deny lets classifier rules block unconditionally, overriding user intent.
This is not a routine bug fix. It is a signal that AI agent security models are shifting from trust-biased to verify-first.
Why a 93% Approval Rate Is a Problem
Anthropic disclosed this number when introducing auto mode: users manually approve 93% of permission requests. It sounds like trust. The problem is that when approval rates exceed 90%, each popup costs attention without adding real safety. By the third “allow” click, users are no longer reading.
To solve approval fatigue, Claude Code offers two extreme options:
- Enable sandboxing — safe but high-maintenance; every new capability needs configuration
- Use --dangerously-skip-permissions — zero maintenance but zero protection
Auto mode is Anthropic’s third path: a model-based classifier reviews each action in the background, blocking anything that escalates beyond your request, targets unrecognized infrastructure, or appears driven by hostile content. The classifier runs in two stages: a fast single-token filter (yes/no), followed by chain-of-thought reasoning only if the first stage flags the action.
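The two-stage shape can be sketched in a few lines. Both stages are model calls in the real product; here a keyword check and a placeholder policy stand in for them, and every function name is invented for illustration:

```python
def fast_screen(action: str) -> bool:
    """Stage 1 stand-in: the cheap single-token verdict.

    A keyword check plays the role of the fast classifier here.
    Returning True means "clearly safe, skip stage 2".
    """
    risky = ("rm -rf", "curl", "production", "credential")
    return not any(keyword in action for keyword in risky)

def deep_review(action: str) -> bool:
    """Stage 2 stand-in: the expensive chain-of-thought pass.

    Only runs for actions the fast screen flagged.
    """
    return action.startswith("run tests")  # placeholder policy

def auto_mode_check(action: str) -> bool:
    # Most actions exit cheaply at stage 1; only flagged ones
    # pay for the slower second stage.
    return fast_screen(action) or deep_review(action)
```

The point of the design is cost: the expensive reasoning pass only runs on the small fraction of actions the fast filter flags.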
But classifiers are not human. They make mistakes. Anthropic estimates the false rejection rate at roughly 17% — meaning 17% of blocked actions are actually safe. This is an explicit trade-off: you are exchanging efficiency for a tighter safety boundary.
The Two Key Security Changes in 2.1.136
Security-Test Labels No Longer Auto-Process
In previous versions, operations tagged with security-test were automatically approved or denied without human review. That automation is now completely removed: all security-test related operations require explicit human approval.
This is a major signal. Security testing represents high-risk operations: penetration testing, privilege escalation, or executing commands that could damage production. Letting an automated tag decide whether these run is delegating high-stakes decisions to a mechanism that cannot be held accountable. Removing automation is the right call, but it also means you can no longer run security scans unattended in auto mode.
hard_deny: Unconditional Blocking
The new settings.autoMode.hard_deny is deliberately blunt. It lets administrators configure a class of classifier rules that unconditionally block matching operations, ignoring both user intent and allow-list exceptions.
Auto mode’s permission model has three layers:
- permissions.deny — runs before the classifier, cannot be overridden
- autoMode.soft_deny — classifier blocks, but can be overridden by explicit user intent
- autoMode.allow — exception list, can override soft_deny
hard_deny adds a fourth layer that outranks both soft_deny and allow: operations it matches are blocked by the classifier and accept no overrides. Even if the user directly requests the operation, it is rejected.
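Assuming that priority order, the layering reduces to a small decision function. Rule matching is simplified here to set membership; in the real product the classifier layers are model judgments, and all names below are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PermissionConfig:
    permissions_deny: set = field(default_factory=set)  # static, pre-classifier
    hard_deny: set = field(default_factory=set)         # classifier, no overrides
    soft_deny: set = field(default_factory=set)         # classifier, overridable
    allow: set = field(default_factory=set)             # exceptions to soft_deny

def decide(action: str, cfg: PermissionConfig, explicit_user_request: bool) -> str:
    if action in cfg.permissions_deny:
        return "deny"   # layer 0: runs before the classifier, never overridden
    if action in cfg.hard_deny:
        return "deny"   # new in 2.1.136: ignores user intent and the allow list
    if action in cfg.soft_deny:
        if action in cfg.allow or explicit_user_request:
            return "allow"  # soft_deny yields to exceptions and explicit intent
        return "deny"
    return "allow"
```

Note that `decide("deploy to production", cfg, explicit_user_request=True)` still returns "deny" when that rule sits in hard_deny — that asymmetry with soft_deny is the whole feature.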
This solves a real problem. In some organizational environments, there are operations that must never execute — regardless of how compelling the user’s reason. Examples include rm -rf /, production deployments, or sensitive database writes. Previously, these needed permissions.deny configuration, but permissions.deny runs before the classifier and does not see full session context. hard_deny lets the classifier make unconditional rejections based on complete context.
What Else Was Fixed
2.1.136 ships 52 CLI changes and 2 system prompt changes. Beyond the security model adjustments, several fixes stand out:
- MCP servers no longer silently disappear after /clear in the VS Code extension, JetBrains plugin, and Agent SDK
- OAuth refresh tokens are no longer lost when multiple MCP servers refresh concurrently
- Fixed a login loop caused by concurrent credential writes overwriting a freshly-rotated OAuth token
- Plan mode now correctly blocks file writes even when a matching Edit allow rule exists
The last two are particularly notable. Both are race conditions — contention when multiple processes or sessions manipulate credentials simultaneously. This indicates Claude Code’s user base has grown large enough that single-machine concurrency breaks things, and Anthropic is evolving the tool from a single-user CLI toward a multi-session, multi-plugin production platform.
Six Permission Modes: Which One to Use
Claude Code currently supports six permission modes:
| Mode | What runs without asking | Best for |
| --- | --- | --- |
| default | Reads only | Getting started, sensitive work |
| acceptEdits | Reads, file edits, common filesystem commands | Code iteration |
| plan | Reads only (no file writes) | Exploring a codebase |
| auto | Everything, with background safety checks | Long tasks, reducing approval fatigue |
| dontAsk | Only pre-approved tools | CI and scripts |
| bypassPermissions | Everything | Isolated containers and VMs only |
Auto mode is the only option that balances safety with efficiency, but it is also the only option that relies on model judgment. Other modes are deterministic: default always asks; bypassPermissions always executes. Auto mode makes a probabilistic decision you cannot fully inspect.
Anthropic’s own advice is that auto mode is for people who previously used --dangerously-skip-permissions, not for those who carefully review every prompt. If you stay alert for high-risk infrastructure operations, manual review remains the safer choice.
When to Use It
Tasks requiring long autonomous runs
Automated code migration, large-scale refactoring, or test suite execution involving hundreds of file operations and commands. Manual approval would completely break flow. Auto mode lets these run uninterrupted.
Team environments with multiple users
hard_deny and managed settings let administrators define absolutely forbidden operations at the organizational level. Every developer works within this framework without worrying that someone’s Claude Code instance might accidentally delete the production database.
Enterprise scenarios requiring audit trails
Every auto mode denial is recorded under /permissions in the Recently denied tab. PreToolUse hooks and PermissionDenied hooks allow custom audit logic, such as writing all denials to a SIEM or alerting the security team.
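A minimal audit sink can be sketched with a PreToolUse hook. The event and field names below follow the Claude Code hooks configuration shape; the log path is a placeholder, and a production setup would forward the payload to a SIEM rather than a local file:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "cat >> /var/log/claude-code/tool-use.jsonl"
          }
        ]
      }
    ]
  }
}
```

Hooks receive the tool-use payload as JSON on stdin, so appending stdin to a file is the smallest possible audit trail.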
The Gotchas
The classifier is not perfect
A 17% false rejection rate means auto mode will block some safe operations. If your workflow heavily depends on a pattern that gets falsely rejected, you need to add exceptions in autoMode.environment or autoMode.allow. But too many exceptions weaken the classifier’s protective value.
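For example, if the classifier keeps flagging pushes to an internal Git host as unrecognized infrastructure, you can register that host as known while keeping the built-in rules. The hostname here is a placeholder:

```json
{
  "autoMode": {
    "environment": [
      "$defaults",
      "git.internal.example.com"
    ]
  }
}
```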
Do not use bypassPermissions in production
The --dangerously-skip-permissions name is not hyperbole. It completely disables all permission prompts and safety checks, including writes to protected paths like .git, .claude, and .vscode. Only use it in isolated containers or VMs. Administrators can disable this mode entirely via managed settings.
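A sketch of that lockdown in a managed settings file, using the permissions setting as I understand it (verify the exact key against the docs before rolling it out):

```json
{
  "permissions": {
    "disableBypassPermissionsMode": "disable"
  }
}
```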
Sandbox and permissions are independent layers
Many people assume enabling sandboxing eliminates the need for permission rules. In reality, sandboxing only controls Bash commands; it does not control Read/Edit tools. If you only configure sandbox.denyRead to protect SSH keys, Claude’s Read tool can still read them. You need to configure both layers for the same paths.
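Covering the same secret at both layers looks roughly like this; the SSH path pattern is illustrative:

```json
{
  "sandbox": {
    "denyRead": ["~/.ssh"]
  },
  "permissions": {
    "deny": [
      "Read(~/.ssh/*)"
    ]
  }
}
```

The sandbox entry stops Bash commands from reading the directory; the permissions entry stops the Read tool from doing the same.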
Remember to include "$defaults"
If you set custom rules in autoMode.allow, soft_deny, or environment but forget to include "$defaults", all built-in rules are replaced. This means force push, data exfiltration, curl | bash, and other default block rules stop working. Only omit "$defaults" when you genuinely intend to take full ownership of the rule list.
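The difference is easy to miss: an allow list written as just ["run tests in CI environment"] silently replaces every built-in rule. Extending the defaults instead looks like this:

```json
{
  "autoMode": {
    "allow": [
      "$defaults",
      "run tests in CI environment"
    ]
  }
}
```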
Configuration Example
A typical team-level security setup:
{
  "defaultMode": "auto",
  "autoMode": {
    "environment": [
      "$defaults",
      "github.com/my-org",
      "s3://my-org-bucket"
    ],
    "hard_deny": [
      "$defaults",
      "deploy to production",
      "modify production database",
      "access AWS credentials"
    ],
    "allow": [
      "$defaults",
      "run tests in CI environment"
    ]
  },
  "permissions": {
    "deny": [
      "Bash(rm -rf /)",
      "Bash(rm -rf ~)",
      "Read(~/.ssh/*)",
      "Read(~/.aws/*)"
    ]
  }
}
# View currently effective classifier config
claude auto-mode config
# View built-in default rules
claude auto-mode defaults
# Have AI review your custom rules
claude auto-mode critique
When Not to Use Auto Mode
- You work with financial data, medical records, or PII — the 17% false rejection rate is not the only concern; the larger risk is that it might mistakenly allow an operation that should not execute
- Your team lacks security review processes — auto mode is not an unsupervised free pass; it is a permission model that requires configuration and monitoring
- You only use Claude Code occasionally — for short tasks, the approval overhead of default or acceptEdits mode is entirely acceptable
The Bottom Line
The security changes in Claude Code 2.1.136 send a clear signal: AI agents are no longer treated as “trust but verify” tools, but as systems that must “verify before trust.” The removal of automatic approval for security tests is because high-risk operations should never be decided by any automated mechanism. The introduction of hard_deny is because some boundaries should not be overrideable by user intent.
These changes matter beyond Claude Code itself. They are a warning light for the entire AI agent ecosystem: as agents are granted more autonomy, safety models must tighten in parallel. Otherwise we do not get more productive developers — we get more security incidents.
Is your agent running inside a sandbox with safety boundaries, or acting freely in a terminal with full permissions? The answer determines whether you can use it in production.
GitHub: https://github.com/anthropics/claude-code Docs: https://code.claude.com/docs/en/auto-mode-config