Chapter 03 of 14 · Part 1: Foundations

Chapter 3: The Brain and the Hands — Why Claude Needs a Container to Do Real Work

By the end of this chapter, you will understand the architectural design behind Claude Managed Agents — why the brain and the container are deliberately separated, and why that separation makes your agents faster, more reliable, and more secure.


The Big Idea

Claude is extraordinarily capable at reasoning, writing, and planning. But by itself, Claude is pure language. It can describe how to run a Python script. It can't actually run one.

To do real work — generate a file, execute a command, search the web, edit code — Claude needs an execution environment. It needs hands.

The insight behind Claude Managed Agents' architecture is that the brain (Claude reasoning about what to do) and the hands (the container that executes actions) should be separate systems. Not just logically separate, but structurally independent, so that a failure in one doesn't destroy the other.

This design — which Anthropic's engineering team calls "decoupling the brain from the hands" — is what makes Managed Agents resilient, fast, and safe in ways that simpler setups can't match. Understanding it will help you build more reliable agents and debug them faster when things go wrong.

Architectural diagram showing two distinct boxes. Left box labeled "Brain" with Claude's logo — "Reasons. Plans. Decides what to call." Right box labeled "Hands" (the container/sandbox) — "Executes. Files. Network. Code." Between them: a narrow interface labeled "tool call: execute(name, input) → string." Below: an arrow from the left box down to a separate element labeled "Session Log (durable)" — "Neither the brain nor the hands owns this. It persists outside both."

The Analogy

Think of a surgeon and a surgical team.

The surgeon (Claude) makes all the clinical decisions: what to cut, where to suture, how to manage complications. But the surgeon doesn't sterilize the instruments, set up the operating room, or log the procedure in the hospital system. That work belongs to the scrub technician and the circulating nurse.

More importantly: if the scrub tech drops an instrument, the surgery doesn't stop. Someone hands the surgeon a replacement. The surgeon's knowledge of the procedure is unaffected.

The Managed Agents architecture works the same way. Claude is the surgeon — it reasons about what to do next. The container is the surgical team — it executes the actions. And the session log is the hospital record — it exists independently of both, so a failure in the operating room doesn't erase the procedure history.

Before this architecture, the surgeon and scrub tech shared the same room, and if anything went wrong, the entire procedure was lost. That's the fragile version. Managed Agents is the resilient version.

Diagram: Side-by-side comparison. Left panel labeled "Fragile (coupled)": single large box containing both brain and sandbox together. Caption: "Container failure = session lost." Arrow crossing out the whole box. Right panel labeled "Resilient (decoupled)": two separate boxes (brain and container) connected by a thin interface line, both feeding into a separate "Session Log" at the bottom. Caption: "Container fails → brain catches error as tool result → session continues."

How It Actually Works

The Problem with Coupling

According to Anthropic's engineering blog post on Managed Agents (published February 4, 2026, by Lance Martin, Gabe Cemaj, and Michael Cohen), the initial design put all agent components into a single container. Session, harness, and sandbox were all running together in one place.

The consequence: "When the container failed, the session was lost." Every session paid the full container setup cost up front — even sessions that would never touch the sandbox. Startup was slow and failure was total.

The Fix: Decoupling the Brain from the Hands

The architectural fix was structural. The harness (the loop that calls Claude and routes its tool requests) was moved outside the container. Now the container is just another tool:

"The harness no longer lives inside the container. It calls the container the way it calls any other tool: execute(name, input) → string."

The container became what the engineering team calls "cattle" — interchangeable and disposable. As the blog puts it: "If the container died, the harness caught the failure as a tool-call error and passed it back to Claude."

Claude can handle a tool-call error as information. It can decide what to do next: retry, try a different approach, or report the failure. The session doesn't end just because a container had a problem.
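This recovery behavior can be sketched in a few lines of TypeScript. Everything here beyond the interface shape execute(name, input) → string is illustrative — the Tool type, the callTool wrapper, and the error message are invented for the sketch, not taken from the real harness:

```typescript
// Illustrative sketch: the harness treats the container as one tool
// among many, behind the uniform interface execute(name, input) -> string.
type Tool = { execute: (name: string, input: string) => string };

// A stand-in container "tool" that may die mid-call.
const container: Tool = {
  execute(name: string, input: string): string {
    if (input === "crash") {
      throw new Error("container terminated unexpectedly");
    }
    return `ran ${name}: ${input}`;
  },
};

// The harness never lets a container failure escape as a session failure.
// A dead container becomes an ordinary result string that Claude can
// reason about on its next turn: retry, work around, or report.
function callTool(tool: Tool, name: string, input: string): string {
  try {
    return tool.execute(name, input);
  } catch (err) {
    return `tool_error: ${(err as Error).message}`;
  }
}

const ok = callTool(container, "bash", "ls /workspace");
const failed = callTool(container, "bash", "crash");
```

The key move is the try/catch at the tool boundary: the failure is converted into data and handed back to the brain, rather than propagating upward and taking the session with it.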

The Three Virtualized Components

The current architecture virtualizes three components:

  • Session — the append-only log of everything that happened
  • Harness — the loop that calls Claude and routes Claude's tool calls to the relevant infrastructure
  • Sandbox — an execution environment where Claude can run code and edit files

Each component has a well-defined interface. The implementation of each can be swapped without disturbing the others. As the blog states: "We're opinionated about the shape of these interfaces, not about what runs behind them."
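One way to picture "opinionated about the shape of these interfaces, not about what runs behind them" is as a set of small TypeScript interfaces with swappable implementations. Only execute, emitEvent, and getEvents appear in this chapter; the event type and the in-memory class below are assumptions made for the sketch:

```typescript
// Illustrative interface shapes for the virtualized components.
interface SessionEvent { type: string; data: unknown }

interface Session {
  emitEvent(id: string, event: SessionEvent): void; // append-only
  getEvents(): SessionEvent[];
}

interface Sandbox {
  // The entire surface the harness depends on: one narrow call.
  execute(name: string, input: string): string;
}

// A minimal in-memory Session. Because callers only see the interface,
// this could be swapped for a durable store without disturbing the
// harness or the sandbox.
class InMemorySession implements Session {
  private events: SessionEvent[] = [];
  emitEvent(_id: string, event: SessionEvent): void {
    this.events.push(event); // events are added, never deleted
  }
  getEvents(): SessionEvent[] {
    return [...this.events];
  }
}
```

The point of the sketch is the shape, not the implementation: any component that satisfies the interface can stand behind it.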

The Session as a Durable Context Object

The session log is the heart of the architecture. It lives outside both the harness and the sandbox — stored durably regardless of what either component does.

This creates what Anthropic calls "the session as context object":

"In Managed Agents, the session provides this same benefit, serving as a context object that lives outside Claude's context window. But rather than be stored within the sandbox or REPL, context is durably stored in the session log."

The harness writes to the session with emitEvent(id, event). The session is append-only: events are added but never deleted. When a harness needs to recover, it calls wake(sessionId) and picks up exactly where it left off.

The getEvents() interface lets the harness interrogate context by selecting positional slices of the event stream — picking up from wherever it last stopped, or rewinding a few events to see the context before a specific action.
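A hedged sketch of recovery from the log: a new harness wakes and replays a positional slice of events. Only emitEvent, getEvents, and wake come from the text; the event shape, the seq field, and the checkpoint convention are assumptions:

```typescript
// Illustrative: a durable, append-only event log that outlives any harness.
type Event = { seq: number; type: string; payload: string };

class SessionLog {
  private events: Event[] = [];
  emitEvent(event: Omit<Event, "seq">): Event {
    const stamped = { seq: this.events.length, ...event };
    this.events.push(stamped); // append-only: never mutated or deleted
    return stamped;
  }
  // Positional slices let a harness rewind a few events for context,
  // or pick up exactly where the previous harness stopped.
  getEvents(from = 0, to = this.events.length): Event[] {
    return this.events.slice(from, to);
  }
}

// A new harness "wakes": it reads everything after its last checkpoint.
function wake(log: SessionLog, lastProcessedSeq: number): Event[] {
  return log.getEvents(lastProcessedSeq + 1);
}

const log = new SessionLog();
log.emitEvent({ type: "user_message", payload: "clean the repo" });
log.emitEvent({ type: "tool_use", payload: "bash: ls /workspace" });
log.emitEvent({ type: "tool_result", payload: "README.md src/" });

// Suppose the previous harness died after processing event 1.
const pending = wake(log, 1); // only the unprocessed tail remains
```

Because the log lives outside both components, the replacement harness reconstructs exactly where the session stood, with nothing lost.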

The Performance Payoff

Decoupling produced measurable improvements. From the engineering blog:

"Using this architecture, our p50 TTFT dropped roughly 60% and p95 dropped over 90%."

TTFT is "time to first token" — how long before you see the agent's first response. Previously, every session paid the full container setup cost up front, even when the container wasn't immediately needed. Now, the harness only provisions the sandbox when Claude actually needs to execute something — via a tool call.

Sessions that do lightweight work (a question-and-answer, a memory lookup, a web search) spin up near-instantly. Sessions that need a full container only pay that cost when they make the tool call that requires it.
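The lazy-provisioning idea behind these numbers can be sketched as a sandbox handle that is only created on the first tool call. The provisionSandbox function and the counter are stand-ins for the real (expensive) container spin-up:

```typescript
// Illustrative: the sandbox is provisioned lazily, on first use, so
// lightweight sessions (Q&A, a web search) never pay container startup.
let provisionCount = 0;

function provisionSandbox(): (cmd: string) => string {
  provisionCount += 1; // stands in for the expensive container spin-up
  return (cmd: string) => `executed: ${cmd}`;
}

function makeLazySandbox(): (cmd: string) => string {
  let sandbox: ((cmd: string) => string) | null = null;
  return (cmd: string): string => {
    if (sandbox === null) {
      sandbox = provisionSandbox(); // cost paid only when actually needed
    }
    return sandbox(cmd);
  };
}

const run = makeLazySandbox();
const before = provisionCount; // no container exists yet
run("ls /workspace");
run("cat README.md");
const after = provisionCount;  // provisioned once, on the first call
```

A session that never calls run never provisions anything — which is exactly why time-to-first-token improves most for sessions that do lightweight work.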

Security: Credentials Never Touch the Sandbox

The decoupled architecture also solves a security problem. In the coupled design, Claude's generated code ran in the same container as credentials and environment variables. A prompt injection attack — where malicious content in a file or web page tricks Claude into executing unauthorized commands — only needed to convince Claude to read its own environment.

The fix uses two patterns from the engineering blog:

  1. Auth bundled with a resource — For Git, the access token is used to clone the repo during sandbox initialization and wired into the local git remote. git push and pull work from inside the sandbox without the agent ever handling the token itself.

  2. Auth held in a vault outside the sandbox — For custom tools, OAuth tokens are stored in a secure vault. Claude calls MCP tools via a dedicated proxy. The proxy fetches the corresponding credentials from the vault and makes the call to the external service. "The harness is never made aware of any credentials."

This means: even if Claude is somehow tricked into trying to read credential values, they're not there to be read.
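The vault pattern can be sketched as a proxy that resolves credentials only on the far side of the tool boundary. The vault contents, proxyCall, and the token value are invented for illustration; the property being demonstrated is the one the blog states — the harness is never made aware of any credentials:

```typescript
// Illustrative: credentials live in a vault outside the sandbox. The
// harness forwards tool calls to a proxy; only the proxy reads the vault.
const vault = new Map<string, string>([["github_mcp", "ghp_example_token"]]);

type McpCall = { tool: string; args: string };

// The proxy attaches credentials and talks to the external service.
// Neither the harness nor the sandbox ever handles the token.
function proxyCall(call: McpCall): string {
  const token = vault.get(call.tool);
  if (token === undefined) return "error: no credential configured";
  // Stand-in for the real outbound request, authenticated with `token`.
  return `${call.tool}(${call.args}) -> ok`;
}

// What the harness sees: a plain string in, a plain string out.
function harnessToolCall(tool: string, args: string): string {
  return proxyCall({ tool, args }); // no credential ever crosses this line
}

const result = harnessToolCall("github_mcp", "list_prs");
```

Note that the returned string contains the call's outcome but not the token: even a fully compromised brain has nothing to exfiltrate from its side of the boundary.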

"Many Brains, Many Hands"

The final architectural insight is what makes multi-agent work possible. Because every "hand" is just a tool with the interface execute(name, input) → string, the harness doesn't care what the hand actually is. It could be a container, a phone, or — as the engineers wrote — "a Pokémon emulator."

And because no hand is coupled to any brain, brains can pass hands to one another. One Claude agent can hand off work to another Claude agent, and they can share the same execution environment while maintaining separate context windows.

You don't need to understand this in full detail yet — a later chapter covers multi-agent patterns. But knowing that this flexibility comes from the decoupled architecture helps you appreciate why the design decisions were made.

Diagram: "Many brains, many hands" concept diagram. Three "brain" boxes (each with a Claude icon) on top. Three "hand" boxes (container, API, custom tool) on the bottom. Multiple crossing arrows showing that any brain can call any hand. Unified interface label: "execute(name, input) → string." Small callout: "Brains can also pass hands to one another."

Try It Yourself

This exercise builds intuition for the architecture by examining how events map onto the brain/hands separation.

  1. Read the engineering blog post. Open anthropic.com/engineering/managed-agents and skim the "Pets vs. Cattle" section. Notice the specific language about the container becoming "cattle."

  2. Understand what the session.status_idle event means architecturally. When a session emits this event with stop_reason: end_turn, the harness has paused. The brain is waiting. The session log is fully persisted. Nothing is lost.

  3. Look at the session statuses in the docs. Go to platform.claude.com/docs/en/managed-agents/sessions. Notice the four statuses: idle, running, rescheduling, terminated. The rescheduling status is the architectural recovery mechanism in action — a transient error occurred, and the harness is automatically retrying.

  4. Map the architecture to your planned use case. For the task you've been planning: what would the "brain" need to reason about? What would the "hands" need to execute? What events would flow between them? Draw it out.

  5. Think about your security requirements. Does your planned agent need to access external services (APIs, databases, third-party tools)? If so, those credentials should live in a vault — not in your system prompt or your container's environment variables. Chapter 13 covers vaults in detail.
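The status behavior in steps 2 and 3 can be condensed into a small state check. The helper functions here are stand-ins, not the real Managed Agents client; only the four status names come from the docs:

```typescript
// Illustrative status handling: rescheduling is recovery, not failure.
type SessionStatus = "idle" | "running" | "rescheduling" | "terminated";

// Only `terminated` needs your attention; `rescheduling` means the
// harness is automatically retrying after a transient error.
function needsIntervention(status: SessionStatus): boolean {
  return status === "terminated";
}

function describe(status: SessionStatus): string {
  switch (status) {
    case "idle":         return "paused; log fully persisted, nothing lost";
    case "running":      return "harness actively working";
    case "rescheduling": return "transient error; recovering automatically";
    case "terminated":   return "stopped; check what happened";
  }
}

const recovering = needsIntervention("rescheduling"); // false: wait it out
```

Internalizing this mapping — which statuses are the architecture doing its job, and which one is your job — is most of what day-to-day monitoring requires.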

Diagram: Annotated flow diagram of a single agent session. Claude (brain) in the center. On the left: incoming user message event. On the right: outgoing tool calls (bash, web_fetch, read). Below: the session log collecting all events. Dashed box showing the sandbox as a separate zone, only connected to Claude via the tool interface.

Common Pitfalls

  • Expecting the container to persist between sessions. Each session gets its own isolated container instance. Files written during Session A are not available in Session B unless you explicitly save them to the session outputs and re-mount them. Use the Files API or memory stores for cross-session persistence — those are the persistent layers.

  • Putting credentials in the system prompt. This is the security anti-pattern the architecture is designed to prevent. If you put an API key in a system prompt, it lives in the session log and in Claude's context window — both accessible to prompt injection. Use vaults instead.

  • Treating the rescheduling status as an error. When a session enters rescheduling, it means a transient error occurred and the system is automatically recovering. You don't need to intervene. Wait for it to return to running or idle. Only terminated requires your attention.

  • Trying to run long tasks without planning for context limits. Long-horizon tasks can exceed Claude's context window. The harness supports prompt caching and compaction natively, but you should still design your tasks so the agent can make meaningful progress in bounded steps. Sessions that run for hours will use more tokens and cost more — design accordingly.


Toolkit

  • Diagram Template: Brain-Hands Architecture — A fillable diagram template to map any agent workflow onto the brain/hands/session-log structure. Includes fields for: what Claude reasons about, what tools execute, what persists in the session log, where credentials live.

  • Security Checklist: Does Your Architecture Expose Credentials? — Five-question checklist covering credential placement, sandbox isolation, networking mode, and prompt injection surface area.


Chapter Recap

  • Claude Managed Agents decouples the brain (Claude reasoning) from the hands (the container that executes) to make agents resilient, fast, and secure. A container failure becomes a tool-call error, not a lost session.
  • The session log is an append-only, durable record that lives outside both the harness and the sandbox. It's what makes recovery possible — a new harness can boot from any point in the history.
  • Decoupling produced dramatic performance improvements: p50 time-to-first-token dropped roughly 60%, p95 dropped over 90%, because containers are only provisioned when actually needed.