# Template 06 — Top 10 Beginner Mistakes Checklist

Most first agents fail not because the underlying technology is hard, but because of a small set of predictable mistakes that experienced builders have learned to avoid. This checklist documents the 10 most common ones.

Read through it before you build. Return to it when your agent behaves unexpectedly — one of these is usually the root cause.

---

## How to Use This Template

- Check each item as you build your first agent.
- If you find yourself guilty of one, use the "How to fix it" guidance immediately — don't wait.
- Keep this checklist nearby during your first deployment review.

---

## The Checklist

### Mistake 1 — Vague System Prompt

**What it is:** Your system prompt says something like "You are a helpful assistant. Help with whatever the user asks." It gives Claude no persona, no goals, no constraints, and no output format.

**Why it hurts:** Claude improvises to fill the gaps. It might use tools you didn't expect, produce output in a format your code can't parse, make up data when it can't find real data, or write in a style that doesn't match your brand. The agent is unpredictable because you gave it no guidance.

**How to fix it:** Use Template 02 (System Prompt Builder). At minimum, fill in `<role>`, `<goals>`, `<constraints>`, and `<output_format>`. Test the prompt in the Claude Console with 3 different inputs before creating the agent.

- [ ] My system prompt has a specific role, 2+ goals, at least 2 hard constraints, and an explicit output format.
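
A filled-in skeleton using the tags named above might look like the following (the persona, goals, and rules here are illustrative placeholders, not recommendations):

```xml
<role>
You are a release-notes writer for an internal engineering team.
</role>

<goals>
- Summarize merged pull requests into user-facing release notes.
- Flag any change that requires a migration note.
</goals>

<constraints>
- Never describe a change that is not present in the input.
- If a PR has no description, list it under "Needs description" instead of guessing.
</constraints>

<output_format>
Return Markdown with two sections: "Changes" and "Needs description".
</output_format>
```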

---

### Mistake 2 — Over-Permissioning Tools

**What it is:** You enable all 8 tools (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `web_fetch`, `web_search`) because it's the default and you haven't thought about it.

**Why it hurts:** A content-writing agent that can run `bash` can also delete files, install packages, make outbound network connections, and generally do things you never intended. If the agent gets confused (or is manipulated via prompt injection), the damage is proportional to the tools it has access to.

**How to fix it:** Use the whitelist pattern. Start with all tools disabled (`default_config: { enabled: false }`) and explicitly enable only the tools your agent needs for its specific task. Use Template 01, Part 3 to make this decision deliberately.

- [ ] I am using the whitelist pattern: `default_config: { enabled: false }` plus explicit enables only for what my agent needs.
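
A sketch of what that configuration might look like for a read-only research agent. The `type` and `default_config` fields follow the snippets in this checklist; the per-entry shape (`name` plus `enabled`) is an assumption for illustration, so check your SDK reference for the exact schema:

```python
# Whitelist pattern: everything off by default, then enable only what the
# agent's task requires. Entry field names ("name"/"enabled") are assumed.
toolset = {
    "type": "agent_toolset_20260401",
    "default_config": {"enabled": False},  # all 8 tools start disabled
    "configs": [
        {"name": "read", "enabled": True},        # read files
        {"name": "grep", "enabled": True},        # search file contents
        {"name": "web_search", "enabled": True},  # search the web
        # bash, write, edit, glob, web_fetch remain disabled
    ],
}
```

If the agent later turns out to need another tool, adding one explicit entry here is a deliberate, reviewable change rather than a silent default.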

---

### Mistake 3 — Forgetting Memory Means Each Session Starts Fresh

**What it is:** You build an agent that learns user preferences in session 1 (the user corrects the tone, adjusts the format, clarifies their context). In session 2, the agent knows none of that — it starts blank.

**Why it hurts:** The user has to repeat themselves every time. The agent makes the same mistakes the user already corrected. For any agent that should improve over time or maintain continuity, this makes the product feel broken.

**How to fix it:** Design your memory architecture before you build (Template 05). If the agent needs to carry anything across sessions — preferences, project state, past decisions — provision a memory store and attach it to every session via the `resources[]` array.

- [ ] I have decided whether my agent needs memory. If yes, I have created a memory store and attach it at session creation.
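
A minimal sketch of attaching a memory store at session creation via the `resources[]` array. The resource `type` string, the ids, and the payload field names are assumptions for illustration; consult your SDK reference for the exact shape:

```python
# Created once, reused for every session belonging to this user/project.
MEMORY_STORE_ID = "mem_abc123"  # hypothetical id

# Hypothetical session-creation payload; the memory store rides along in
# resources[] so learned preferences survive across sessions.
session_params = {
    "agent": "agent_xyz789",  # hypothetical agent id
    "resources": [
        {"type": "memory_store", "id": MEMORY_STORE_ID},
    ],
}
# session = client.beta.sessions.create(**session_params)
```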

---

### Mistake 4 — Not Using Sessions for Durability

**What it is:** You treat sessions as one-shot transactions — create a session, send a message, get a response, delete the session. When you need continuity, you try to pass previous context in the user message instead.

**Why it hurts:** You lose the session's built-in event history, prompt caching benefits, and durability. Passing prior context manually in each user message bloats your input tokens, costs more, and creates a maintenance burden. Sessions were designed to persist — use them.

**How to fix it:** Create one session per user conversation thread or project. Send multiple `user.message` events to the same session over time. Retrieve event history with `client.beta.sessions.events.list()` if you need to replay or analyze past turns. Archive (don't delete) sessions you are finished with to preserve history.

- [ ] I create one session per project or conversation thread, not a new session per message.
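
A sketch of the durable pattern: one session id per project, with multiple `user.message` events sent to it over time. The event payload shape follows this checklist's naming, and the commented-out client calls are assumptions for illustration (only `events.list()` is named in this document):

```python
# One session per project or conversation thread, reused across turns.
session_id = "sess_project_42"  # hypothetical id, created once

def user_message(text: str) -> dict:
    """Build a user.message event payload for an existing session (assumed shape)."""
    return {"type": "user.message", "content": text}

turn_1 = user_message("Summarize this week's merged PRs.")
turn_2 = user_message("Now draft the changelog entry.")

# client.beta.sessions.events.create(session_id, **turn_1)  # day 1 (assumed call)
# client.beta.sessions.events.create(session_id, **turn_2)  # day 2 (assumed call)
# history = client.beta.sessions.events.list(session_id)    # replay past turns
```

Because both turns land in the same session, the second message needs no manually repeated context.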

---

### Mistake 5 — Skipping the Console Prototype Phase

**What it is:** You skip interactive testing and go straight to writing the production session-creation code. You discover problems (wrong output format, confusing behavior, missing constraints) only after deploying.

**Why it hurts:** Problems are much cheaper to find in the Console than in code. A system prompt that seems clear in your head often fails on the first real input. You'll spend hours debugging behavior that a 5-minute Console test would have caught.

**How to fix it:** Before writing a line of session code, test your system prompt interactively. The Claude Console provides session tracing, integration analytics, and troubleshooting guidance. Run at least 3 test inputs: a typical case, an edge case, and a case you expect to fail gracefully.

- [ ] I have tested my system prompt in the Claude Console with at least 3 different inputs before creating my agent via the API.

---

### Mistake 6 — Picking Opus for Simple Tasks

**What it is:** You default to `claude-opus-4-7` for every agent because it's "the most capable."

**Why it hurts:** Opus costs $5/MTok input and $25/MTok output — five times the price of Haiku. For a classification agent, a triage agent, a formatting agent, or any task with a clear schema, Haiku produces equivalent results at one-fifth the cost. Using Opus for a task Haiku can handle is pure waste.

**How to fix it:** Use the Model Selection Rubric in Template 01. Start with Sonnet 4.6 for most tasks. Measure output quality. If quality is good, try Haiku. Only escalate to Opus when Sonnet fails on your benchmark tasks.

**Quick rubric:**
- Haiku 4.5 — Classification, routing, templated output, simple Q&A
- Sonnet 4.6 — Research, analysis, multi-step reasoning, code generation, most agentic work
- Opus 4.7 — Complex reasoning, long-horizon tasks, cases where quality is the primary constraint

- [ ] I have consciously chosen a model based on task complexity, not defaulted to Opus.
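
The rubric above can be encoded as a tiny lookup so the choice is explicit in code rather than a default. The model id strings here are assumptions for illustration; substitute the ids your account actually exposes:

```python
# Task-kind -> model, per the quick rubric above. Model id strings are assumed.
RUBRIC = {
    "classification": "claude-haiku-4-5",
    "routing": "claude-haiku-4-5",
    "templated_output": "claude-haiku-4-5",
    "research": "claude-sonnet-4-6",
    "code_generation": "claude-sonnet-4-6",
    "long_horizon": "claude-opus-4-7",
}

def pick_model(task_kind: str) -> str:
    # Sonnet is the recommended starting point, so it is also the fallback.
    return RUBRIC.get(task_kind, "claude-sonnet-4-6")
```

Starting from a table like this also makes it easy to downgrade a task kind to Haiku later, once you have measured that quality holds.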

---

### Mistake 7 — No Tool Whitelisting

**What it is:** You use the default toolset (`type: agent_toolset_20260401`) without any `configs` array, which enables all 8 tools automatically.

**Why it hurts:** This is related to Mistake 2 but has a different root cause. The default behavior is full enablement. If you never think about it, your read-only research agent silently has `bash` and `write` available. You don't discover this until something goes wrong.

**How to fix it:** Always be explicit. Either use the whitelist pattern (start with `default_config: { enabled: false }` and enable what you need) or use the blacklist pattern (start with defaults and disable what you don't need). Never ship an agent without having made a conscious decision about each tool.

- [ ] I have explicitly reviewed and configured the enabled/disabled state of every tool in the agent toolset.
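
For completeness, a sketch of the blacklist variant: keep the defaults (everything enabled) and explicitly disable what a read-only agent must not have. As with the whitelist example, the per-entry field names are assumptions about the exact SDK shape:

```python
# Blacklist pattern: defaults stay on; dangerous tools are explicitly off.
# Entry field names ("name"/"enabled") are assumed for illustration.
toolset = {
    "type": "agent_toolset_20260401",
    # No default_config: the default (all tools enabled) applies.
    "configs": [
        {"name": "bash", "enabled": False},   # no shell access
        {"name": "write", "enabled": False},  # no file creation
        {"name": "edit", "enabled": False},   # no file modification
    ],
}
```

The whitelist pattern is usually safer, because a tool added to the platform later stays disabled until you opt in; with the blacklist, it would arrive enabled.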

---

### Mistake 8 — No Error Handling in the System Prompt

**What it is:** Your system prompt describes the happy path but says nothing about what the agent should do when things go wrong: missing data, ambiguous input, tool errors, partial results.

**Why it hurts:** When Claude encounters an error and has no instructions, it improvises — and improvisation produces inconsistent results. Sometimes it silently skips the problem. Sometimes it hallucinates a plausible answer. Sometimes it stops entirely with an unhelpful message.

**How to fix it:** Add an explicit error-handling clause to the `<constraints>` section of your system prompt. At minimum:
- "If a required input is missing, stop and ask for it."
- "If a tool call fails, report the error in the output and continue with what is available."
- "If you encounter data you cannot verify, mark it as 'unverified' and proceed."

- [ ] My system prompt includes at least one instruction for each of: missing data, ambiguous input, and tool failure.
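
Dropped into the prompt structure from Mistake 1, the three clauses might sit alongside your other hard rules like this (an illustrative placement, not required wording):

```xml
<constraints>
- Never invent a change that is not present in the input.
- If a required input is missing, stop and ask for it.
- If a tool call fails, report the error in the output and continue with what is available.
- If you encounter data you cannot verify, mark it as "unverified" and proceed.
</constraints>
```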

---

### Mistake 9 — Ignoring the Event Stream

**What it is:** Your event-handling code only processes `session.status_idle` with `stop_reason: end_turn`. You ignore `agent.tool_use`, `agent.message` (intermediate), and `session.error` events.

**Why it hurts:** You get no visibility into what the agent is doing. When something fails, you can't tell whether it was a tool error, a model confusion, a network issue, or an MCP auth failure. Debugging becomes guesswork.

**How to fix it:** Handle all relevant event types in your stream loop. At a minimum:

```python
for event in stream:
    match event.type:
        case "agent.message":
            # Log or display intermediate responses
            pass
        case "agent.tool_use":
            # Log which tool is being called and with what input
            pass
        case "session.error":
            # Log the error and decide whether to retry or abort
            pass
        case "session.status_idle":
            if event.stop_reason.type == "end_turn":
                break
            elif event.stop_reason.type == "requires_action":
                # Handle tool confirmations or custom tool results
                pass
```

- [ ] My event loop handles `agent.message`, `agent.tool_use`, `session.error`, and both `end_turn` and `requires_action` stop reasons.

---

### Mistake 10 — Building Multi-Agent Too Early

**What it is:** Before you have a working single agent, you start designing a coordinator agent with three specialist subagents because you read about it and it sounds powerful.

**Why it hurts:** Multi-agent orchestration (a research preview feature) is significantly more complex to build, debug, and operate than a single agent. Each subagent needs its own definition, the coordinator needs `callable_agents` configured, and you have to handle `session.thread_created`, `session.thread_idle`, and `session_thread_id` routing in your event loop. If your single-agent architecture isn't solid, adding more agents amplifies every existing problem.

**How to fix it:** Build the simplest agent that solves your problem. Only move to multi-agent when:
- A single agent's context window is a genuine bottleneck (not hypothetical)
- You have two or more clearly separated subtasks that benefit from truly independent context
- Your single-agent version is working reliably and you have a clear reason why it cannot be extended

The multi-agent constraint is strict: only one level of delegation is supported (coordinator → specialists; specialists cannot delegate further).

- [ ] I am starting with a single agent. I will revisit multi-agent only after the single-agent version is working reliably.

---

## Summary Scorecard

Go through this before your first real deployment. Every "no" is a risk you are accepting consciously.

| # | Check | Status |
|---|---|---|
| 1 | System prompt has role, goals, constraints, output format | [ ] Yes / [ ] No |
| 2 | Tool whitelist uses `default_config: { enabled: false }` | [ ] Yes / [ ] No |
| 3 | Memory decision made; store attached if needed | [ ] Yes / [ ] No |
| 4 | Using one session per conversation thread (not one per message) | [ ] Yes / [ ] No |
| 5 | Console prototype tested with 3+ inputs | [ ] Yes / [ ] No |
| 6 | Model chosen based on task complexity rubric | [ ] Yes / [ ] No |
| 7 | Every tool's enabled/disabled state explicitly decided | [ ] Yes / [ ] No |
| 8 | System prompt includes error-handling instructions | [ ] Yes / [ ] No |
| 9 | Event loop handles tool_use, errors, and both stop reasons | [ ] Yes / [ ] No |
| 10 | Starting with single agent; multi-agent deferred | [ ] Yes / [ ] No |
