Chapter 4: The Three Claude Models — Haiku, Sonnet, Opus
By the end of this chapter, you will know which Claude model to choose for any given agent task — and understand the real pricing differences so you can design cost-effectively from day one.
The Big Idea
Not every task needs the most powerful model. Using Opus for a task Haiku handles well is like sending a neurosurgeon to stitch a paper cut — technically capable, but expensive and unnecessary.
Claude Managed Agents supports three current model families, each with a different point on the capability/cost curve. Choosing the right one matters: the same task can cost five times more with Opus than with Haiku. For workflows that run many sessions or handle high volume, that multiplier compounds fast.
According to the models overview documentation, the current supported models for Managed Agents (all Claude 4.5 and later) are:
| Model | Claude API ID | Context Window | Max Output | Input / Output Pricing |
|---|---|---|---|---|
| Claude Opus 4.7 | `claude-opus-4-7` | 1M tokens | 128k tokens | $5 / $25 per MTok |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M tokens | 64k tokens | $3 / $15 per MTok |
| Claude Haiku 4.5 | `claude-haiku-4-5-20251001` | 200k tokens | 64k tokens | $1 / $5 per MTok |
All three support adaptive thinking. Sonnet 4.6 and Haiku 4.5 also support extended thinking. (Models overview)
The session runtime cost — separate from token costs — is $0.08 per session-hour for active runtime. (Anthropic pricing)
The Analogy
Think of hiring for a research project.
You could hire a research intern (Haiku) — fast, eager, handles structured tasks reliably, costs less. Perfect for data entry, summarizing documents, running defined workflows.
You could hire a senior analyst (Sonnet) — strong reasoning, handles ambiguity well, balances depth and speed. Good for most professional-grade work where you need judgment, not just execution.
You could hire a domain expert consultant (Opus) — the deepest expertise, best on complex multi-step reasoning and open-ended problems with no clear path. Worth the premium when the problem genuinely requires it; overkill when it doesn't.
The mistake most people make is defaulting to the consultant for everything. Prototype with Haiku. Upgrade to Sonnet when you need more judgment. Reserve Opus for tasks where complexity demands it and you've confirmed the simpler models aren't adequate.
How It Actually Works
Claude Opus 4.7 — The Deep Reasoner
API ID: `claude-opus-4-7`
Context window: 1M tokens
Max output: 128k tokens
Pricing: $5 input / $25 output per MTok
Opus 4.7 is described in the models overview as "Our most capable generally available model for complex reasoning and agentic coding, with a step-change jump over Claude Opus 4.6."
The key phrase: "step-change jump." This isn't marginal improvement — it's a different level of capability on hard problems. When your task involves:
- Multi-step code that requires architectural decisions
- Complex analytical work with many interdependent variables
- Open-ended research where the path forward isn't clear
- Tasks where errors have high downstream consequences
...Opus 4.7 is the right choice. The 128k token output limit also matters — it can produce longer, more complete outputs in a single turn than the other models.
Note on fast mode: The `speed: fast` option is available for Claude Opus 4.6 (not 4.7) with dedicated rate limits separate from standard Opus rate limits. If you need Opus-class reasoning with faster response time and can work with Opus 4.6 capabilities, this is an option. Pass the model as an object: `{"id": "claude-opus-4-6", "speed": "fast"}`. (Agent setup)
Claude Sonnet 4.6 — The Balanced Workhorse
API ID: `claude-sonnet-4-6`
Context window: 1M tokens
Max output: 64k tokens
Pricing: $3 input / $15 output per MTok
Sonnet 4.6 hits the sweet spot for the majority of production agent workflows. It handles:
- Content creation and editing
- Code generation for typical engineering tasks
- Data analysis and reporting
- Structured research with clear requirements
- Customer-facing agents where quality matters but deep reasoning isn't required
At roughly 60% of Opus's input cost and 60% of its output cost, Sonnet is the default recommendation for most production workloads. Unless you've tested with Sonnet and found specific capability gaps, start here.
Sonnet 4.6 supports both adaptive thinking and extended thinking, giving it access to reasoning chains for harder problems while still being faster and cheaper than Opus.
Claude Haiku 4.5 — The Fast, Economical Option
API ID: `claude-haiku-4-5-20251001`
Context window: 200k tokens
Max output: 64k tokens
Pricing: $1 input / $5 output per MTok
Haiku 4.5 costs one-fifth of what Opus costs on both input and output. For the right tasks, the capability difference is irrelevant: Haiku does the job just as well, at a fraction of the price.
Those tasks are:
- High-volume, structured pipelines where the task is well-defined
- Extraction and transformation tasks (parse this JSON, reformat this data)
- Classification and tagging
- Simple Q&A over structured documents
- Prototype development and testing (before you commit to a production model)
The context window is 200k tokens, compared to 1M for Opus and Sonnet. For most tasks, this is more than enough. If you're working with very large codebases or extremely long documents, Sonnet or Opus may be necessary.
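If you are unsure whether a workload fits in Haiku's 200k window, a rough pre-check helps. This sketch uses the common rule of thumb of roughly 4 characters per token for English text; it is an approximation, not a tokenizer:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

HAIKU_CONTEXT = 200_000

# A ~1 MB document works out to roughly 250k tokens under this heuristic,
# which exceeds Haiku's window; Sonnet or Opus would be needed.
doc = "x" * 1_000_000
print(rough_token_estimate(doc) <= HAIKU_CONTEXT)  # → False
```

For anything near the limit, measure with a real tokenizer before committing to a model.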
Cache Pricing: The Hidden Saving
All three models support prompt caching, which the Managed Agents harness uses automatically. The cache pricing is:
| Model | Cache Write | Cache Read |
|---|---|---|
| Claude Opus 4.7 | $6.25/MTok | $0.50/MTok |
| Claude Sonnet 4.6 | $3.75/MTok | $0.30/MTok |
| Claude Haiku 4.5 | $1.25/MTok | $0.10/MTok |
Cache read costs are dramatically lower than input costs. The harness uses a 5-minute TTL for cache entries. For workflows with repeated context (a stable system prompt, a large codebase mounted at session start), the effective input cost can drop substantially. The session usage object tracks cache_creation_input_tokens and cache_read_input_tokens separately, so you can measure the actual savings.
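To see what caching does to effective input cost, here is a minimal sketch using the Sonnet 4.6 rates from the table above. It assumes, for simplicity, that every input token is either a cache read or a cache write; real workloads also have uncached input billed at the base rate:

```python
# Sonnet 4.6 prices per MTok, from the tables in this chapter.
SONNET_CACHE_WRITE = 3.75
SONNET_CACHE_READ = 0.30

def effective_input_cost(hit_rate: float) -> float:
    """Blended cost per MTok: reads billed at the cache-read rate,
    misses billed at the cache-write rate (simplifying assumption)."""
    return hit_rate * SONNET_CACHE_READ + (1 - hit_rate) * SONNET_CACHE_WRITE

# At an 80% cache hit rate, the blended input cost falls well below
# the $3/MTok base input rate:
print(round(effective_input_cost(0.8), 2))  # → 0.99
```

The higher your hit rate (stable system prompt, long-running context), the closer you get to the $0.30/MTok read rate.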
Session Runtime Cost
Beyond token costs, every session incurs a runtime charge: $0.08 per session-hour for active runtime. (Anthropic pricing)
A session running for 30 minutes costs $0.04 in runtime. Running one agent session continuously for a month costs about $58 in runtime alone — before token costs.
This means:
- Short, focused sessions are more cost-efficient than long, wandering ones
- Sessions should be closed when work is complete rather than left running
- For high-frequency tasks, batch your requests into single sessions where possible
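The runtime arithmetic above is simple enough to encode directly, using the $0.08 per active session-hour figure:

```python
RUNTIME_RATE = 0.08  # dollars per active session-hour (Anthropic pricing)

def runtime_cost(hours: float) -> float:
    """Runtime charge for a session active for the given number of hours."""
    return hours * RUNTIME_RATE

print(round(runtime_cost(0.5), 2))      # 30-minute session → 0.04
print(round(runtime_cost(24 * 30), 2))  # one month, continuous → 57.6
```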
Choosing Your Model: A Decision Framework
Is the task well-defined and high-volume?
→ Haiku 4.5
Does the task require professional-grade judgment or balanced capability?
→ Sonnet 4.6 (default for most production agents)
Does the task involve complex multi-step reasoning, hard agentic coding,
or problems where you need maximum capability and cost isn't the constraint?
→ Opus 4.7
Are you prototyping or testing?
→ Start with Haiku, upgrade when Haiku fails the task
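The framework above can be encoded as a small helper. This is a hypothetical sketch, not part of any SDK; the flag names are invented for illustration:

```python
def pick_model(well_defined: bool, high_volume: bool,
               needs_judgment: bool, max_capability: bool) -> str:
    """Hypothetical encoding of the decision framework in this chapter."""
    if max_capability:
        return "claude-opus-4-7"          # complexity demands it, cost isn't the constraint
    if well_defined and high_volume:
        return "claude-haiku-4-5-20251001"  # structured, high-volume pipeline
    if needs_judgment:
        return "claude-sonnet-4-6"        # default for most production agents
    return "claude-haiku-4-5-20251001"    # prototyping: start cheap, upgrade on failure
```

A real selection process should be backed by measurement: run a sample workload on the cheaper candidate first and only upgrade when it fails the task.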
Adaptive Thinking vs. Extended Thinking
Both are thinking features available on the current model generation:
- Adaptive thinking is available on Opus 4.7, Sonnet 4.6, and Haiku 4.5 — the model dynamically adjusts how much reasoning it applies.
- Extended thinking is available on Sonnet 4.6 and Haiku 4.5 — it enables longer, deeper reasoning chains for harder problems.
Note: the models overview lists extended thinking as not available on Opus 4.7. Opus 4.7 supports adaptive thinking only; Sonnet 4.6 and Haiku 4.5 support both.
This is a nuance worth knowing: for tasks where you want both maximum capability and extended thinking chains, Sonnet 4.6 with extended thinking may outperform Opus 4.7 for certain problem types.
Try It Yourself
Find the models overview page. Go to docs.anthropic.com/en/docs/about-claude/models/overview and look up the current models table. Confirm the API IDs match what's in this chapter.
Calculate the cost of your planned task. Estimate the input and output tokens your agent will use per session. (A good starting estimate: a typical system prompt is 500–2,000 tokens; a full codebase for a medium project might be 20,000–100,000 tokens; agent output per session might be 5,000–30,000 tokens.) Calculate the cost at Haiku, Sonnet, and Opus rates. Feel the difference.
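The per-session arithmetic can be sketched in a few lines of Python, using the input/output prices from the models table. The 50k-input / 10k-output figures below are illustrative, not measurements:

```python
# Token prices per MTok (input, output), from the table in this chapter.
PRICES = {
    "claude-haiku-4-5-20251001": (1, 5),
    "claude-sonnet-4-6": (3, 15),
    "claude-opus-4-7": (5, 25),
}

def session_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost in dollars for one session, ignoring caching and runtime."""
    inp, out = PRICES[model]
    return (input_tokens / 1_000_000) * inp + (output_tokens / 1_000_000) * out

# Example: 50k input (prompt + context) and 10k output per session.
# Haiku ≈ $0.10, Sonnet ≈ $0.30, Opus ≈ $0.50 — the 5x spread in action.
for model in PRICES:
    print(model, round(session_token_cost(model, 50_000, 10_000), 2))
```

Multiply by your expected sessions per month to see the spread at scale.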
Create an agent pinned to Haiku for prototyping:
```shell
ant beta:agents create \
  --name "Prototype Agent" \
  --model '{id: claude-haiku-4-5-20251001}' \
  --system "You are a helpful assistant." \
  --tool '{type: agent_toolset_20260401}'
```

(Haiku ID verbatim from the models overview. Use this during development to keep costs low.)
Create a second agent using Sonnet (for comparison):
```shell
ant beta:agents create \
  --name "Production Agent" \
  --model '{id: claude-sonnet-4-6}' \
  --system "You are a helpful assistant." \
  --tool '{type: agent_toolset_20260401}'
```

(You'll run the same task on both in Chapter 5 and compare outputs.)
Add a session runtime line to your cost calculation. For your planned workflow: how long will a typical session run? Multiply by $0.08/hour. Is this a meaningful cost for your use case?
Common Pitfalls
Defaulting to Opus for everything. Unless you've tried a simpler model and found it lacking, start with Haiku for prototyping and Sonnet for production. The 5x cost multiplier between Haiku and Opus adds up quickly.
Forgetting session runtime costs. Token pricing gets all the attention, but $0.08/session-hour is a real cost for long-running agents. A session that runs for 10 hours costs $0.80 in runtime — before any token costs. Design sessions to be task-focused and close them when work is complete.
Using the wrong API ID format. Model IDs must be exact. `claude-haiku-4-5-20251001` has a date suffix that the other two don't. Double-check the models overview before coding.

Not tracking per-session usage. The session object includes a `usage` field with cumulative token statistics. Fetch the session after it goes idle to read the actual cost. Without this, you're flying blind on real-world costs.

Assuming the newest model is always best for your use case. Opus 4.7 is the most capable model generally. But for a well-defined, high-volume task, Haiku might hit 98% accuracy at 20% of the cost. Run both on a sample and measure before committing to a model.
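As a rough illustration of per-session cost tracking, here is a hypothetical helper that converts usage counters into dollars at Sonnet 4.6 rates. The chapter confirms the cache counter names; `input_tokens` and `output_tokens` are assumed field names for the uncached counters:

```python
# Sonnet 4.6 prices per MTok, from the tables in this chapter.
SONNET = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def usage_cost(usage: dict, prices: dict = SONNET) -> float:
    """Token cost in dollars for one session's usage counters.
    Field names other than the cache counters are assumptions."""
    per_mtok = lambda n, rate: n / 1_000_000 * rate
    return (per_mtok(usage.get("input_tokens", 0), prices["input"])
            + per_mtok(usage.get("output_tokens", 0), prices["output"])
            + per_mtok(usage.get("cache_creation_input_tokens", 0), prices["cache_write"])
            + per_mtok(usage.get("cache_read_input_tokens", 0), prices["cache_read"]))
```

Logging this per session, alongside the runtime charge, turns "flying blind" into a dashboard.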
Toolkit
Model Selection Cheat Sheet — Quick-reference card with all three model IDs, context windows, max output tokens, and pricing in a single table, plus the decision framework from this chapter.
Cost Calculator Template — A spreadsheet template to calculate monthly token and runtime costs across all three models based on your estimated usage.
Chapter Recap
- The three supported models are Claude Opus 4.7 (`claude-opus-4-7`), Claude Sonnet 4.6 (`claude-sonnet-4-6`), and Claude Haiku 4.5 (`claude-haiku-4-5-20251001`). All are Claude 4.5 or later, and all are supported in Managed Agents.
- Pricing ranges from $1/$5 per MTok (Haiku) to $5/$25 per MTok (Opus) for input/output tokens. Cache read costs are dramatically lower. Session runtime costs $0.08 per active hour.
- The default recommendation: prototype with Haiku, run production on Sonnet, upgrade to Opus only when you've verified a simpler model isn't sufficient for the task. Cost discipline early compounds over time.