Navigating Claude Code: Models, Tiers, and Effort

Introduction

Claude Code ships with a default model and a default effort level, and most developers never change either. That is not necessarily wrong — the defaults are reasonable — but it does mean you are making a cost and quality decision without knowing you made one. Once you understand what the model tiers and effort levels actually control, you can start routing work more deliberately.

This article covers the three model tiers available in Claude Code, the two execution modes, the opusplan alias, the four effort levels, and the concrete situations where changing any of these makes a real difference.

The model tiers

Claude Code exposes three model tiers, each with a distinct speed-to-capability tradeoff:

Haiku (4.5) is the fast tier. It handles lookups, simple renames, grep-style exploration, and anything where you mostly need the model to execute rather than reason. Response times are near-instant. If you are using subagents to explore a codebase before doing the real work, Haiku is a natural fit for that role.

Sonnet (4.6) is the default for most workloads and the model Claude Code starts on unless you change it. It handles general coding tasks well — writing functions, small refactors, reviewing files, explaining code. It is now the baseline across Claude.ai and Claude Cowork.

Opus (4.6) is the flagship. It is significantly more capable on complex reasoning, architectural decisions, and hard debugging problems. It also costs more tokens per session. On Max, Team, and Enterprise plans, Opus automatically gets the 1 million token context window with no configuration — that 1M window is what makes loading an entire large codebase in a single session actually viable. On Pro, it does not come on automatically; you opt in with /extra-usage inside Claude Code.

You switch models inside a session with /model, or you set the default in your settings.json — the same file covered in the first article for permissions and basic configuration:

{
  "model": "sonnet"
}

Use aliases (sonnet, opus, haiku) or full model names (claude-opus-4-6). The --model flag at the command line takes precedence over the settings file for that session.
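The precedence rule is easy to check at launch. A quick sketch, assuming settings.json pins sonnet as above:

```shell
# settings.json says "sonnet", but the flag wins for this session:
claude --model opus

# Launch plainly and you are back on the settings default:
claude
```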

Normal mode and plan mode

By default, Claude Code operates in normal mode — it can read files, run commands, and edit code, and it asks for your confirmation before each action. Two other modes are available, selectable by pressing Shift+Tab to cycle through them.

Plan mode restricts Claude to read-only tools. It can explore your codebase, search for patterns, and draft an implementation approach, but it cannot write or edit anything until you approve. This is not just a soft guideline — the restriction is enforced at the tool level. Claude genuinely cannot touch your files while in plan mode. You can also activate it with /plan or --permission-mode plan at startup.
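If you want a particular project to always start read-only, the permission mode can also be pinned in settings.json. A minimal fragment, assuming the permissions.defaultMode key (check the settings reference for your Claude Code version if it does not take effect):

```json
{
  "permissions": {
    "defaultMode": "plan"
  }
}
```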

When you are about to make a change that spans multiple files, or when you are working in an unfamiliar area of a codebase and want Claude to actually understand the structure before it starts editing, plan mode is the right call. The overhead is real — you spend tokens on exploration before any code gets written — but it usually costs fewer total tokens than recovering from a refactor that touched the wrong things. How much those exploration tokens actually cost, and why they matter more than most developers expect, is something I will get into in the next article on the context window tax.

Auto-accept mode is the opposite: Claude acts without prompting for confirmation on each step. It is useful for well-understood mechanical work, but I do not use it often. The default mode, where Claude asks before each action, is the right balance for most tasks.

The opusplan alias

There is a fourth model option in the /model picker that is not one of the three tiers — it is the opusplan alias. When active, Claude uses Opus during plan mode and automatically switches to Sonnet when execution begins. The idea is that you get Opus-level reasoning for the architectural thinking, and Sonnet’s efficiency for the actual code generation.

In practice, this is a reasonable default for sessions where you expect to spend significant time in plan mode before implementing. The planning phase, where you want deep analysis, runs on Opus. The implementation phase, which is more mechanical, runs on Sonnet.

One thing worth knowing: opusplan does not automatically get the 1M context window the way a plain opus selection does on Max, Team, and Enterprise plans. There are open bug reports on the Claude Code GitHub repo confirming this — selecting opusplan has shown the 200K window in /context even on accounts where Opus normally gets 1M. If you are working in a large codebase and need the full window, use opus[1m] explicitly instead. The opusplan alias is convenient for its automatic model switching, but you are currently trading the guaranteed 1M context to get it.
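If the 1M window matters for your session, the safest move is to request it explicitly at launch rather than relying on an alias. A sketch (note the quotes: some shells treat the brackets as a glob pattern):

```shell
claude --model 'opus[1m]'
```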

The effort levels

Effort is a newer control surface that arrived with the Claude 4.6 model family. It governs how much “thinking” the model does before it responds — specifically, how many tokens it can spend on internal reasoning before producing output. There are four levels:

  • low — minimal reasoning, fastest response. Good for tasks where the answer is obvious: fixing a typo, running a build command, renaming a symbol.
  • medium — the current default for Opus 4.6 and Sonnet 4.6. Handles the majority of everyday coding work well. Anthropic changed Opus 4.6 to default to medium effort in v2.1.68, which caused some community friction because the previous default was high.
  • high — deeper reasoning. Worth reaching for on complex debugging, multi-file refactors, architectural questions, or anything where you have noticed medium producing shallow results.
  • max — available only on Opus 4.6. Removes the token ceiling on thinking entirely. Responses are slower and more expensive, and the setting does not persist across sessions unless you pin it with the CLAUDE_CODE_EFFORT_LEVEL environment variable.

You set effort with /effort followed by the level, or use the --effort flag when launching:

claude --model claude-opus-4-6 --effort high

You can also pin it in settings.json — but max is not accepted there, only low, medium, and high. For max, you need the environment variable:

export CLAUDE_CODE_EFFORT_LEVEL=max

There is also a per-turn shortcut: include ultrathink in your prompt and Claude bumps effort to high for that one response, then reverts. Anthropic removed this keyword briefly, the community pushed back, and it came back in v2.1.68.

How they interact

Model and effort are independent axes. A rough way to think about it:

|       | low effort | medium effort | high / max effort |
|-------|------------|---------------|-------------------|
| Haiku | File lookups, renames | Simple codegen | Rarely useful |
| Sonnet | Mechanical edits | Most daily work | Hard debugging |
| Opus | Rarely useful | General reasoning | Architecture, complex bugs |

The cells at the extremes are almost never the right choice. Haiku does not even support the effort setting, which tells you something: extra reasoning budget is wasted on a tier built for fast execution. Opus at low effort is the mirror image, paying for a flagship model and telling it not to think.

The practical pattern I have settled into: Sonnet at medium for routine work, Opus at high when I hit something genuinely difficult. I reach for /effort max a few times a week — never as a default, only when high is not cutting through.
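That routing can be written down as a tiny lookup table, which is roughly how I think about it before starting a session. The task categories and the mapping here are my own illustration, not anything Claude Code exposes:

```python
# Illustrative routing table: task kind -> (model, effort).
# The categories are invented for this sketch; Claude Code has no such API.
ROUTING = {
    "lookup":       ("haiku",  "low"),     # grep-style exploration, renames
    "routine":      ("sonnet", "medium"),  # most daily coding work
    "hard_debug":   ("sonnet", "high"),    # debugging that resists medium
    "architecture": ("opus",   "high"),    # design decisions, complex bugs
}

def pick(task_kind):
    """Return (model, effort) for a task, defaulting to the daily driver."""
    return ROUTING.get(task_kind, ("sonnet", "medium"))
```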

Common pitfalls

Using Opus for everything

If you are on Pro ($20/month), you share your token budget between Claude Code and the regular Claude interface. Running Opus at high effort for every task — including file renames and trivial edits — will burn through that budget in a way that Sonnet at medium would not. The quality difference for routine work is negligible. The token cost difference is not.

Treating ultrathink as a permanent mode

I see this sometimes: developers add ultrathink to their CLAUDE.md or to every prompt because they want maximum reasoning all the time. It does not work that way — ultrathink only affects the next response, then reverts. Adding it to CLAUDE.md is a particularly common mistake, since CLAUDE.md is loaded as context on every session and it is tempting to treat it as a place for persistent behavioral instructions. But the model reads it, notes the word, and applies high effort for that one turn of processing the file. After that, the session reverts to whatever effort level is configured. Reserve ultrathink for the moments that actually need it, and keep CLAUDE.md focused on project context rather than runtime behavior flags.

Not pinning the model in subagent frontmatter

Subagents inherit the session model by default. If your session is running on Opus at high effort — which is a reasonable setup for complex work — every subagent you spawn inherits that too, including the ones doing simple codebase exploration that Haiku would handle equally well. Because subagents feel like background workers, it is easy to forget they carry the same cost profile as your main session. The fix is one line in the subagent definition:

---
model: haiku
effort: low
---

The cost difference between an Opus-at-high exploration subagent and a Haiku-at-low one is significant. The quality difference for grep-style work is negligible.
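Put together, a complete exploration subagent definition might look like the sketch below. The name, description, system prompt, and tool list are illustrative; the model and effort lines are the part that matters:

```markdown
---
name: explorer
description: Fast read-only codebase exploration
model: haiku
effort: low
tools: Read, Grep, Glob
---

Locate the files relevant to the user's question and report paths and
line numbers. Do not propose or make edits.
```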

High effort causing overthinking on simple tasks

More effort always sounds better. In practice, Anthropic’s own documentation warns that higher effort levels can cause the model to overthink routine work — producing slower, more hedged, more verbose output on tasks that needed a direct answer. I have seen this on simple refactors where high effort had Claude producing lengthy analysis of tradeoffs nobody asked for, followed by a timid implementation. Medium is the recommended default for a reason. Reserve high and max for the tasks that genuinely need deeper reasoning, not as a permanent session setting.

Effort not persisting across model switches

Effort is supported only on Opus and Sonnet; Haiku does not have it. If you set /effort high for a complex task, then switch to Haiku for a quick lookup, the effort setting disappears silently. Switch back to Sonnet and the session reverts to the model default, not the level you configured. If you are switching models mid-session, check your effort level after each switch.

Tips and tricks

  • Check your current effort level before a complex task with /model — the effort level is shown next to the model name, and you can change both from that screen using the arrow keys.
  • Set effort per project, not globally, if different codebases have different complexity profiles. A large legacy monolith usually benefits from higher effort than a small new service where you know the patterns well.
  • Use ultrathink for one-off hard problems without changing your session default. Type it in the prompt, get deeper reasoning for that response, move on.
  • Set effort in subagent frontmatter if you use subagents for specific tasks. An exploration subagent running on Haiku at low effort is a very different cost profile from a code-review subagent on Sonnet at high. Both have their place — but they should be deliberate choices.
  • Watch the token budget display on Pro plans. The warning messages appear before you hit the limit, not after.

Conclusion

The model tier controls how capable the reasoning is. The effort level controls how much of that capability gets applied to your prompt. They are independent, and matching both to the task is what separates deliberate usage from burning tokens on autopilot.

Both dimensions also have a direct cost in context — thinking tokens count against the context window just like input and output tokens do. In the next article, I will cover exactly what consumes the context window in a Claude Code session, why long sessions degrade, and how compaction works.
