# Introduction
Claude Code is incredibly handy, but the costs can escalate surprisingly quickly. The reason is straightforward: you’re not just paying for the prompt you typed. More often than not, Claude also carries along previous messages, files it has accessed, tool outputs, memory documents like CLAUDE.md, and embedded instructions. So when token usage rises, it’s rarely just about poorly written prompts. The real culprit is usually overstuffed context.
A lot of the general advice floating around isn’t very actionable. Saying "keep chats brief" is accurate, but it doesn’t explain what actually makes a real difference. What truly helps is getting a clear picture of how Claude Code constructs its context, what gets repeatedly reloaded, and which parts of your routine silently pile on overhead. This guide breaks down 7 practical approaches to help you use Claude Code more efficiently without constantly fretting about expenses. Let’s dive right in.
# 1. Pick the Right Model Based on Task Difficulty
This one is straightforward but woefully underutilized. Not every job warrants your most expensive setup. With API-based pricing, Opus costs five times as much as Sonnet per token. On subscription plans, more powerful models eat through your allocation faster.
```
/model sonnet   # Day-to-day tasks: writing tests, minor edits,
                # explaining logic, refactoring

/model opus     # Complex tasks: multi-file architectural choices,
                # difficult cross-system debugging

/model haiku    # Quick tasks: lookups, formatting, renaming,
                # anything repetitive
```
Kick off every session with Sonnet. Move to Opus only when you truly need in-depth reasoning or complex restructuring. Drop down to Haiku for purely mechanical operations. You can also set the effort level directly with /effort. For simpler tasks, dialing down the effort level shrinks the model’s reasoning budget, which cuts down on output tokens.
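You can also choose the model before the session even starts. A minimal sketch, assuming the `--model` flag and model aliases available in recent Claude Code versions:

```
claude --model sonnet    # start your default, cheaper session
claude --model opus      # separate session for one genuinely hard task
```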
# 2. Keep CLAUDE.md Concise and Targeted
A great way to conserve tokens is to stop repeating the same project-specific guidelines in every conversation. That’s exactly what CLAUDE.md is built for. It loads at session start, before Claude reads your code or processes your request, and it stays in the context window for the entire session; it is never swapped out on demand. This means a 5,000-token CLAUDE.md costs 5,000 tokens of context on every single exchange, whether you send 2 messages or 200. So store your lasting instructions there: how to run tests, which package manager to use, formatting standards, key architectural constraints, and which directories Claude should steer clear of. This slashes the repeated prompt overhead across sessions.
It’s equally crucial to keep it slim. Don’t cram meeting notes, design history, or extensive implementation guides into it. CLAUDE.md works best as a compact reference sheet rather than a massive brain dump.
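As a rough shape, a compact CLAUDE.md might look like the sketch below; every project detail in it is hypothetical:

```
# CLAUDE.md

## Commands
- Install: pnpm install
- Test: pnpm test (single file: pnpm test path/to/file.test.ts)
- Lint and format: pnpm lint && pnpm format

## Conventions
- TypeScript strict mode; named exports only
- Keep functions small; colocate tests with source files

## Boundaries
- Never edit generated code under src/__generated__/
- Treat db/migrations/ as append-only
```

Everything here is something Claude would otherwise have to be told repeatedly, and nothing is narrative history it doesn't need.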
# 3. Offload Verbose Tasks to Subagents
This is one of the most effective strategies because it fundamentally changes how context grows. Subagents are independent Claude instances, each running in its own isolated context window. When a subagent runs, all its verbose output (file lookups, log dumps, lengthy reasoning chains) stays contained within that isolated space. Only a summary bubbles back up to your main conversation. This keeps your main thread dramatically cleaner.

But this is also where a lot of common advice falls short. Subagents aren’t automatically more cost-effective. Community testing has shown that for tiny tasks, such as basic shell commands or quick git operations, subagents can actually cost more due to the structural overhead of prompts, tool definitions, and additional round-trip calls. So the practical rule isn’t "always use subagents." It’s "use subagents when the clutter you save in your main context outweighs the startup cost."
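A good fit for this rule is a read-only research subagent that digs through logs and large files but reports back only a digest. A minimal sketch, assuming the Markdown-with-YAML-frontmatter format Claude Code uses for project subagents in .claude/agents/ (the agent name and prompt are hypothetical):

```
---
name: log-digger
description: Investigates logs, test output, and large files, then
  reports back only a short digest. Use for noisy research tasks.
tools: Read, Grep, Glob
---

You investigate noisy sources on behalf of the main session.
Never paste raw file contents or full logs into your reply.
Respond with at most 10 bullet points, plus exact file paths
and line numbers for anything worth a follow-up.
```

Because it only gets read-only tools, the worst it can do is waste its own context, not pollute yours.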
# 4. Direct Claude to Specific Files and Line Ranges
One of the quickest ways to burn tokens is telling Claude to "look around the repository" when the issue is really confined to one or two files. The vaguer your request, the more likely Claude is to waste tokens opening multiple files, exploring wrong paths, and reconstructing context you could have provided upfront. Consider this example:
Original request:
"Look through the auth code and tell me what’s wrong."
Much better:
"Compare
src/auth/session.tslines 30–90 withsrc/api/login.tslines 10–60 and explain the mismatch."
The first version sounds natural but tends to trigger costly exploration.
Another helpful habit is to use plan mode before running expensive operations. Activate it with Shift+Tab. In plan mode, Claude produces a step-by-step plan without making any actual changes. You review the plan, trim anything unnecessary, then switch back to regular mode. This removes the single biggest source of token waste: trial-and-error execution — where Claude attempts something, hits an error, and iterates, burning tokens with every cycle.
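If you already know a task is expensive, you can start the session in plan mode rather than toggling into it. A sketch, assuming the `--permission-mode` flag in recent Claude Code versions (the prompt is hypothetical):

```
claude --permission-mode plan "Refactor session handling in src/auth/"
```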
# 5. Run /compact Proactively Rather Than Reactively
Claude can compact your session on its own, and you can also trigger /compact manually. But the timing matters more than you might think.
Once Claude has examined multiple files, executed commands, and explored a handful of dead ends, your session usually contains a lot of leftover material that’s no longer relevant. That’s the ideal moment to compact. Instead of hauling all that excess context into your next step, you condense the conversation once the key information is clear, then continue with a much lighter session.
A frequent mistake is waiting too long to use /compact. Many developers hold off until Claude starts losing track of context or displays a warning. By then, the session is already bloated, and the resulting summary isn’t as clean or useful. If you compact earlier — while the session is still "healthy" — the summary is considerably sharper. You preserve the important details, shed the noise, and prevent unnecessary tokens from weighing down every subsequent step.
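When you do compact, you can also steer the result: /compact accepts optional instructions describing what the summary should prioritize. A sketch, with hypothetical task details:

```
/compact Keep the final auth-refactor plan and the root cause of the
failing session test. Drop the repo exploration and the abandoned
caching experiment.
```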
# 6. Check /context Before You Start Optimizing
One of the most overlooked steps is just seeing what’s actually eating up your context. A lot of token waste feels baffling until you realize the expensive part might not be your visible prompt. It could be a large file Claude accessed earlier, accumulated tool output, a heavy memory file, or the overhead from additional tooling.
The /context command is your diagnostic microscope. Before overhauling your entire workflow, check what’s actually being loaded or repeatedly resent. In many cases, the biggest improvement isn’t about better prompting. It comes from identifying one "quiet offender" that has been tagging along in every exchange. That’s why blind optimization isn’t the best approach. First, inspect what’s sitting in your context. Then remove or trim down the parts that are genuinely causing the bloat.
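A reasonable diagnostic order using built-in commands looks like the sketch below; the exact report each one produces varies by version:

```
/context   # see what is actually occupying the window right now
/memory    # review memory files such as CLAUDE.md if they dominate
/clear     # reset the conversation when the history itself is the bloat
```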
# 7. Keep Your Tooling Setup Minimal
Claude Code can hook into many external tools and data sources, which is powerful — but more integrations also mean more context overhead once those tools get involved. If too many tools or plugins are active, the model can end up carrying around more overhead than the task actually requires. Keep your setup minimal. Use integrations that solve a genuine, recurring problem. Don’t overload Claude Code with every available skill just because it’s easy to add them.
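One concrete place to prune is MCP servers, since each connected server adds its tool definitions to your context. Assuming the `mcp` subcommands available in recent Claude Code versions (the server name below is hypothetical):

```
claude mcp list                    # audit what is currently wired in
claude mcp remove jira-sandbox     # drop integrations you rarely use
```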
# Final Thoughts
The smartest way to reduce Claude Code token usage isn’t to micromanage every prompt. It’s to structure your workflow so Claude only sees what it truly needs. The biggest savings come from controlling automatic context, narrowing the search scope, and ensuring noisy side work doesn’t bleed into your main session.
Stop focusing solely on prompts — start thinking about context design.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT." As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.