Updated May 2026

Save tokens before your AI session hits the wall.

Stop wasting context, credits and premium-model usage on work that does not need it. Tokenkarma helps heavy AI users see how much usage they have left across the AI tools they pay for, directly from the Mac menu bar.

Foundation

The 5 hidden ways you waste tokens

Most token waste does not come from one long answer. It comes from repeated context, messy conversations, overpowered models, unnecessary reasoning and invisible tool overhead.

01

Repeated context

Every time you paste the same brief, document, codebase summary or client context again, you spend tokens again. Use projects, memory, files, reusable instructions or cached context whenever possible.
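Here is a rough sketch of the cost difference, using the common "about four characters per token" heuristic. The numbers are illustrative only; real tokenizers vary by model.

```python
# Rough illustration of repeated-context cost.
# Assumption: ~4 characters per token (a common heuristic; real tokenizers differ).

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

brief = "Client brief: ..." * 200          # a ~3 KB brief you keep re-pasting
question = "Rewrite the intro paragraph."
turns = 10

# Pasting the brief into every message:
repeated = turns * (estimate_tokens(brief) + estimate_tokens(question))

# Supplying the brief once (project file, memory, or cached context):
reused = estimate_tokens(brief) + turns * estimate_tokens(question)

print(f"re-pasted every turn: ~{repeated} input tokens")
print(f"supplied once:        ~{reused} input tokens")
```

Even in this small example the re-pasted version costs several times more input, and the gap grows with every extra turn.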

02

Long-running conversations

A chat that keeps growing becomes expensive and harder to steer. When the topic changes, start fresh or summarize the useful state before continuing.

03

Reasoning tokens

Thinking modes are powerful, but they are not free. Use deeper reasoning for architecture, strategy, debugging and synthesis. Do not use it for rewriting a short email, formatting a list or naming files.

04

Tool and agent overhead

Coding agents, MCP servers, plugins, files, test logs and tool outputs can add hidden context. If an AI coding tool feels expensive, inspect what it keeps in context before blaming the model.

05

Premium model drift

Many users slowly start using the strongest model for everything. That is usually the fastest way to burn through limits. Keep your best model for the work where quality actually matters.

Save tokens on Claude

Claude is strong for long-form work, research, writing, analysis and project-based workflows. The main risk is simple: you keep repeating context Claude could have reused.

Quick wins

  • Use Projects for recurring work.
  • Put stable instructions in project instructions instead of pasting them every time.
  • Upload reusable files once instead of attaching them again and again.
  • Start a new chat when the topic changes.
  • Use faster models for simple tasks and stronger models for deep analysis.
  • Ask for a compact summary before a conversation becomes too long.

Best Claude workflow

Use Claude Projects as your "context home." If you work on the same client, product, document set or research topic every week, do not rebuild the context from scratch.

A good Claude Project should include:

  • The goal of the project.
  • The tone or output format you expect.
  • The few reference files Claude really needs.
  • A short explanation of what should not be repeated.
  • A clear instruction to ask before producing very long outputs.

Copy-paste prompt

Before answering, use only the context that is still relevant to this task.
Ignore old details from this conversation if they are not needed.
Give me the shortest useful answer first, then ask if I want a deeper version.

When to start a new Claude chat

Start fresh when:

  • You changed client, task or goal.
  • The conversation contains old decisions that are no longer true.
  • Claude starts referencing outdated context.
  • You are asking for a simple task after a long research session.
  • The answer feels slower, less focused or too influenced by earlier messages.

Most impactful Claude tip

Do not use one endless conversation as your workspace. Use Projects for stable context, new chats for new tasks, and summaries for handoffs.

Save tokens on Claude Code

Claude Code can burn through usage quickly because it reads far more than your prompt. It may also keep files, logs, tool results, previous commands, project instructions and MCP context in the session.
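A rough breakdown makes this concrete. Every number below is invented for illustration, but the shape is typical: your prompt is often a small fraction of what the agent actually processes per turn.

```python
# Hypothetical breakdown of one coding-agent turn's input context.
# All numbers are made up for illustration.
context = {
    "your prompt": 150,
    "project instructions (CLAUDE.md)": 800,
    "files read so far": 6000,
    "tool and MCP schemas": 1200,
    "test logs and command output": 3500,
    "previous turns carried forward": 4000,
}
total = sum(context.values())
prompt_share = context["your prompt"] / total
print(f"total input tokens this turn: {total}")
print(f"your prompt is only {prompt_share:.1%} of it")
```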

Quick wins

  • Use /usage to see what the session is consuming.
  • Use /clear when switching tasks.
  • Use /compact before a long session becomes messy.
  • Keep CLAUDE.md short and focused.
  • Move rare procedures into skills instead of loading them every time.
  • Disable MCP servers you do not need for the current task.
  • Avoid dumping huge logs unless the agent really needs them.
  • Use deeper thinking only for hard architecture, debugging or planning.

Keep CLAUDE.md lean

Your CLAUDE.md should not be a full internal wiki. It should be a map.

Good CLAUDE.md content:

  • How to run tests.
  • How to lint or typecheck.
  • Project structure overview.
  • Naming conventions.
  • Critical safety rules.
  • Where to find deeper docs.

Bad CLAUDE.md content:

  • Full product specs.
  • Long historical decisions.
  • Every edge case ever found.
  • Huge examples.
  • Repeated instructions that only apply to one task.
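As a sketch, a lean CLAUDE.md following the "map, not wiki" idea might look like this. All commands, paths and conventions here are hypothetical; replace them with your project's own.

```markdown
# CLAUDE.md

## Commands (hypothetical examples)
- Run tests: `npm test`
- Lint and typecheck: `npm run lint && npm run typecheck`

## Structure
- `src/` application code, `src/api/` route handlers
- `tests/` mirrors `src/`

## Conventions
- camelCase for functions, PascalCase for components

## Safety
- Never edit files under `migrations/` without asking first

## Deeper docs
- See `docs/architecture.md` for design decisions
```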

Copy-paste prompt

Before editing, inspect only the files needed for this task.
Do not scan the whole repository unless you can explain why.
Keep test output compact and summarize failures instead of pasting long logs.

Use /clear aggressively

The easiest Claude Code mistake is continuing a new task inside an old session. If you just finished a bug fix and now want to refactor a different area, clear the session first.

Use this pattern:

  1. Finish the current task.
  2. Ask Claude Code for a final summary.
  3. Save anything important in the repo or notes.
  4. Run /clear.
  5. Start the next task with a tight prompt.

Most impactful Claude Code tip

The best way to save usage in Claude Code is not to write shorter prompts. It is to control what the agent is allowed to carry into the next step.

Save tokens on ChatGPT

ChatGPT is often used as a general-purpose workspace: writing, analysis, coding, research, brainstorming and planning. That makes it easy to waste usage by repeating your preferences, files and context across chats.

Quick wins

  • Use Projects for recurring work.
  • Use custom instructions for stable preferences.
  • Use temporary chats for one-off tasks you do not want in memory.
  • Keep Thinking models for hard problems.
  • Use faster/default models for simple drafting, formatting and ideation.
  • Ask for "answer first, reasoning only if needed."
  • Split heavy research from final writing.

Best ChatGPT workflow

Use three layers:

  • Custom instructions for your global preferences. Example: tone, formatting, language, how concise you want answers to be.
  • Projects for recurring work. Example: one project for your startup, one for a client, one for SEO research, one for coding.
  • Fresh chats for isolated tasks. Example: rewrite a paragraph, generate ten title ideas, clean a CSV column, summarize one short document.

Copy-paste prompt

Give me the useful answer directly.
Do not over-explain unless I ask.
If this task does not require deep reasoning, keep the response compact.

When to avoid Thinking mode

Do not use a Thinking model for:

  • Short rewrites.
  • Simple summaries.
  • Title ideas.
  • Formatting.
  • Translation drafts.
  • Basic code snippets.
  • Email polishing.

Use Thinking for:

  • Complex debugging.
  • Strategic decisions.
  • Technical architecture.
  • Long synthesis.
  • Multi-step reasoning.
  • Comparing tradeoffs.

Most impactful ChatGPT tip

Stop re-briefing ChatGPT manually. Put stable context in Projects or custom instructions, then keep each chat focused on one job.

Save tokens and credits on Codex

Codex is not only a chat model. It is an agent working inside a code context. That means your usage depends heavily on task scope, repository size, instructions, tools and how much output the agent needs to inspect.

Quick wins

  • Keep AGENTS.md short.
  • Scope the task to one feature, bug or folder.
  • Tell Codex which files matter.
  • Avoid asking it to "inspect the whole repo" unless needed.
  • Limit MCP servers and external tools.
  • Use smaller models for routine tasks when available.
  • Ask for compact diffs and short explanations.
  • Separate planning from execution.

Good Codex prompts are scoped

Weak prompt:

Improve this app.

Better prompt:

In the billing settings page, fix the empty state when a user has no invoices.
Only inspect the billing route, invoice components and related tests unless you find a direct dependency.
Return a short summary and the files changed.

Keep AGENTS.md useful, not huge

A strong AGENTS.md should answer:

  • How does this repo work?
  • How should Codex run checks?
  • What files or folders should it avoid?
  • What conventions matter most?
  • What counts as done?

It should not include every product detail, every process document or every example you have ever written.
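A sketch of an AGENTS.md that answers those five questions and nothing more. The stack, commands and folder names are hypothetical placeholders.

```markdown
# AGENTS.md

## How this repo works
Next.js app; API routes in `app/api/`, shared logic in `lib/`.

## Checks
Run `npm run lint && npm test` before finishing.

## Avoid
Do not touch `infra/` or generated files in `dist/`.

## Conventions
Small, focused diffs; follow the patterns in the nearest existing file.

## Done means
Checks pass and the change is summarized in one short paragraph.
```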

Copy-paste prompt

Keep the context narrow.
Before reading extra files, explain why they are needed.
Prefer a small targeted change over a broad refactor.
Summarize long outputs instead of repeating them.

Most impactful Codex tip

Codex gets expensive when the task is vague. The tighter the scope, the less context it needs and the better the result usually is.

Save tokens on Gemini

Gemini is often used for large context, multimodal work, research and app/API workflows. The main trap is assuming a huge context window means you should always fill it.

Quick wins

  • Use the smallest context needed.
  • Do not upload huge files when a relevant excerpt is enough.
  • Use Fast models for routine work.
  • Save Pro or Thinking modes for high-value reasoning.
  • Restart or summarize very long chats.
  • Use cached context for repeated API workflows.
  • Reduce image or video resolution when high detail is not needed.
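For API workflows, cached context means uploading the reusable material once and referencing it afterwards. The sketch below shows the shape of a request body for the Gemini API's cachedContents endpoint; field names reflect the public API at the time of writing, and the model id is a placeholder, so verify both against the current Gemini docs before relying on them.

```python
import json

# Sketch of a Gemini context-caching request body (REST cachedContents endpoint).
# Field names are from the public API docs at the time of writing; the model id
# below is a hypothetical example.

large_reference = "...full product manual text..."  # the context you reuse

cache_request = {
    "model": "models/gemini-1.5-flash-001",  # placeholder model id
    "contents": [
        {"role": "user", "parts": [{"text": large_reference}]}
    ],
    "ttl": "3600s",  # keep the cached context for one hour
}

# Follow-up requests then reference the cache instead of resending
# large_reference, so you pay full input price for it only once.
print(json.dumps(cache_request)[:60])
```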

Large context still needs discipline

A large context window is useful when the model truly needs the whole document set. But many tasks only require:

  • The relevant section.
  • A summary of the source.
  • A table of key facts.
  • A few examples.
  • A narrow instruction.

Before sending a huge file, ask yourself: "Does Gemini need all of this to answer well?"
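One way to answer that question in practice is to cut the excerpt yourself before uploading. A minimal sketch, using only the Python standard library; the keyword heuristic is an assumption for illustration, not a Gemini feature.

```python
def relevant_excerpt(document: str, keywords: list[str], context_lines: int = 2) -> str:
    """Keep only lines near a keyword hit, instead of sending the whole file."""
    lines = document.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if any(k.lower() in line.lower() for k in keywords):
            # Keep a little surrounding context around each hit.
            for j in range(max(0, i - context_lines), min(len(lines), i + context_lines + 1)):
                keep.add(j)
    return "\n".join(lines[i] for i in sorted(keep))

# Example: a 100-line document where only one section matters.
doc = "\n".join(f"line {i}" for i in range(100)) + "\nRefund policy: 30 days.\nmore text"
excerpt = relevant_excerpt(doc, ["refund"])
print(excerpt)
```

The model then sees a few relevant lines instead of the full file, which is often all it needs to answer well.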

Copy-paste prompt

Use only the parts of the uploaded material that are relevant to this question.
If the full file is not needed, ignore the rest.
Give me a compact answer and list any missing context only if it blocks the task.

Multimodal usage tip

Images, video and audio can consume a lot of tokens. If your task does not require small visual details, use lower resolution, shorter clips or a screenshot of the exact part you want analyzed.

Most impactful Gemini tip

Do not treat the large context window as a dumping ground. Treat it as a workspace, and only bring in the material the model needs for the next decision.

Decide

Which AI subscription should you upgrade?

Before paying for a higher plan, identify what you are actually running out of.

If you run out during long chats

You probably have a context-management problem. Try:

  • Starting fresh more often.
  • Summarizing before continuing.
  • Moving stable context into projects or files.
  • Keeping each chat focused on one job.

If you run out during coding sessions

You probably have an agent-context problem. Try:

  • Reducing repo scope.
  • Cleaning agent instructions.
  • Limiting tools and MCP servers.
  • Clearing sessions between tasks.
  • Keeping logs short.

If you run out during deep analysis

You may need stronger limits, but first check whether every step really needs a premium reasoning model. Try:

  • Drafting with a faster model.
  • Using reasoning only for the hard part.
  • Splitting research, synthesis and final writing.
  • Saving premium models for the final decision.

If you pay for multiple AI tools

This is where Tokenkarma is useful. Each tool has different limits, reset windows and usage rules. Without visibility, you only discover the limit when your work is already interrupted.

Tokenkarma is built for people who pay for several AI tools and want to know what they have left before starting work.

Before you start

A simple token-saving checklist

Use this before starting a heavy session.

  • What is the exact task?
  • Which model is the cheapest good-enough option?
  • Does the model need the full context or just an excerpt?
  • Is this a new topic that deserves a fresh chat?
  • Can stable context live in a project, memory, file or instruction?
  • Am I using deep reasoning because I need it, or because it is the default?
  • Are tools, MCP servers, logs or files adding hidden context?
  • Do I know how much quota I have left before I start?

FAQ

Frequently asked questions

What does "save tokens" actually mean?

It means reducing the amount of unnecessary context, reasoning, files, tool output and repeated instructions your AI tool needs to process. For subscription users, it also means avoiding wasteful usage that pushes you into limits faster.

Is writing shorter prompts always better?

No. A short vague prompt can waste more usage than a clear detailed prompt. The goal is not to write the fewest words. The goal is to provide only the context the model needs.

Should I always use the fastest model?

No. Use faster models for routine work and stronger models for high-value reasoning. The mistake is using your strongest model for everything by default.

Why do coding agents use so much context?

Because they often inspect files, read instructions, run commands, process logs, call tools and carry previous steps forward. The prompt is only one part of the total context.

Can Tokenkarma reduce my token usage automatically?

Tokenkarma is designed to help you see your AI usage limits before you hit them. The tips on this page help you change your workflow. Together, visibility and better habits make it easier to avoid wasted sessions.

Who is Tokenkarma for?

Tokenkarma is for people who use multiple AI tools every day and pay for more than one subscription: consultants, writers, marketers, developers, translators, researchers, analysts and operators.