
Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit.

AI News & Strategy Daily · Nate B Jones · April 2, 2026 · Original

Most important takeaway

The next generation of AI models (Claude Mythos, new ChatGPT, next Gemini) will be significantly more expensive because they are trained on costlier hardware, making token efficiency a critical job skill. Most users waste 8-10x more tokens than necessary through poor habits like feeding raw PDFs, sprawling conversations, and using expensive models for simple tasks — habits that are tolerable at current prices but will become very costly as frontier model pricing increases, potentially by 10x.

Chapter Summaries

The Cost Reality of Next-Gen Models

Next-generation models will be trained on Nvidia’s GB300 series chips, making them far more expensive. Jensen Huang has cited $250,000/year per developer as a real token cost figure. However, a production AI pipeline example showed that smart token usage can deliver fully personalized analysis across dozens of dimensions for under $0.25 per user on expensive models.

Beginner Mistake: Raw Document Ingestion

Dragging raw PDFs into conversations is extremely wasteful. A 4,500-word PDF can balloon to 100,000+ tokens due to formatting overhead (headers, footers, embedded fonts, layout metadata). Converting to markdown first reduces this to 4,000-6,000 tokens — a 20x savings that compounds as the conversation continues.
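The overhead is easy to sanity-check with the common chars/4 heuristic. A minimal sketch (the heuristic and the stand-in document sizes are assumptions; exact counts depend on the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Real counts depend on the
    # tokenizer, so treat this as an order-of-magnitude check only.
    return max(1, len(text) // 4)

# Stand-ins for the same 4,500-word document in two forms (sizes assumed):
raw_pdf_dump = "x" * 400_000   # text plus headers, footers, layout metadata
clean_markdown = "x" * 20_000  # markdown conversion of the same content

print(estimate_tokens(raw_pdf_dump))    # 100000
print(estimate_tokens(clean_markdown))  # 5000 — the ~20x savings
```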

Intermediate Mistake: Conversation Sprawl

Going 20-30+ turns in a single conversation wastes tokens because the entire conversation history is resent with every turn. Instead, separate information-gathering conversations from focused work conversations. Gather research across multiple models and threads, then bring consolidated context into a clean session for the actual work.
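The reason sprawl is so expensive: because the full history is resent on every turn, cumulative input tokens grow quadratically with conversation length. A quick sketch (500 tokens per turn is an assumed average):

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    # Turn t resends all t messages so far, so total input grows quadratically.
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

print(cumulative_input_tokens(30))      # 232500 — one sprawling 30-turn thread
print(3 * cumulative_input_tokens(10))  # 82500 — three fresh 10-turn sessions
```

Splitting the same 30 turns of work into three clean sessions cuts resent history by roughly two-thirds, before counting any other savings.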

Intermediate Mistake: Plugin and Connector Overload

Loaded plugins and connectors consume tokens before you even type. One person had 50,000+ tokens loaded before their first message. Audit your plugins regularly and remove ones you are not actively using — they are a silent tax on every conversation.

Advanced Users: System Prompt and Context Hygiene

Advanced users have the most leverage but also make the most expensive mistakes, often measured in hundreds of thousands or millions of tokens. Prune system prompts regularly, remove legacy instructions written for older model generations, and as models get smarter, lean out the context window rather than carrying forward bloated prompts.

The Cost Math: Sloppy vs. Clean Usage

A sloppy 5-hour session with raw PDFs, 30 turns, and Opus for everything costs $8-10 in compute (800K-1M input tokens). The same work done cleanly — markdown conversion, fresh conversations every 10-15 turns, model tiering (Opus for reasoning, Sonnet for execution, Haiku for polish) — costs about $1 (100-150K input tokens). Across a 10-person team, that is $2,000/month vs. $250/month.

The “Stupid Button” Token Audit Tool

Nate built a diagnostic tool with six questions: (1) Are you feeding raw PDFs/images when you only need text? (2) When did you last start a fresh conversation? (3) Are you using the most expensive model for everything? (4) Do you know what loads in context before you type? (5) Are you caching stable context for API calls? (6) Are you handling web search efficiently? The tool includes a prompt, an audit skill, and guardrails for the Open Brain ecosystem.

Five Commandments for Agent Token Management

  1. Index your references — use retrieval to scope what the model sees, not full document dumps.
  2. Pre-process context — documents should arrive ready to use, not ready to be read.
  3. Cache stable context — system prompts, tool definitions, and reference material get a 90% discount on cache hits.
  4. Scope each agent’s context to the minimum needed — a planning agent does not need the full codebase.
  5. Measure what you burn — track per-call token cost, input/output tokens, model mix, and cost ratio.
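Commandment 5 needs almost no machinery. A minimal per-call ledger might look like this (class and field names are my own, not from the article):

```python
from dataclasses import dataclass, field

@dataclass
class TokenMeter:
    # Records (model, input_tokens, output_tokens) for each API call.
    calls: list = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        self.calls.append((model, input_tokens, output_tokens))

    def summary(self) -> dict:
        total_in = sum(c[1] for c in self.calls)
        total_out = sum(c[2] for c in self.calls)
        return {"input": total_in,
                "output": total_out,
                "model_mix": sorted({c[0] for c in self.calls}),
                # Output tokens are the pricier ones; watch this ratio.
                "out_in_ratio": round(total_out / max(total_in, 1), 3)}

meter = TokenMeter()
meter.record("opus", 12_000, 900)
meter.record("haiku", 3_000, 400)
print(meter.summary())
```

Wire `record` into whatever wraps your API calls and the model mix, token totals, and cost ratio fall out for free.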

Cultural Shift: Smart Token Usage

Burning tokens has become a badge of honor, but the goal should be efficient burning. Use your full token allocation, but spend it on meaningful, bold, creative work rather than wasting it on unconverted PDFs and sprawling conversations.

Summary

Actionable Insights:

  • Convert all documents to markdown before feeding them to any LLM. This single habit can reduce token usage by 20x. Use free web converters, Claude itself, or tools like Open Brain’s plugin. Screenshots are also terribly inefficient — copy and paste text instead.

  • Start fresh conversations every 10-15 turns. Every turn resends the entire conversation history. Separate research/exploration sessions from focused work sessions, then bring consolidated findings into a clean chat.

  • Tier your model usage. Use Opus/top models for reasoning and complex tasks, Sonnet/mid-tier for execution, and Haiku/cheap models for formatting and polish. Do not use a Ferrari for grocery runs.

  • Audit and prune your plugins and connectors regularly. Each unused plugin silently consumes tokens on every conversation. Remove anything you have not used recently.

  • For API and agent builders: implement prompt caching immediately. Cache hits on Opus cost $0.50/million vs. $5/million standard — a 90% discount for stable content like system prompts, tool definitions, and reference docs. This is table stakes in 2026.

  • Use dedicated search tools (like Perplexity via MCP) instead of native LLM web search. This can save 10,000-50,000 tokens per search, run 5x faster, and return structured citations.

  • Prune system prompts for agents regularly. Remove legacy instructions from older model generations. As models get smarter, you can lean out context windows and trust the model to retrieve what it needs.

  • Instrument your token usage. You cannot optimize what you do not measure. Track per-session and per-call token costs, input vs. output tokens, and model mix.

  • Career advice: Token management is becoming a critical job skill. As models get more expensive, the ability to use AI efficiently will directly impact ROI. Jensen Huang’s $250K/year/developer figure makes this a business-critical competency, not a nice-to-have.

  • Prepare now for more expensive models. Claude Mythos and next-gen models are expected within 1-2 months with potentially 10x higher pricing. The wasteful habits tolerable today will become very costly at scale.
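The caching discount from the insights above is worth quantifying before prices rise. A sketch using the article's Opus rates ($5/M standard, $0.50/M on cache hits); the context size and call volume are assumed for illustration:

```python
STANDARD_PER_M = 5.00  # $/M input tokens at the standard rate
CACHED_PER_M = 0.50    # $/M input tokens on cache hits — the 90% discount

def stable_context_cost(context_tokens: int, calls: int, cached: bool) -> float:
    # Cost of resending the same stable context (system prompt, tool
    # definitions, reference docs) on every call.
    rate = CACHED_PER_M if cached else STANDARD_PER_M
    return context_tokens / 1_000_000 * rate * calls

# 20K tokens of stable context across 10,000 agent calls (assumed workload):
print(stable_context_cost(20_000, 10_000, cached=False))  # standard rate
print(stable_context_cost(20_000, 10_000, cached=True))   # with cache hits
```

At this assumed workload the gap is $1,000 vs. $100 for the stable portion alone, and it scales linearly with call volume — which is why caching is table stakes before a potential 10x price increase.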