Context Windows Are the Constraint That Actually Matters
A novel fits in Gemini. A large codebase overflows GPT-4o. A visualizer that makes the difference concrete before an overflow breaks your application.
Context windows used to be a footnote. A few thousand tokens, a known limitation, something you worked around. Then models got bigger and the context windows grew with them — 8K to 32K to 128K to a million. Now the spread between the smallest and largest is so wide that context capacity is often the deciding factor in which model you can actually use for a task.
A million-token context window means something. A full novel is around 100,000 words, or roughly 130,000 tokens. Gemini 2.5 Pro fits seven of them in a single context. GPT-4o and many other models cap at 128,000 tokens: enough for most of a novel but not quite all of one, and nowhere near enough for large codebases or extensive document sets.
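The arithmetic behind those figures is easy to check. This sketch assumes about 1.3 tokens per English word, the ratio implied by the 100,000-word and 130,000-token figures; real ratios depend on the tokenizer and the text. The window sizes are the round numbers used in this article, not exact vendor specifications.

```python
# Rough context-capacity arithmetic.
# Assumption: ~1.3 tokens per English word. Actual counts depend on
# each model's tokenizer and on the text itself.
TOKENS_PER_WORD = 1.3

# Round window sizes as used in this article, not exact vendor figures.
WINDOWS = {
    "GPT-4o": 128_000,
    "Gemini 2.5 Pro": 1_000_000,
}


def words_to_tokens(words: int) -> int:
    """Estimate the token count for a given number of English words."""
    return round(words * TOKENS_PER_WORD)


def copies_that_fit(item_tokens: int, window: int) -> int:
    """Number of whole copies of an item that fit in a context window."""
    return window // item_tokens


novel_tokens = words_to_tokens(100_000)  # a full-length novel
for model, window in WINDOWS.items():
    print(f"{model}: {copies_that_fit(novel_tokens, window)} novel(s)")
```

This prints 0 for GPT-4o and 7 for Gemini 2.5 Pro: a 130,000-token novel is just over a 128,000-token window.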
The practical consequence is that the model you can afford might not be the model you can fit your task into.
What fills a context window faster than you think
The intuition most people have about context size is that it maps to document length. Feed in a PDF, it takes some space. This is true as far as it goes, but it misses where context actually gets consumed in production.
System prompts are typically written once and then forgotten when budgeting context. In a complex application, a system prompt might run to 3,000 or 5,000 tokens, and it is sent with every request. That is roughly 2 to 4 percent of a 128K window but under 1 percent of a million-token window, so the choice of model changes how much room is left for the actual task.
Conversation history compounds. Each message in a multi-turn conversation is added to the context and carried into every later turn. A long customer support session, a code review thread, a document that's being iteratively edited: these accumulate. A lengthy session can hit the ceiling of a 128K window. At 200K you have more headroom. At a million, you're unlikely to reach it in normal use.
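To see how a fixed system prompt and growing history share the same window, here is a minimal sketch that counts how many turns fit before the window fills. It assumes every turn adds the same number of tokens, that the full history is resent with each request, and that some room is held back for the model's reply; the 1,500-token turn size and 4,000-token reserve are hypothetical values chosen for illustration.

```python
def turns_until_full(
    window: int,
    system_prompt: int,
    tokens_per_turn: int,
    reserved_output: int = 0,
) -> int:
    """Whole turns that fit before history fills the window.

    Assumes each turn (user message plus model reply) adds a fixed
    number of tokens and the entire history is resent every request.
    """
    available = window - system_prompt - reserved_output
    if available <= 0:
        return 0
    return available // tokens_per_turn


SYSTEM_PROMPT = 5_000  # upper end of the range discussed above
PER_TURN = 1_500       # hypothetical average tokens per exchange
RESERVED = 4_000       # hypothetical room kept for the next reply

for window in (128_000, 200_000, 1_000_000):
    n = turns_until_full(window, SYSTEM_PROMPT, PER_TURN, RESERVED)
    share = SYSTEM_PROMPT / window
    print(f"{window:>9,} tokens: system prompt {share:.1%}, ~{n} turns")
```

Under these assumptions a 128K window holds about 79 turns, 200K about 127, and a million about 660. The exact numbers matter less than the shape: the system prompt's share shrinks as the window grows, and the turn budget scales with it.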
Document grounding is where the largest numbers live. Retrieval-augmented generation that pulls entire documents rather than excerpts, codebases loaded wholesale for analysis, legal or financial datasets processed in full: these can run into hundreds of thousands of tokens. Past 200,000 tokens, most models aren't viable, and among the models compared here the choice becomes Gemini or nothing.
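Before loading a codebase wholesale, a rough size estimate tells you which tier of model you need. The sketch below walks a directory and applies the common rule of thumb of about four characters per token. That ratio is an assumption: it holds loosely for English prose and can be off by a wide margin for source code, so use the tokenizer that matches your model for anything close to a limit. The default file suffixes are illustrative.

```python
from pathlib import Path

# Assumption: ~4 characters per token. Code and non-English text often
# tokenize differently, so treat the result as an order-of-magnitude guide.
CHARS_PER_TOKEN = 4.0


def estimate_repo_tokens(
    root: str,
    suffixes: tuple[str, ...] = (".py", ".ts", ".go", ".md"),
    chars_per_token: float = CHARS_PER_TOKEN,
) -> int:
    """Estimate the token count of all matching files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Skip undecodable bytes instead of failing on odd encodings.
            text = path.read_text(encoding="utf-8", errors="ignore")
            total_chars += len(text)
    return int(total_chars / chars_per_token)
```

If `estimate_repo_tokens("path/to/repo")` comes back in the hundreds of thousands, that alone narrows the list of models you can load it into in one request.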
Why it matters to know before you build
The failure mode for context overflow is usually not graceful. At best, the API returns an error that your application needs to handle. At worst, something between your code and the model, such as a framework or the serving layer, silently truncates the input and the model answers from incomplete context, which is harder to detect and more damaging.
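One way to keep overflow from ever reaching the API is a pre-flight check that fails loudly in your own code. This is a minimal sketch: `rough_token_count` is a hypothetical placeholder using a four-characters-per-token heuristic, and in practice you would pass in the tokenizer that matches your model. The names `check_fits` and `ContextOverflowError` are illustrative, not part of any library.

```python
from typing import Callable


class ContextOverflowError(Exception):
    """Raised when a request would not fit in the model's context window."""


def rough_token_count(text: str) -> int:
    # Hypothetical placeholder: ~4 characters per token. Swap in the
    # tokenizer that matches your model for real budgeting.
    return len(text) // 4


def check_fits(
    parts: list[str],
    window: int,
    reserved_output: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> int:
    """Return remaining headroom, or raise before the request is sent."""
    used = sum(count_tokens(p) for p in parts)
    headroom = window - used - reserved_output
    if headroom < 0:
        raise ContextOverflowError(
            f"{used:,} input tokens + {reserved_output:,} reserved "
            f"exceeds the {window:,}-token window by {-headroom:,}"
        )
    return headroom
```

Calling it with the system prompt, conversation history, and retrieved documents before each request turns a silent truncation into an explicit error your application can handle, for example by dropping old turns or routing to a larger-window model.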
Building an application around a model's context limit, then discovering that a common user scenario overflows it, is the kind of mistake that forces architectural rework after the fact. It's avoidable if the numbers are visible before you start.
The visualizer
The Context Window Visualizer at tools.modologystudios.com/context-visualizer maps token counts to context windows across eight models in one view.
You can set the token count from eight presets (short email, long document, ten-page PDF, small codebase, full codebase, novel, very large repo, and maximum one million) or use the slider to set any value between 100 tokens and a million. The slider uses a logarithmic scale because the range spans four orders of magnitude: on a linear slider, everything up to 10,000 tokens would be squeezed into roughly the first 1% of its travel.
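A logarithmic slider maps equal steps of travel to equal ratios of token count rather than equal differences. Here is a minimal sketch of that mapping for the 100 to 1,000,000 range; it illustrates the technique and is not the visualizer's actual code, and the function names are hypothetical.

```python
import math

MIN_TOKENS = 100
MAX_TOKENS = 1_000_000


def slider_to_tokens(position: float) -> int:
    """Map a slider position in [0, 1] to a token count on a log scale."""
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range input
    lo, hi = math.log10(MIN_TOKENS), math.log10(MAX_TOKENS)
    return round(10 ** (lo + position * (hi - lo)))


def tokens_to_slider(tokens: int) -> float:
    """Inverse mapping, e.g. to move the slider when a preset is chosen."""
    tokens = min(max(tokens, MIN_TOKENS), MAX_TOKENS)
    lo, hi = math.log10(MIN_TOKENS), math.log10(MAX_TOKENS)
    return (math.log10(tokens) - lo) / (hi - lo)
```

Each quarter of travel covers one factor of ten: 100 to 1,000, 1,000 to 10,000, and so on up to a million, so the midpoint lands on 10,000 tokens.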
For each model, you get a fill bar showing what percentage of the context window that token count occupies, the remaining headroom in tokens, and a status indicator: fits, nearly full, or over limit.
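Each model's readout reduces to a division, a subtraction, and a threshold check. This sketch assumes a hypothetical 80% cutoff for "nearly full", since the article doesn't say where the visualizer draws that line, and uses the round window sizes quoted earlier for GPT-4o and Gemini 2.5 Pro.

```python
from dataclasses import dataclass

NEARLY_FULL = 0.80  # hypothetical threshold for the "nearly full" status


@dataclass
class Readout:
    fill: float    # fraction of the window used; may exceed 1.0
    headroom: int  # tokens left; negative means over the limit
    status: str


def readout(tokens: int, window: int) -> Readout:
    """Compute fill fraction, headroom, and status for one model."""
    fill = tokens / window
    if fill > 1.0:
        status = "over limit"
    elif fill >= NEARLY_FULL:
        status = "nearly full"
    else:
        status = "fits"
    return Readout(fill=fill, headroom=window - tokens, status=status)


def bar(fill: float, width: int = 20) -> str:
    """Text fill bar, capped at full width when over the limit."""
    filled = min(width, round(fill * width))
    return "#" * filled + "." * (width - filled)


WINDOWS = {"GPT-4o": 128_000, "Gemini 2.5 Pro": 1_000_000}

for model, window in WINDOWS.items():
    r = readout(100_000, window)
    print(f"{model:<15} {bar(r.fill)} {r.fill:5.0%} {r.headroom:>9,} {r.status}")
```

For 100,000 tokens this reports about 78% full with 28,000 tokens of headroom for GPT-4o, and 10% full with 900,000 tokens left for Gemini 2.5 Pro.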
The point isn't precision; it's making the relationship concrete. When you see that 100,000 tokens fills 78% of GPT-4o's context but 10% of Gemini's, the architectural decision about which model to build around becomes a lot clearer.
The numbers shift. The thinking doesn’t.
Context windows will continue to grow. The models we’re tracking will be updated, replaced, or repriced. But the underlying question — can my task fit, and what’s left over for headroom — stays the same regardless of where the ceiling moves.
The constraint is real. Knowing where it is, for the specific token count of your specific use case, is the minimum information required to make a reasonable architectural decision.