Know What a Prompt Costs Before You Send It

The first time most teams are surprised by an AI bill, they go looking for the mistake. They check for runaway loops, duplicate requests, endpoints that got called unexpectedly. Often there’s no mistake. The system ran exactly as intended. They just had no intuition for what the token counts would add up to.

This is a solvable problem. The math isn’t hard — models charge per token, token counts are estimable from text length, pricing is public. What’s missing is a fast way to check the numbers before you commit to sending a prompt in production.

Why token counts surprise people

The mental model most people carry is word count. A 500-word prompt feels like a 500-unit thing. But tokens aren’t words. A token is roughly four characters of English prose, and a typical English word with its trailing space runs a bit over five characters, so 500 words is closer to 650 tokens. For code, the ratio shifts again: brackets, semicolons, and other symbols often tokenize separately, so the same character count produces more tokens in code than in prose.

For a one-off prompt this doesn’t matter much. For a system prompt that runs with every API call, the difference between 1,000 tokens and 4,000 tokens is a 4× difference in that part of the input bill, repeated on every request. A system prompt that grew through iteration, as they always do, can easily become the largest cost driver in a production workflow without anyone noticing.

The cross-model cost gap is larger than it looks

The pricing spread across current frontier models is not subtle. Claude Opus 4.6 costs $15 per million input tokens. Claude Haiku 4.5 costs $0.80. GPT-4o mini costs $0.15. DeepSeek V3 costs $0.27.

That’s a 100× range from top to bottom. For a task that genuinely requires Opus (long-document synthesis, nuanced legal or medical reasoning, complex multi-step analysis), the premium is justified. For a task that amounts to classification or extraction on short inputs, using Opus means paying nearly 19× the input price for something Haiku handles equivalently.

The gap isn’t obvious until you put the numbers side by side for a specific prompt. Once you do, it’s usually clear which model is appropriate.
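
One way to put the numbers side by side is a few lines of Python. This is a minimal sketch, not a reference: the prices are the four input prices quoted above (a snapshot that may be out of date), and the names PRICE_PER_MILLION, input_cost, price_ratio, and cost_table are illustrative and not part of any provider SDK.

```python
# Input prices in USD per million tokens, copied from the figures quoted
# in this article. Treat them as a snapshot and check current pricing.
PRICE_PER_MILLION = {
    "Claude Opus 4.6": 15.00,
    "Claude Haiku 4.5": 0.80,
    "DeepSeek V3": 0.27,
    "GPT-4o mini": 0.15,
}


def input_cost(tokens: int, model: str) -> float:
    """Input cost in USD for a single call with the given token count."""
    return tokens * PRICE_PER_MILLION[model] / 1_000_000


def price_ratio(model_a: str, model_b: str) -> float:
    """How many times more model_a costs per input token than model_b."""
    return PRICE_PER_MILLION[model_a] / PRICE_PER_MILLION[model_b]


def cost_table(tokens: int) -> list[tuple[str, float]]:
    """Per-call input cost for every model, most expensive first."""
    rows = [(model, input_cost(tokens, model)) for model in PRICE_PER_MILLION]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # 2,000 tokens is a hypothetical prompt size chosen for illustration.
    for model, cost in cost_table(2_000):
        print(f"{model:<18} ${cost:.4f} per call")
```

Calling price_ratio("Claude Opus 4.6", "Claude Haiku 4.5") with these figures gives about 18.75, and against GPT-4o mini it gives about 100, which is the top-to-bottom spread.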

What the tool does

The Prompt Estimator at tools.modologystudios.com/prompt-estimator takes a text input and gives you two things: an estimated token count, and the input cost for that token count across eight models simultaneously.

The token count uses a simple heuristic — characters divided by four for prose, divided by three for code-heavy text — which is accurate enough for planning purposes. It’s not a tokenizer; it’s a fast approximation that gets you within a reasonable margin of the real number.
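
The heuristic is simple enough to reproduce. The sketch below follows the divide-by-four and divide-by-three rule described above; the function name estimate_tokens, the manual code_heavy flag, and rounding up to a whole token are assumptions made for illustration, since how the tool itself classifies text or rounds is not stated here.

```python
import math


def estimate_tokens(text: str, code_heavy: bool = False) -> int:
    """Approximate a token count from character length.

    Uses characters / 4 for prose and characters / 3 for code-heavy text.
    Rounding up is an assumption; this is not a real tokenizer.
    """
    chars_per_token = 3 if code_heavy else 4
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    prose = "Summarize the following support ticket in two sentences."
    snippet = "for (let i = 0; i < n; i++) { total += xs[i]; }"
    print(estimate_tokens(prose))
    print(estimate_tokens(snippet, code_heavy=True))
```

An estimate like this can be off in either direction for any single prompt, so it is best read as a planning number.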

The cost table shows you Anthropic, OpenAI, Google, Meta, and DeepSeek models in one view, so you can see the full spread without opening five browser tabs.

It runs entirely in the browser. No API call is made. Nothing is sent anywhere.

How to use it in practice

The most useful time to run the estimator is when you’re finalizing a system prompt for a production workflow. Paste in the prompt. Look at the cost per call. Multiply by your expected daily volume. If the number is higher than you expected, look at whether the task actually requires the model you were planning to use.
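
The multiply-by-volume step can be sketched the same way. The function names, the 4,000-token prompt size, and the 50,000 calls per day below are hypothetical inputs chosen for illustration; the prices are the Opus and Haiku input prices quoted earlier. The sketch counts only the system prompt’s input tokens and ignores user messages and output tokens.

```python
def cost_per_call(prompt_tokens: int, price_per_million: float) -> float:
    """Input cost in USD of sending the prompt once."""
    return prompt_tokens * price_per_million / 1_000_000


def daily_cost(prompt_tokens: int, price_per_million: float,
               calls_per_day: int) -> float:
    """Input cost in USD of sending the prompt on every call for one day."""
    return cost_per_call(prompt_tokens, price_per_million) * calls_per_day


if __name__ == "__main__":
    system_prompt_tokens = 4_000   # hypothetical prompt size
    calls_per_day = 50_000         # hypothetical daily volume
    # Prices per million input tokens as quoted in this article.
    for model, price in [("Claude Opus 4.6", 15.00), ("Claude Haiku 4.5", 0.80)]:
        per_day = daily_cost(system_prompt_tokens, price, calls_per_day)
        print(f"{model:<18} ${per_day:,.2f} per day")
```

With these assumed inputs, the system prompt alone costs about $3,000 a day at the quoted Opus price and about $160 a day at the quoted Haiku price.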

It’s also useful when you’re iterating on prompt length: adding context, few-shot examples, formatting instructions. The live count updates as you type, so you can see immediately when an addition pushes the per-call cost meaningfully higher.

This is not a substitute for running the actual tokenizer if precision matters. For billing-critical work, use the provider’s official token counter or the API’s reported usage. But for planning, comparison, and gut-checking before you commit to an architecture, the estimator is usually sufficient.

The goal is simple: no AI cost should be a surprise. The math is knowable before you send anything.