Frontier Equivalent Token: a unit of measure for multi-model AI consumption

One million tokens on DeepSeek V4-Flash cost $0.28. One million tokens on Claude Opus 4.8 cost $25. Is it the same "million"? For the budget yes, for the capability no — and today there is no unit that lets you add them up.

As of 8 June 2026, on the official pricing pages of OpenAI, Anthropic, Google, Mistral and DeepSeek, the price of one million output tokens ranges between $0.28 and $30: a factor of over 100×. The same word "token" identifies units whose market value differs by two orders of magnitude.

For an Italian SME with €5-50M revenue that has put three or four models into production — a frontier model for complex reasoning, a mid-tier model for drafting, a low-cost model for batch pipelines — this is not a theoretical detail. It is the reason why the "AI / LLM" line on the P&L becomes unreadable: end-of-month tokens cannot be summed except in dollars, and dollars do not describe what you actually bought. An intermediate unit is needed: the proposal is to call it the Frontier Equivalent Token (FET).

The Frontier Equivalent Token is an original proposal by Tomato Blue. We present it here as an open contribution to the AI cost governance debate: a unit of measure designed for those who, every day, have to reconcile the consumption of heterogeneous AI models into a single readable number.

The problem: a token is not a currency

"Number of tokens consumed" is today the default metric in every provider dashboard. It works as long as you stay on a single model. It becomes misleading the moment an organisation uses several in parallel.

Three practical consequences:

Non-comparable budgets. A team that consumes 500M tokens per month on a budget model and a team that consumes 20M on a frontier model look like two different worlds; in economic value they may be much closer than they appear.
Model-mix decisions taken "by feel". Without a single metric, the choice of which model to assign to which task is made on the lead engineer's intuition, not on cost-value analysis.
AI reporting that is hard to defend. An AI Governance Officer who tells the CFO "I consumed X tokens" is conveying nothing useful: the quantity has no economic content and no capability content.

The proposal: Frontier Equivalent Token (FET)

The idea is simple. Choose a frontier model as the benchmark. By definition:

1 token of the benchmark model = 1 FET

For every other model, the number of FET is obtained by multiplying the tokens actually consumed by the ratio between the output price of the model and the output price of the benchmark:

FET = Tokens × ( Price_model / Price_benchmark )

Prices are per million output tokens, taken from each provider's public price list.
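As a minimal sketch, the conversion can be written as a single function. The name `to_fet` and its parameter names are illustrative assumptions, not part of the proposal; the only requirement is that both prices use the same unit.

```python
def to_fet(tokens: float, price_model: float, price_benchmark: float) -> float:
    """Convert tokens consumed on a model into Frontier Equivalent Tokens.

    Both prices must be expressed in the same unit (here: USD per million
    output tokens), so that their ratio is dimensionless.
    """
    if price_benchmark <= 0:
        raise ValueError("benchmark price must be positive")
    return tokens * (price_model / price_benchmark)


# One million tokens on a $0.28/M model, measured against a $15/M benchmark.
fet = to_fet(1_000_000, price_model=0.28, price_benchmark=15.00)
print(round(fet))  # 18667
```

By construction, tokens consumed on the benchmark model itself convert one to one.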

The underlying assumption is that the market price of a token reflects the value the provider attributes to it. It is not an absolute truth — we will return to the limits — but it is a reasonable proxy, because the price embeds perceived quality, compute cost, market demand and resource scarcity.

In this article we use GPT-5.4 as the benchmark ($15/M output tokens). The choice is arbitrary: each organisation can pick the frontier model it actually uses as "first choice", as long as it then keeps that choice fixed over time to preserve month-on-month comparability.

The normalisation table as of 8 June 2026

All prices have been verified 1:1 on the providers' official pricing pages (links at the end of the article). Coefficients computed with GPT-5.4 = 1 FET.

Model	Output $/M	FET per token
GPT-5.5	30.00	2.0000
Claude Opus 4.8	25.00	1.6667
GPT-5.4 (benchmark)	15.00	1.0000
Claude Sonnet 4.6	15.00	1.0000
Gemini 2.5 Pro ¹	10.00	0.6667
Claude Haiku 4.5	5.00	0.3333
GPT-5.4-Mini	4.50	0.3000
Mistral Large 3	1.50	0.1000
DeepSeek V4-Flash	0.28	0.0187

¹ For prompts above 200k tokens, Gemini 2.5 Pro moves to $15/M output: the coefficient becomes 1.0000. The metric should be applied by actual usage band.
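The coefficients in the table can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the names `OUTPUT_PRICE`, `coefficients` and `gemini_25_pro_output_price` are illustrative, and the prices are simply copied from the table and footnote above.

```python
# Output prices in USD per million tokens, copied from the table above.
OUTPUT_PRICE = {
    "GPT-5.5": 30.00,
    "Claude Opus 4.8": 25.00,
    "GPT-5.4": 15.00,
    "Claude Sonnet 4.6": 15.00,
    "Gemini 2.5 Pro": 10.00,
    "Claude Haiku 4.5": 5.00,
    "GPT-5.4-Mini": 4.50,
    "Mistral Large 3": 1.50,
    "DeepSeek V4-Flash": 0.28,
}
BENCHMARK = "GPT-5.4"


def coefficients(prices: dict[str, float], benchmark: str) -> dict[str, float]:
    """Return FET per token for every model, relative to the benchmark."""
    base = prices[benchmark]
    return {model: price / base for model, price in prices.items()}


def gemini_25_pro_output_price(prompt_tokens: int) -> float:
    """Usage band from the footnote: prompts above 200k tokens cost more."""
    return 15.00 if prompt_tokens > 200_000 else 10.00


coeff = coefficients(OUTPUT_PRICE, BENCHMARK)
for model, value in coeff.items():
    print(f"{model:<20} {value:.4f}")
```

For banded models, one option is to treat each band as a separate row in the price table, so that consumption is converted at the price actually paid.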

The reading is immediate. Above 1 FET sit the models the market values more than the benchmark (GPT-5.5, Opus 4.8). Below 1 FET sit all the others, down to a factor of more than 50× (V4-Flash). One DeepSeek V4-Flash token is worth 0.0187 FET: to obtain the economic equivalent of one million GPT-5.4 tokens, you need roughly 53.6 million (15 / 0.28 ≈ 53.6).

What changes in practice: the multi-model example

An SME running LLMs in production may end the month with consumption like this:

Model	Tokens consumed	FET equivalent
GPT-5.5	20M	40M
Claude Sonnet 4.6	50M	50M
DeepSeek V4-Flash	500M	9.3M
Total	570M	~99.3M

In raw tokens, DeepSeek is by far the largest consumer (88% of the total). In FET, it counts for less than 10%. The system has consumed the economic equivalent of about 100 million GPT-5.4 tokens, and from that quantity you can derive monthly cost, trend and forecast — independent of the mix.
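The example can be checked with a short script. It is a sketch, not a reporting tool: the variable names are illustrative, and the cost line treats every token as an output token billed at list price, which is the same simplification the formula makes.

```python
# Prices in USD per million output tokens; consumption in millions of tokens.
PRICE = {"GPT-5.5": 30.00, "Claude Sonnet 4.6": 15.00, "DeepSeek V4-Flash": 0.28}
BENCHMARK_PRICE = 15.00  # GPT-5.4
tokens_m = {"GPT-5.5": 20, "Claude Sonnet 4.6": 50, "DeepSeek V4-Flash": 500}

# Convert each model's consumption into millions of FET.
fet_m = {m: t * PRICE[m] / BENCHMARK_PRICE for m, t in tokens_m.items()}

total_tokens_m = sum(tokens_m.values())
total_fet_m = sum(fet_m.values())

for model in tokens_m:
    token_share = tokens_m[model] / total_tokens_m
    fet_share = fet_m[model] / total_fet_m
    print(f"{model:<20} tokens {token_share:6.1%}   FET {fet_share:6.1%}")

# Millions of FET times the benchmark price per million gives the spend.
cost_usd = total_fet_m * BENCHMARK_PRICE
print(f"Total: {total_tokens_m}M tokens, {total_fet_m:.1f}M FET, ${cost_usd:,.0f}")
```

Because FET is a price-weighted sum, the total multiplied by the benchmark price recovers the list-price spend for the month, whatever the mix.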

It is the same logic that, in finance, converts foreign currencies into a reference currency before summing them: no one would try to add yen, dollars and Swiss francs without an exchange rate.

The limits: when price is not value

The assumption "price = value" does not always hold. The situations where FET under- or over-estimates real value are predictable:

Competitive dumping. A provider selling below cost to acquire users distorts the coefficient of its own model downwards. In this case the real value of the token is higher than the normalisation suggests.
Self-hosted proprietary models. Without a published market price, the numerator of the formula is missing. A possible workaround: estimate an equivalent price per million output tokens from the total cost of ownership (TCO) of internal inference.
Price drift. Prices change roughly once a quarter on average. The normalisation table should be recomputed at least every quarter, and the coefficients used in past reports should be frozen to avoid rewriting history.
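Two of these limits lend themselves to a small sketch: freezing coefficients per period, and deriving an equivalent price for a self-hosted model. Everything here is an assumption for illustration: the names `CoefficientSnapshot`, `freeze_snapshot` and `equivalent_price_per_million` are hypothetical, and the self-hosted cost and volume figures are invented.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class CoefficientSnapshot:
    """Coefficients as they stood in one reporting period (read-only)."""
    period: str                     # e.g. "2026-Q2"
    benchmark: str
    coefficients: MappingProxyType  # model name -> FET per token


def freeze_snapshot(period: str, benchmark: str,
                    output_prices: dict[str, float]) -> CoefficientSnapshot:
    """Compute coefficients once and store them so they cannot be edited."""
    base = output_prices[benchmark]
    coeffs = {model: price / base for model, price in output_prices.items()}
    return CoefficientSnapshot(period, benchmark, MappingProxyType(coeffs))


def equivalent_price_per_million(monthly_inference_cost_usd: float,
                                 monthly_output_tokens: int) -> float:
    """Stand-in price for a self-hosted model: internal cost per 1M tokens."""
    return monthly_inference_cost_usd / (monthly_output_tokens / 1_000_000)


# Hypothetical self-hosted model: $3,000/month serving 2 billion output tokens.
in_house_price = equivalent_price_per_million(3_000.0, 2_000_000_000)

snapshot = freeze_snapshot(
    "2026-Q2",
    "GPT-5.4",
    {"GPT-5.4": 15.00, "Mistral Large 3": 1.50, "in-house model": in_house_price},
)
print(snapshot.period, dict(snapshot.coefficients))
```

Past reports would then look up the snapshot for their own period instead of the current table, so a later price change does not alter historical totals.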

Above all: FET measures the implicit economic value according to the market, not the model's intelligence and not its performance on your specific task. For that you need internal benchmarks, and no unit metric can replace them.

Prior art: what already exists

Two nearby pieces of work, both distinct from FET:

Frontier Equivalent Compute by Epoch AI normalises compute, not tokens: it defines 1 H100e as the peak power of an NVIDIA H100 and measures the capacity of frontier data centres in equivalent H100s (Epoch AI methodology). FET applies the same idea (an explicit benchmark, conversion by ratio) to the price per output token rather than to FLOPs.
Artificial Analysis blended price computes a unit price per LLM as a 7:2:1 weighted average of cache_hit:input:output, expressed in OpenAI tokens (methodology). The goal is comparative benchmarking; FET is instead a transactional unit for internal aggregation of corporate consumption, based on output alone and with a benchmark explicitly chosen by the user.

What to do now

Five concrete moves for those who want to experiment with the metric.

Inventory the monthly mix. Pull from provider dashboards the per-model output token consumption for the last 30 days. Without a baseline, any metric is a theoretical exercise.
Pick the benchmark. The frontier model you actually use as "first choice" is the natural pick. Document the choice and freeze it: changing it breaks historical comparability.
Convert and sum. Apply the price_model / price_benchmark coefficient to each model's consumption. The total is monthly consumption in FET.
Re-express the budget in FET. Report consumption in FET, rather than in raw tokens or euros, to the board, in the AI report and on the P&L. Force the conversation onto equivalent value, not provider.
Re-check the table every quarter. Update the coefficients, recompute current totals, freeze historical ones. The drift in coefficients shows how the market is moving, and is sometimes worth more than the headline number.
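The quarterly re-check in the last step can be sketched as a comparison between two coefficient tables. The function name `coefficient_drift` is an assumption, and the next-quarter coefficients below are invented purely to show the mechanics; they are not a forecast.

```python
def coefficient_drift(previous: dict[str, float],
                      current: dict[str, float]) -> dict[str, float]:
    """Relative change of each model's coefficient between two snapshots.

    Models present in only one snapshot are skipped, because no change can
    be computed for them.
    """
    return {
        model: current[model] / previous[model] - 1.0
        for model in previous
        if model in current
    }


# Coefficients from the table in this article (GPT-5.4 = 1 FET).
q2 = {"GPT-5.4": 1.0, "GPT-5.4-Mini": 0.30, "Mistral Large 3": 0.10}

# Hypothetical next-quarter coefficients, invented for illustration only.
q3_hypothetical = {"GPT-5.4": 1.0, "GPT-5.4-Mini": 0.20, "Mistral Large 3": 0.10}

drift = coefficient_drift(q2, q3_hypothetical)
for model, change in drift.items():
    print(f"{model:<16} {change:+.1%}")
```

A negative drift means a model has become cheaper relative to the benchmark, so the same raw token volume will translate into fewer FET in the new period.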

Conclusion

The debate on AI cost governance is in its infancy: most companies running LLMs in production today lack a shared grammar to talk about consumption, and the three numbers they have — tokens, euros, models — are not fungible.

FET is a minimal proposal: a benchmark, a price ratio, a sum. It does not solve the problem of model quality, it does not replace accuracy benchmarks, it does not guarantee that the market is right. It does one thing: it lets you sum heterogeneous consumption in a readable unit. If your organisation uses several models, try expressing the monthly budget in FET for the next 60 days and see whether mix decisions change.

This is exactly what makes FET an operational tool, not just a conceptual one: it enables correct accounting of AI costs while abstracting away from the specific model. The budget stops being hostage to the provider or to the current mix and becomes a stable quantity, comparable over time and defensible in front of the CFO. With this tool, Tomato Blue helps its clients manage their AI budget correctly: from measuring multi-model consumption to defining thresholds, forecasts and mix policies aligned with business goals.


Sources

Official pricing pages (verified 8 June 2026):

Prior art: