
AI Cost Per Token: When Token Price Helps and When It Misleads

Short answer: AI cost per token is useful for API estimates, but it can mislead when output length, retries, multi-step workflows, failed calls, or fixed serving capacity dominate cost.

Decision rule

Use token price for API usage math, then convert the estimate into cost per successful request and monthly serving cost before comparing serving modes.
Verify current provider pricing directly before buying or migrating.

Next action

Connect token price to product cost

Token price only reflects product cost once output length, retries, failed calls, traffic mix, and shared serving overhead are included.


By Andrew Cooper, Founder of RunPlacement. Updated May 2026. Provider-neutral, estimate-labeled guidance. Verify current provider pricing.

Right fit

You are estimating API spend from prompt and output size.
A teammate is using token price as the whole cost comparison.
You need to connect token usage to product margin or monthly budget.

Quick checks

Estimate input and output tokens separately.
Include retries, tool calls, multi-step chains, and failed calls.
Measure successful requests rather than all attempts.
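Check whether fixed managed-inference or GPU capacity changes the denominator: fixed capacity is paid for monthly and spread over successful requests, not billed per token.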
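A minimal sketch of the "successful requests rather than all attempts" check. It assumes every attempt, failed or not, is billed, and that each attempt fails independently with a fixed probability `failure_rate`. That is a simplification, since real failures are often correlated. Under retry-until-success, expected billed attempts per successful request is 1 / (1 - failure_rate). The function names are hypothetical.

```python
def expected_attempts_per_success(failure_rate: float) -> float:
    """Expected billed attempts per successful request under retry-until-success.

    Assumes independent failures with a constant probability (illustrative only).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def cost_per_successful_request(cost_per_attempt: float, failure_rate: float) -> float:
    """Spread the cost of failed attempts over the requests that succeeded."""
    return cost_per_attempt * expected_attempts_per_success(failure_rate)


def measured_cost_per_success(total_billed_cost: float, successful_requests: int) -> float:
    """If you have logs, divide everything you were billed by successes only."""
    if successful_requests <= 0:
        raise ValueError("successful_requests must be positive")
    return total_billed_cost / successful_requests


# Placeholder numbers: $0.004 per attempt, 20% of attempts fail.
print(round(cost_per_successful_request(0.004, 0.20), 6))  # 0.005
```

With a 20% failure rate, each successful request carries 1.25 billed attempts, so the effective cost is 25% above the per-attempt token cost.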

Rough math

API estimate = input tokens / 1,000,000 * input price + output tokens / 1,000,000 * output price, where both prices are quoted per 1M tokens.
Request-level cost = expected input cost + expected output cost + retry and workflow allowance.
Effective serving cost = total monthly serving cost / successful requests.
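The three rough-math lines can be sketched in Python as below. Prices are assumed to be quoted per 1M tokens. The retry and workflow allowance is modeled, as an assumption, as `extra_attempts` (average additional billed attempts at the same token cost) plus a flat `workflow_allowance` for tool calls or extra chain steps. All prices and volumes in the usage lines are placeholders, not current provider pricing.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # assumption: prices quoted per 1M tokens


def api_estimate(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Token cost of one attempt, with input and output priced separately."""
    return (input_tokens / TOKENS_PER_PRICE_UNIT * input_price_per_m
            + output_tokens / TOKENS_PER_PRICE_UNIT * output_price_per_m)


def request_level_cost(input_tokens: int, output_tokens: int,
                       input_price_per_m: float, output_price_per_m: float,
                       extra_attempts: float = 0.0,
                       workflow_allowance: float = 0.0) -> float:
    """Expected cost of one request: token cost plus retry and workflow allowance.

    extra_attempts: hypothetical average of additional billed attempts per request.
    workflow_allowance: hypothetical flat cost of extra workflow steps.
    """
    base = api_estimate(input_tokens, output_tokens,
                        input_price_per_m, output_price_per_m)
    return base + base * extra_attempts + workflow_allowance


def effective_serving_cost(total_monthly_cost: float, successful_requests: int) -> float:
    """Total monthly serving cost spread over successful requests only."""
    if successful_requests <= 0:
        raise ValueError("successful_requests must be positive")
    return total_monthly_cost / successful_requests


# Placeholder prices: $1 per 1M input tokens, $4 per 1M output tokens.
print(round(api_estimate(2_000, 500, 1.0, 4.0), 6))  # 0.004
print(round(request_level_cost(2_000, 500, 1.0, 4.0, extra_attempts=0.25), 6))  # 0.005
print(round(effective_serving_cost(6_000.0, 1_000_000), 6))  # 0.006
```

Keeping the three functions separate makes it easy to see which line moved when an estimate changes: token mix, retry allowance, or the monthly total.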

Red flags

Only input tokens are counted.
The estimate ignores long outputs.
Failed attempts are treated as if they created user value.
Token price is used to dismiss warm GPU, storage, network, or ops cost.

What to do next

Use the calculator with your current token prices and request volume.
Use inference cost per request to normalize the result.
Use AI cost optimization if output size, retries, or routing are the main drivers.

Related resources

Use a worksheet before making the call

These supporting pages turn the decision into fields a buyer, engineer, or founder can actually compare.

AI Inference Cost Checklist (checklist, 8 sections, source-linked)

A practical checklist for estimating AI inference cost across APIs, managed inference, self-hosted GPUs, batch jobs, realtime endpoints, and hybrid routing.

Related decisions

Keep narrowing the placement question

Follow the adjacent pages when the first answer exposes a deeper cost driver or operating constraint.

Inference Cost Per Request: Simple Formula (formula)

A useful inference cost per request starts with total monthly serving cost divided by successful inference requests, with failed calls and retries handled explicitly.

AI Cost Optimization: Practical Levers Before Rebuilding Inference (optimization guide)

AI cost optimization usually starts with usage shape: reduce avoidable output, retries, failed calls, over-large prompts, expensive routing, and low utilization before changing infrastructure.

LLM API Bill Too High? What to Check First (cost triage)

A high LLM API bill is usually a triage problem first: check whether output size, retries, tool calls, caching gaps, routing, or batchable work are driving the increase.

When the GPU question is really serving cost

Use these pages when the same GPU quote, idle-cost, or useful GPU-hour question is about production inference rather than one-off training.

AI Cost Calculator (estimator landing page): Start with broad AI cost, then narrow to API, managed inference, GPU, batch, realtime, or hybrid serving.
AI Inference Cost Calculator (interactive calculator): Compare API, managed inference, and self-hosted GPU cost per successful request.
AI Cost Comparison (decision page): Compare serving categories before ranking providers or quotes.
API vs Self-Hosted Inference (decision tree): Decide when API simplicity, managed serving, or self-hosted GPU control fits.
AI Cost Optimization (optimization guide): Check output length, retries, routing, caching, batching, and utilization before rebuilding inference.
AI Costs Increasing (triage page): Find the driver before moving off APIs, switching platforms, or buying GPUs.
Realtime vs Batch Research (research guide): Decide when queueing, delay tolerance, and avoided warm capacity can change inference cost.
Inference Cost Per Request (formula page): Use monthly serving cost divided by successful requests as the common comparison unit.
AI Inference Cost Model (framework): Normalize serving options by monthly cost and successful requests.

Framework

Use the underlying decision model

These framework pages define the terms and formulas behind this specific decision.

AI Inference Cost Model (AI inference cost)

AI inference cost should be compared as effective cost per successful request and monthly serving cost, not just token price or GPU hourly rate.

Useful GPU-Hour Framework (GPU pricing)

Useful GPU-hour cost is the better comparison unit when GPU providers differ in utilization, queueing, reliability, storage behavior, or operational model.

AI inference cost quiz

Get an AI compute cost read

Use token price for API usage math, then convert the estimate into cost per successful request and monthly serving cost before comparing serving modes.

Uses actual request volume, latency, GPU need, data movement, priority, and ops tolerance.

Start the AI compute read

FAQ

Is cost per token the same as cost per request?

No. Cost per request depends on the number of input tokens, output tokens, retries, failed calls, and workflow steps used by each successful request.
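A minimal sketch of why one successful request can cost several times a single call: a hypothetical three-step workflow (plan, tool call, final answer) where every step is billed separately and later steps re-send earlier context. The step names, token counts, and per-1M prices are illustrative placeholders.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    input_tokens: int
    output_tokens: int


def workflow_cost(steps: list[Step], input_price_per_m: float,
                  output_price_per_m: float) -> float:
    """Sum the token cost of every billed step behind one user-facing request."""
    total = 0.0
    for step in steps:
        total += step.input_tokens / 1_000_000 * input_price_per_m
        total += step.output_tokens / 1_000_000 * output_price_per_m
    return total


# Hypothetical chain: input grows because each step carries prior context.
chain = [
    Step("plan", input_tokens=1_500, output_tokens=300),
    Step("tool_call", input_tokens=2_500, output_tokens=200),
    Step("answer", input_tokens=4_000, output_tokens=600),
]
print(round(workflow_cost(chain, 1.0, 4.0), 6))  # 0.0124
```

Under these placeholder numbers the final answer step alone costs $0.0064, so pricing only the visible call would understate the request by roughly half.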

Why can output tokens dominate AI API cost?

Output tokens are usually priced separately from input tokens, often at a higher rate, and their count grows with verbose responses, chain-of-thought-like drafts, tool summaries, or unbounded generation.
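A small sketch of how output can take over the bill: it computes the output share of one request's token cost for the same 1,000-token prompt with a short and a long answer. The 4:1 output-to-input price ratio and the token counts are placeholder assumptions, not any provider's pricing.

```python
def output_cost_share(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Fraction of one request's token cost that comes from output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    return 0.0 if total == 0 else output_cost / total


# Same prompt, short answer vs. long, unbounded answer.
for out in (200, 2_000):
    print(out, round(output_cost_share(1_000, out, 1.0, 4.0), 3))
# 200 0.444
# 2000 0.889
```

An input-only estimate would miss most of the cost in the second case, which is why the red flags above call out input-only counting and ignored long outputs.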

Can token price compare API and self-hosted GPU?

Only as a starting point. A fair comparison also includes warm capacity, utilization, shared infrastructure, and operations work.
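A rough sketch of why fixed capacity changes the comparison: a warm GPU or managed endpoint costs roughly the same each month regardless of traffic, so its cost per successful request falls as volume rises, while API cost per request stays roughly flat. `MONTHLY_FIXED` is a hypothetical all-in figure (GPU, storage, network, ops time), and the sketch assumes that capacity can actually serve every volume shown.

```python
def fixed_capacity_cost_per_request(monthly_fixed_cost: float,
                                    successful_requests: int) -> float:
    """Fixed monthly serving cost spread over successful requests."""
    if successful_requests <= 0:
        raise ValueError("successful_requests must be positive")
    return monthly_fixed_cost / successful_requests


def break_even_requests(monthly_fixed_cost: float, api_cost_per_request: float) -> float:
    """Monthly successful requests at which fixed capacity matches the API per request."""
    return monthly_fixed_cost / api_cost_per_request


MONTHLY_FIXED = 3_000.0   # hypothetical all-in monthly cost of fixed capacity
API_PER_REQUEST = 0.005   # placeholder API cost per successful request

for volume in (100_000, 600_000, 2_000_000):
    print(volume, round(fixed_capacity_cost_per_request(MONTHLY_FIXED, volume), 4))
print(round(break_even_requests(MONTHLY_FIXED, API_PER_REQUEST)))
```

Under these placeholder numbers both options cost $0.005 per successful request at 600,000 requests a month. Below that volume the API is cheaper; above it, fixed capacity may win, but only if utilization, reliability, and ops work are honestly priced into `MONTHLY_FIXED`.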
