AI Cost Per Token: When Token Price Helps and When It Misleads

Short answer: AI cost per token is useful for API estimates, but it can mislead when output length, retries, multi-step workflows, failed calls, or fixed serving capacity dominate cost.

Decision rule
  • Use token price for API usage math, then convert the estimate into cost per successful request and monthly serving cost before comparing serving modes.
  • Verify current provider pricing directly before buying or migrating.

Next action

Connect token price to product cost

Token price matters most after output length, retries, failed calls, traffic mix, and shared serving overhead are included.

By Andrew Cooper, Founder of RunPlacement. Updated May 2026. Provider-neutral, estimate-labeled guidance. Verify current provider pricing.

Right fit

  • You are estimating API spend from prompt and output size.
  • A teammate is using token price as the whole cost comparison.
  • You need to connect token usage to product margin or monthly budget.

Quick checks

  • Estimate input and output tokens separately.
  • Include retries, tool calls, multi-step chains, and failed calls.
  • Measure successful requests rather than all attempts.
  • Check whether fixed managed or GPU capacity changes the denominator.
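
As a rough illustration of these checks, the sketch below tallies input and output tokens separately across every attempt, then counts only the requests that ended in a usable result. The `Attempt` record and its field names are hypothetical and stand in for whatever your own request log contains.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    # Hypothetical log record: one billed call to the model.
    request_id: str     # groups retries and chain steps under one user request
    input_tokens: int
    output_tokens: int
    succeeded: bool     # True only if this call produced the final usable result


def summarize(attempts):
    """Total tokens over ALL attempts, but count only requests that succeeded."""
    successful_ids = {a.request_id for a in attempts if a.succeeded}
    return {
        "input_tokens": sum(a.input_tokens for a in attempts),
        "output_tokens": sum(a.output_tokens for a in attempts),
        "attempts": len(attempts),
        "successful_requests": len(successful_ids),
    }


# Illustrative data only: r1 needed a retry, r3 failed outright.
attempts = [
    Attempt("r1", 1200, 300, False),
    Attempt("r1", 1200, 350, True),
    Attempt("r2", 800, 200, True),
    Attempt("r3", 900, 0, False),
]
summary = summarize(attempts)
print(summary)
```

In this made-up log, four attempts were billed but only two requests succeeded, which is the gap the third check is meant to expose.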

Rough math

  • API estimate = (input tokens / 1,000,000 * input price) + (output tokens / 1,000,000 * output price), with both prices quoted per 1,000,000 tokens.
  • Request-level cost = expected input cost + expected output cost + retry and workflow allowance.
  • Effective serving cost = total monthly serving cost / successful requests.
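
The three formulas above can be sketched as small functions. This is a minimal sketch, not a pricing tool: the function names are illustrative, the prices are placeholders rather than any provider's actual rates, and the retry and workflow allowance is a number you would estimate from your own traffic.

```python
def api_estimate(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """API cost for a token volume, with prices quoted per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def request_level_cost(expected_input_cost, expected_output_cost, retry_workflow_allowance):
    """Expected cost of one request, including an allowance for retries and extra steps."""
    return expected_input_cost + expected_output_cost + retry_workflow_allowance


def effective_serving_cost(total_monthly_serving_cost, successful_requests):
    """Monthly serving cost spread over successful requests only."""
    if successful_requests <= 0:
        raise ValueError("successful_requests must be positive")
    return total_monthly_serving_cost / successful_requests


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
base = api_estimate(2_000, 500, input_price_per_m=1.00, output_price_per_m=4.00)
per_request = request_level_cost(0.002, 0.002, retry_workflow_allowance=0.001)
per_success = effective_serving_cost(3_000.0, 400_000)
print(round(base, 6), round(per_request, 6), round(per_success, 6))
```

With these placeholder inputs the output tokens cost as much as the input tokens even though there are a quarter as many of them, which is why the two are estimated separately.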

Red flags

  • Only input tokens are counted.
  • The estimate ignores long outputs.
  • Failed attempts are treated as if they created user value.
  • Token price is used to dismiss warm GPU, storage, network, or ops cost.

What to do next

  • Use the calculator with your current token prices and request volume.
  • Use inference cost per request to normalize the result.
  • Use AI cost optimization if output size, retries, or routing are the main drivers.

FAQ

Is cost per token the same as cost per request?

No. Cost per request depends on the number of input tokens, output tokens, retries, failed calls, and workflow steps used by each successful request.
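
A minimal sketch of the difference, assuming every call in a workflow costs about the same and failed attempts are still billed. The function and its parameters are illustrative, and `success_rate` here means the fraction of attempted requests that end in a usable result.

```python
def cost_per_successful_request(cost_per_call, calls_per_request, success_rate):
    """Spend on all attempts divided by the attempts that actually succeeded.

    Assumes each call costs roughly the same and failures are billed in full.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_call * calls_per_request / success_rate


# Placeholder figures: same per-call cost, very different per-request cost.
single_step = cost_per_successful_request(0.004, calls_per_request=1, success_rate=1.0)
three_step = cost_per_successful_request(0.004, calls_per_request=3, success_rate=0.8)
print(round(single_step, 6), round(three_step, 6))
```

The token price is identical in both cases. The three-step workflow with a 20% failure rate simply spends it several times over for each result a user actually receives.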

Why can output tokens dominate AI API cost?

Output tokens often have a separate price and can grow with verbose responses, chain-of-thought-like drafts, tool summaries, or unbounded generation.
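
To see how quickly this happens, the sketch below computes the share of a call's cost that comes from output tokens. The prices are placeholders with the output rate set higher than the input rate purely for illustration; substitute your provider's current numbers.

```python
def output_share(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Fraction of one call's token cost attributable to output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    return output_cost / total if total > 0 else 0.0


# Same 1,000-token prompt, two response lengths (placeholder prices per 1M tokens).
short_reply = output_share(1_000, 200, input_price_per_m=1.00, output_price_per_m=4.00)
long_reply = output_share(1_000, 2_000, input_price_per_m=1.00, output_price_per_m=4.00)
print(f"short reply: {short_reply:.0%} of cost, long reply: {long_reply:.0%} of cost")
```

Under these assumptions, output is less than half the cost for the short reply and the large majority of it for the long one, with no change to the prompt.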

Can token price compare API and self-hosted GPU?

Only as a starting point. A fair comparison also includes warm capacity, utilization, shared infrastructure, and operations work.
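
One way to make that comparison concrete is a break-even volume: the number of successful requests per month at which fixed self-hosted cost equals API spend. This is a rough sketch under stated assumptions. All dollar figures are hypothetical, the cost categories are examples rather than a complete list, and it assumes the self-hosted capacity can actually serve the break-even volume.

```python
def break_even_requests(fixed_monthly_cost, api_cost_per_successful_request):
    """Successful requests per month at which fixed capacity matches API spend."""
    if api_cost_per_successful_request <= 0:
        raise ValueError("api_cost_per_successful_request must be positive")
    return fixed_monthly_cost / api_cost_per_successful_request


# Hypothetical monthly figures for a self-hosted setup.
warm_gpu = 2_500.0        # capacity kept running whether or not traffic arrives
shared_infra = 300.0      # storage, network, and other shared overhead
operations = 1_200.0      # engineering time spent keeping it healthy
fixed_monthly = warm_gpu + shared_infra + operations

volume = break_even_requests(fixed_monthly, api_cost_per_successful_request=0.005)
print(f"break-even at roughly {volume:,.0f} successful requests per month")
```

Below that volume the fixed capacity is underused and the API is cheaper per successful request. Above it, self-hosting starts to win only if utilization and operations effort hold at the assumed levels.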
