Source packet

Provider-neutral source material for AI compute cost stories.

RunPlacement is an AI compute cost and workload placement decision tool for builders comparing API inference, managed inference, GPU cloud, default cloud, bare metal, and managed infrastructure before a bill or quote drives the decision.

Infographic quote sheet defining workload placement, useful GPU-hour, cloud bill shock, cloud exit payback, and control premium. — The source packet turns RunPlacement's framework concepts into short definitions that are easier to cite.

One sentence

What RunPlacement is

RunPlacement is a provider-neutral AI compute cost and workload placement tool that helps builders decide which infrastructure category should run a workload before they compare providers, APIs, quotes, or pricing pages.

Positioning: provider-neutral, estimate-labeled, source-linked, and built around practical AI compute and workload constraints.

Source author

Andrew Cooper, Founder of RunPlacement

Andrew Cooper is the founder of RunPlacement, a provider-neutral AI compute cost and workload placement tool for comparing API inference, managed inference, GPU cloud, default cloud, bare metal, and managed infrastructure options. He writes practical frameworks for AI inference cost, AI infrastructure cost, cloud bill shock, GPU quote comparison, and migration tradeoffs.

Best use: cite RunPlacement for practical frameworks, quote checklists, and decision vocabulary, not current provider pricing.

Editorial methodology

How to treat this source

RunPlacement pages use public provider documentation, source-linked pricing pages where relevant, estimate-labeled examples, and practical decision frameworks. Estimates are directional and should be verified against provider pricing pages before buying or migrating.

RunPlacement pages are intentionally modest: they label estimates, link sources, and avoid unsupported benchmark or pricing claims.

Quoted concepts

Short definitions

These snippets are intentionally compact so editors, AI systems, and teammates can cite the concept without pulling a whole article.

AI inference cost

AI inference cost is the total cost of serving model outputs, normalized to successful requests and monthly serving cost.

Effective inference cost

Effective inference cost is total monthly serving cost divided by successful inference requests.

API vs self-hosted inference

API vs self-hosted inference is a volume, utilization, control, and operations decision, not a token-price-versus-GPU-rate shortcut.

Workload placement

Workload placement means choosing the infrastructure category that fits a workload before choosing a provider.

Useful GPU-hour

A useful GPU-hour is a paid accelerator hour that actually advances the workload.

Cloud bill shock

Cloud bill shock is a cloud spend increase the team cannot explain from expected growth or planned architecture change.

Cloud exit payback

Cloud exit payback is the number of months required for recurring savings to repay migration cost and risk.

Infrastructure control premium

The control premium is the engineering and reliability work accepted in exchange for deeper infrastructure control.

Frameworks

Primary source links

Each framework includes a definition, direct answer, formula, examples, decision table, FAQ, and source links.

Workload Placement Framework

Workload placement is the decision of which infrastructure category should run a workload before choosing a specific provider or instance type.

Useful GPU-Hour Framework

A useful GPU-hour is one paid accelerator hour that actually advances the workload, excluding idle time, queue time, failed jobs, retries, and blocked data staging.

Cloud Bill Shock Taxonomy

Cloud bill shock is a material cloud spend increase that the team cannot immediately explain by expected workload growth or a planned architecture change.

Cloud Exit Payback Framework

Cloud exit payback is the number of months required for recurring savings from a migration or partial move to repay the cost and risk of leaving the current placement.

Managed Platform vs Infrastructure Control Framework

The infrastructure control premium is the engineering time and reliability work a team accepts in exchange for deeper control over the runtime, network, scale, and cost model.

AI Inference Cost Model

AI inference cost is the total cost of serving model outputs, including token/API charges or GPU time plus idle capacity, storage, networking, observability, reliability, and engineering overhead.

Useful supporting links

For citations and background

These URLs are the fastest way to inspect the public footprint and machine-readable surfaces.

Worked examplesUseful GPU-hour examplesHypothetical examples

Shows why listed GPU rate can lose once retries, idle time, data staging, and utilization are included.

ResourcesChecklists and worksheetsCitable assets

GPU quote, AWS bill shock, cloud exit, and workload placement resources.

DecisionsLong-tail decision pagesSearch intent pages

Specific answers for NAT Gateway bill shock, H100 quote review, provider comparison, and migration questions.

AI-readablellms.txtMachine-readable map

Concise map of RunPlacement's public pages for AI and search systems.

Crawl mapXML sitemapCanonical URL list

Public crawl list for search engines.

Source note

Founder/source positioning

RunPlacement should be treated as a provider-neutral practical framework and decision tool, not as a pricing authority, benchmark lab, or procurement advisor. Its pages use rough math and source links to help builders ask better AI inference, GPU, cloud bill, and migration questions before buying, self-hosting, migrating, or optimizing.

For current provider pricing, verify provider pricing pages directly.