Decision library

Decision pages for messy infrastructure questions.

Start here when the question is specific: NAT Gateway bill shock, H100 quote review, GPU idle cost, cloud exit cost, managed inference, or API versus self-hosted inference. Each page gives a direct answer, rough math, red flags, sources, and a handoff into the RunPlacement quiz.

Specific problem: Each page is built around one concrete decision.
Fast answer: The short answer and decision rule appear before background context.
Next step: Each page points to the matching worksheet, framework, related decision, or quiz.

AI inference cost

AI inference cost decisions

API, managed inference, self-hosted GPU, batch, realtime, and hybrid serving cost decisions.

API vs Self-Hosted Inference: Which Costs Less? (Commercial comparison)

API inference usually wins for uncertain or low-volume workloads; self-hosted inference can win when volume, utilization, latency, or control needs justify GPU operations.

Batch vs Realtime Inference Cost: How to Choose (Cost estimation)

Batch inference is often cheaper when latency is flexible because work can be queued for higher utilization; realtime inference costs more when warm capacity and strict latency are required.
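A minimal sketch of why queueing matters, assuming a flat hypothetical GPU hourly rate, a fixed number of always-warm realtime GPUs, and a known requests-per-GPU-hour throughput for batch work. The function names, the 730-hour month, and every number are illustrative placeholders, not quotes or measurements.

```python
# Hypothetical sketch: monthly GPU cost of always-warm realtime capacity vs queued batch work.
# All rates, volumes, and throughput figures are illustrative placeholders, not quotes.

HOURS_PER_MONTH = 730.0  # assumption: average number of hours in a month


def realtime_monthly_cost(gpu_hourly_rate: float, warm_gpus: int,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Realtime capacity is paid for every hour it stays warm, busy or idle."""
    return gpu_hourly_rate * warm_gpus * hours


def batch_monthly_cost(gpu_hourly_rate: float, monthly_requests: int,
                       requests_per_gpu_hour: float, utilization: float) -> float:
    """Batch capacity is paid only for the hours needed to drain the queue.

    utilization is the fraction of each paid hour spent on useful work (0 < u <= 1);
    queueing flexible work is what keeps this fraction high.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_gpu_hours = monthly_requests / requests_per_gpu_hour
    return gpu_hourly_rate * useful_gpu_hours / utilization


realtime = realtime_monthly_cost(gpu_hourly_rate=2.0, warm_gpus=2)
batch = batch_monthly_cost(gpu_hourly_rate=2.0, monthly_requests=1_000_000,
                           requests_per_gpu_hour=2_000, utilization=0.8)
print(f"realtime: {realtime:.0f}, batch: {batch:.0f}")
```

With these placeholder inputs the realtime pool costs 2920 per month and the batch run about 1250. The gap comes from paying only for useful hours divided by utilization, instead of paying for warm hours around the clock.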

Managed Inference vs GPU Cloud: Cost and Control Tradeoffs (Commercial comparison)

Managed inference can cost more on paper but still win when autoscaling, batching, reliability, and a lower operations burden reduce effective inference cost.

Self-Hosted LLM Inference Cost: What to Include (Cost estimation)

The GPU hourly rate is only the starting point for self-hosted LLM inference cost; warm capacity, utilization, storage, networking, monitoring, reliability, upgrades, and team time all belong in the estimate.
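A minimal sketch of a fully loaded monthly estimate, assuming each non-GPU item has already been converted to a monthly amount in one currency. The class and field names are hypothetical and the example figures are placeholders. Utilization does not appear as its own line item: it shows up as warm hours that are paid for whether or not they serve requests.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedMonthlyCost:
    """Hypothetical fully loaded monthly cost model for self-hosted LLM inference.

    Every field is a monthly amount (or hours) you supply from your own quotes and bills.
    """
    gpu_hourly_rate: float
    warm_gpu_hours: float      # all hours capacity is kept warm, not just busy hours
    storage: float = 0.0       # model weights, images, logs
    networking: float = 0.0    # egress, load balancing, cross-zone transfer
    monitoring: float = 0.0
    reliability: float = 0.0   # standby or redundant capacity beyond warm_gpu_hours
    upgrades: float = 0.0      # model, driver, and runtime upgrade work priced as cost
    team_time: float = 0.0     # on-call and operations labor

    def gpu_compute(self) -> float:
        """The hourly rate times warm hours: only the starting point."""
        return self.gpu_hourly_rate * self.warm_gpu_hours

    def total(self) -> float:
        """GPU compute plus every other line item."""
        return (self.gpu_compute() + self.storage + self.networking + self.monitoring
                + self.reliability + self.upgrades + self.team_time)


estimate = SelfHostedMonthlyCost(gpu_hourly_rate=2.5, warm_gpu_hours=1460,
                                 storage=120, networking=300, monitoring=150,
                                 reliability=900, upgrades=800, team_time=4000)
print(estimate.gpu_compute(), estimate.total())
```

In this placeholder example the GPU line is 3650, but the fully loaded total is 9920, which is the figure that belongs in any comparison.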

LLM API Bill Too High? What to Check First (Cost triage)

A high LLM API bill is usually a triage problem first: check whether output size, retries, tool calls, caching gaps, routing, or batchable work are driving the increase.

Inference Cost Per Request: Simple Formula (Formula)

A useful inference cost per request starts with total monthly serving cost divided by successful inference requests, with failed calls and retries handled explicitly.
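A minimal sketch of the formula, assuming one explicit convention for failures and retries: every attempt (including each retry) is counted in `total_attempts`, and any attempt that did not succeed is counted in `failed_attempts`. Failed work stays in the cost numerator because it still consumed capacity or tokens, but it is excluded from the denominator. The function name and the numbers are illustrative.

```python
def cost_per_successful_request(total_monthly_serving_cost: float,
                                total_attempts: int,
                                failed_attempts: int) -> float:
    """Total serving cost divided by successful requests only.

    Convention (an assumption): retries count as separate attempts, and the cost of
    failed attempts stays in the numerator instead of being dropped.
    """
    successful = total_attempts - failed_attempts
    if successful <= 0:
        raise ValueError("no successful requests to divide by")
    return total_monthly_serving_cost / successful


naive = 9920 / 1_000_000                                   # ignores failures
honest = cost_per_successful_request(9920, 1_000_000, 8_000)
print(f"naive: {naive:.5f}, per successful request: {honest:.5f}")
```

Dividing by all attempts would understate the per-request figure whenever failures or retries are common.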

GPU Utilization for Inference: Why Useful Hours Matter (Cost explanation)

GPU utilization matters for inference because paid warm capacity can sit idle between requests, peaks, batches, deploys, or failures.
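A small sketch of the useful-hours idea, assuming you can estimate how many of the paid warm hours actually served requests. The function name and figures are hypothetical. The effective price of a useful GPU-hour is the hourly rate divided by utilization.

```python
def cost_per_useful_gpu_hour(gpu_hourly_rate: float, paid_hours: float,
                             useful_hours: float) -> float:
    """Effective price of one hour that actually served requests.

    Equivalent to gpu_hourly_rate / utilization, where utilization = useful / paid.
    """
    if not 0 < useful_hours <= paid_hours:
        raise ValueError("useful_hours must be in (0, paid_hours]")
    return gpu_hourly_rate * paid_hours / useful_hours


# Placeholder: a 2.0/hour GPU kept warm 730 hours but busy only 292 of them (40%).
effective = cost_per_useful_gpu_hour(2.0, paid_hours=730, useful_hours=292)
print(effective)
```

At 40% utilization the advertised 2.0 per hour becomes 5.0 per useful hour, which is the number to compare against API or managed pricing.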

Self-Hosted Inference Break-Even: Directional Framework (Break-even framework)

Self-hosted inference reaches break-even only when optimized API or managed cost is higher than fully loaded GPU serving cost at realistic utilization.
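A directional sketch of the break-even rule, assuming the self-hosted figure is already fully loaded and roughly fixed per month, the API figure is an already-optimized cost per request, and you know how many requests the priced GPU capacity can serve at realistic (not peak) utilization. All names and numbers are hypothetical.

```python
def break_even_monthly_requests(self_hosted_monthly_cost: float,
                                optimized_api_cost_per_request: float) -> float:
    """Monthly volume at which API spend equals the fully loaded self-hosted cost."""
    return self_hosted_monthly_cost / optimized_api_cost_per_request


def self_hosted_is_cheaper(monthly_requests: int,
                           optimized_api_cost_per_request: float,
                           self_hosted_monthly_cost: float,
                           capacity_at_realistic_utilization: int) -> bool:
    """Directional check only; not a substitute for a full estimate.

    capacity_at_realistic_utilization is how many requests the priced GPUs can serve
    per month at the utilization you actually expect.
    """
    if monthly_requests > capacity_at_realistic_utilization:
        raise ValueError("volume exceeds priced capacity; re-price with more GPUs")
    api_monthly_cost = monthly_requests * optimized_api_cost_per_request
    return api_monthly_cost > self_hosted_monthly_cost


volume = break_even_monthly_requests(9920, 0.005)
print(f"break-even at about {volume:,.0f} requests/month")
```

With these placeholder inputs the break-even sits near 1,984,000 requests per month. The capacity check matters because a break-even volume that the priced GPUs cannot actually serve is not a real break-even.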

Batch Inference Cost Savings: When Queueing Helps (Cost optimization)

Batch inference can reduce cost when the work can wait, queueing raises utilization, and the system avoids always-warm realtime capacity.

AI Cost Comparison: API, Managed Inference, GPU Cloud, and Batch (Commercial comparison)

A useful AI cost comparison compares serving categories by monthly cost, cost per successful request, latency, utilization, and operations burden, not by provider ranking.

AI Cost Per Token: When Token Price Helps and When It Misleads (Formula guide)

AI cost per token is useful for API estimates, but it can mislead when output length, retries, multi-step workflows, failed calls, or fixed serving capacity dominate cost.
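A small sketch of why token price alone can mislead, assuming separate hypothetical per-million prices for input and output tokens, a workflow that makes several model calls per task, and an average number of attempts per call to account for retries. The function name and prices are placeholders, not any provider's rates.

```python
def api_cost_per_task(input_tokens_per_call: int, output_tokens_per_call: int,
                      price_per_million_input: float, price_per_million_output: float,
                      calls_per_task: int = 1, attempts_per_call: float = 1.0) -> float:
    """Cost of one completed task, not one call.

    attempts_per_call > 1.0 models retries and failed calls that are still billed.
    """
    per_call = (input_tokens_per_call * price_per_million_input
                + output_tokens_per_call * price_per_million_output) / 1_000_000
    return per_call * calls_per_task * attempts_per_call


single = api_cost_per_task(2_000, 500, 1.0, 4.0)
agentic = api_cost_per_task(2_000, 500, 1.0, 4.0, calls_per_task=5, attempts_per_call=1.2)
print(f"single call: {single:.4f}, five-step task with retries: {agentic:.4f}")
```

The per-token prices are identical in both cases, yet the multi-step task with retries costs six times as much per completed unit of work.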

AI Costs Increasing? A Triage Checklist Before You Migrate (Cost triage)

When AI costs increase, first separate normal usage growth from waste: longer outputs, retries, failed calls, tool loops, poor routing, missing caching, and always-warm capacity.
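One way to start the separation is a two-factor split, assuming monthly cost can be written as request volume times cost per request. The identity V1*c1 - V0*c0 = (V1 - V0)*c0 + V1*(c1 - c0) attributes the change to usage growth and to a rising cost per request (where retries, longer outputs, and tool loops tend to show up). This particular split is a common convention chosen for illustration; the function name and figures are hypothetical.

```python
def split_cost_change(volume_before: float, cost_per_request_before: float,
                      volume_after: float, cost_per_request_after: float) -> dict:
    """Split a monthly cost change into volume growth and per-request drift.

    volume_effect + per_request_effect equals the total change exactly.
    """
    volume_effect = (volume_after - volume_before) * cost_per_request_before
    per_request_effect = volume_after * (cost_per_request_after - cost_per_request_before)
    return {
        "volume_effect": volume_effect,
        "per_request_effect": per_request_effect,
        "total_change": volume_effect + per_request_effect,
    }


change = split_cost_change(1_000_000, 0.004, 1_200_000, 0.006)
print(change)
```

Here 20% more traffic explains only about 800 of a roughly 3200 increase; the remaining 2400 comes from each request getting more expensive, which is the part worth triaging before any migration.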

AI Cost Optimization: Practical Levers Before Rebuilding Inference (Optimization guide)

AI cost optimization usually starts with usage shape: reduce avoidable output, retries, failed calls, over-large prompts, expensive routing, and low utilization before changing infrastructure.

GPU pricing

GPU pricing decisions

Quote review, useful GPU-hours, data movement, utilization, and provider tradeoffs.

AWS bill shock

AWS bill shock decisions

Line-item triage before assuming the whole cloud placement is wrong.

Cloud migration

Cloud migration decisions

Exit costs, payback windows, portability, and partial move decisions.

Workload placement

Workload placement decisions

The baseline worksheet for choosing a placement category before comparing vendors.

Resources

Reusable assets

The checklist pages support the decision pages and give people something practical to share.

GPU Cloud Quote Checklist (GPU pricing; checklist, 7 sections, source-linked)

A practical checklist and visual worksheet for comparing GPU cloud quotes beyond the advertised hourly rate.

AWS Bill Shock Triage Checklist (AWS bill shock; checklist, 7 sections, source-linked)

A first-pass checklist and visual triage flow for finding the AWS line items that usually make a bill jump.

AWS Bill Shock Evidence Checklist (AWS bill shock; research checklist, 4 sections, source-linked)

A source-backed checklist for collecting AWS Cost Explorer, NAT Gateway, transfer, CloudWatch, storage, and routing evidence before changing architecture.

Cloud Exit Cost Checklist (Cloud migration; checklist, 7 sections, source-linked)

A checklist and payback worksheet for pricing the real cost of leaving AWS, GCP, or Azure before migration starts.

Cloud Exit Assumptions Index (Cloud migration; research index, 4 sections, source-linked)

A source-backed index of the assumptions to collect before estimating cloud exit payback, partial migration, or workload re-placement.

Workload Placement Worksheet (Workload placement; checklist, 7 sections, source-linked)

A practical worksheet and decision map for deciding where a workload should run before provider choice hardens.

Workload Placement Assumptions Index (Workload placement; research index, 4 sections, source-linked)

A source-backed index of the assumptions to collect before choosing cloud, GPU cloud, bare metal, managed platform, or hybrid placement.

AI Inference Cost Checklist (AI inference cost; checklist, 8 sections, source-linked)

A practical checklist for estimating AI inference cost across APIs, managed inference, self-hosted GPUs, batch jobs, realtime endpoints, and hybrid routing.

AI Inference Cost Assumptions Index (AI inference cost; research index, 4 sections, source-linked)

A source-backed index of the workload assumptions to collect before estimating API, managed inference, batch, GPU cloud, or self-hosted GPU cost.

Provider Pricing Page Field Audit (AI inference cost; research audit, 4 sections, source-linked)

A provider-neutral audit of the fields to verify on official pricing and deployment pages before comparing AI inference serving options.

Realtime vs Batch Inference Cost Research Guide (AI inference cost; research guide, 7 sections, source-linked)

A source-backed guide to deciding when realtime, asynchronous, batch, or hybrid inference changes effective AI serving cost.