Decision library

Decision pages for messy infrastructure questions.

Start here when the question is specific: NAT Gateway bill shock, H100 quote review, GPU idle cost, cloud exit cost, managed inference, or API versus self-hosted inference. Each page gives a direct answer, rough math, red flags, sources, and a handoff into the RunPlacement quiz.

Specific problem: each page is built around one concrete decision.

Fast answer: the short answer and decision rule appear before background context.

Next step: each page points to the matching worksheet, framework, related decision, or quiz.

AI inference cost

AI inference cost decisions

API, managed inference, self-hosted GPU, batch, realtime, and hybrid serving cost decisions.

AI inference cost
API vs Self-Hosted Inference: Which Costs Less?
Commercial comparison

API inference usually wins for uncertain or low-volume workloads; self-hosted inference can win when volume, utilization, latency, or control needs justify GPU operations.

AI inference cost
Batch vs Realtime Inference Cost: How to Choose
Cost estimation

Batch inference is often cheaper when latency is flexible because work can be queued for higher utilization; realtime inference costs more when warm capacity and strict latency are required.
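
To make the utilization argument concrete, here is a minimal sketch comparing billed GPU-hours for the same monthly demand. It assumes realtime capacity is sized for peak traffic and kept warm all month, while batch capacity is billed only for the hours needed to drain the queue. The 730-hour month and the throughput figures (`rps_per_gpu`, `batch_rps_per_gpu`, including the assumed gain from batching) are illustrative assumptions, not measurements.

```python
import math

HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month


def realtime_gpu_hours(peak_rps: float, rps_per_gpu: float) -> float:
    """GPU-hours billed when warm capacity is sized for peak and runs all month."""
    gpus = math.ceil(peak_rps / rps_per_gpu)  # whole GPUs needed to absorb the peak
    return gpus * HOURS_PER_MONTH


def batch_gpu_hours(monthly_requests: float, batch_rps_per_gpu: float) -> float:
    """GPU-hours billed when queued work runs only until the queue is drained."""
    requests_per_gpu_hour = batch_rps_per_gpu * 3600
    return monthly_requests / requests_per_gpu_hour


# Illustrative numbers only: 5 req/s average, 20 req/s peak, and an
# assumed per-GPU throughput gain from batching (8 -> 16 req/s).
monthly_requests = 5 * 3600 * HOURS_PER_MONTH
print(realtime_gpu_hours(peak_rps=20, rps_per_gpu=8))           # 2190
print(batch_gpu_hours(monthly_requests, batch_rps_per_gpu=16))  # 228.125
```

The gap only holds when the work can genuinely wait; if the queue must drain inside a tight window, batch capacity grows back toward realtime capacity.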

AI inference cost
Managed Inference vs GPU Cloud: Cost and Control Tradeoffs
Commercial comparison

Managed inference can cost more on paper but win when autoscaling, batching, reliability, and lower ops burden reduce effective inference cost.

AI inference cost
Self-Hosted LLM Inference Cost: What to Include
Cost estimation

The GPU hourly rate is only the starting point for self-hosted LLM inference cost; warm capacity, utilization, storage, networking, monitoring, reliability, upgrades, and team time all belong in the estimate.

AI inference cost
LLM API Bill Too High? What to Check First
Cost triage

A high LLM API bill is usually a triage problem first: check whether output size, retries, tool calls, caching gaps, routing, or batchable work are driving the increase.

AI inference cost
Inference Cost Per Request: Simple Formula
Formula

A useful inference cost per request starts with total monthly serving cost divided by successful inference requests, with failed calls and retries handled explicitly.
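
A minimal sketch of that formula, assuming you can already count successful requests, ultimately failed requests, and extra retry attempts for the month. The `MonthlyServing` record and its field names are hypothetical; the point is that failed calls and retries stay in the cost numerator while only successful requests form the denominator.

```python
from dataclasses import dataclass


@dataclass
class MonthlyServing:
    total_cost: float         # all serving spend for the month (compute, API fees, storage, ...)
    successful_requests: int  # requests that returned a usable result
    failed_requests: int      # requests that ultimately failed
    retry_attempts: int       # extra attempts beyond each request's first try


def cost_per_successful_request(m: MonthlyServing) -> float:
    # Failed calls and retries remain in total_cost because they were paid for;
    # only successful requests count in the denominator.
    if m.successful_requests <= 0:
        raise ValueError("need at least one successful request")
    return m.total_cost / m.successful_requests


def wasted_attempt_share(m: MonthlyServing) -> float:
    # Share of all attempts that did not end as a successful request.
    attempts = m.successful_requests + m.failed_requests + m.retry_attempts
    return (m.failed_requests + m.retry_attempts) / attempts


# Illustrative month, not real data.
month = MonthlyServing(total_cost=12_000, successful_requests=1_000_000,
                       failed_requests=20_000, retry_attempts=30_000)
print(cost_per_successful_request(month))     # 0.012
print(round(wasted_attempt_share(month), 4))  # 0.0476
```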

AI inference cost
GPU Utilization for Inference: Why Useful Hours Matter
Cost explanation

GPU utilization matters for inference because paid warm capacity can sit idle between requests, peaks, batches, deploys, or failures.

AI inference cost
Self-Hosted Inference Break-Even: Directional Framework
Break-even framework

Self-hosted inference reaches break-even only when optimized API or managed cost is higher than fully loaded GPU serving cost at realistic utilization.
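
A directional sketch of that comparison, assuming a flat optimized API cost per request and a self-hosted fleet that is billed all month. Every input here (GPU count, hourly rate, the single `monthly_overhead` lump for storage, networking, monitoring, upgrades, and team time, throughput, utilization) is a hypothetical placeholder to replace with your own numbers.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def self_hosted_monthly_cost(gpus: int, gpu_hourly_rate: float,
                             monthly_overhead: float) -> float:
    # Warm GPUs are billed all month; overhead is one hypothetical lump sum.
    return gpus * gpu_hourly_rate * HOURS_PER_MONTH + monthly_overhead


def monthly_capacity(gpus: int, requests_per_gpu_hour: float,
                     utilization: float) -> float:
    # Requests the fleet can serve at realistic, not benchmark, utilization.
    return gpus * requests_per_gpu_hour * HOURS_PER_MONTH * utilization


def break_even_requests(gpus: int, gpu_hourly_rate: float, monthly_overhead: float,
                        api_cost_per_request: float) -> float:
    # Monthly volume at which API spend equals fully loaded GPU spend.
    return self_hosted_monthly_cost(gpus, gpu_hourly_rate, monthly_overhead) / api_cost_per_request


def self_hosting_pays_off(monthly_requests: float, gpus: int, gpu_hourly_rate: float,
                          monthly_overhead: float, api_cost_per_request: float,
                          requests_per_gpu_hour: float, utilization: float) -> bool:
    # True only if volume clears break-even AND this fleet can actually serve it.
    above_break_even = monthly_requests > break_even_requests(
        gpus, gpu_hourly_rate, monthly_overhead, api_cost_per_request)
    within_capacity = monthly_requests <= monthly_capacity(
        gpus, requests_per_gpu_hour, utilization)
    return above_break_even and within_capacity


# Illustrative inputs: 2 GPUs at $2.50/h, $6,000/month overhead, $0.002 per API request,
# 3,000 requests per GPU-hour at 40% utilization.
print(round(break_even_requests(2, 2.50, 6_000, 0.002)))  # 4825000
print(round(monthly_capacity(2, 3_000, 0.40)))            # 1752000
```

With these made-up numbers the break-even volume is larger than what two GPUs can serve at 40% utilization, so reaching it means adding GPUs, which raises the fixed cost and moves break-even again.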

AI inference cost
Batch Inference Cost Savings: When Queueing Helps
Cost optimization

Batch inference can reduce cost when the work can wait, queueing raises utilization, and the system avoids always-warm realtime capacity.

AI inference cost
AI Cost Comparison: API, Managed Inference, GPU Cloud, and Batch
Commercial comparison

A useful AI cost comparison evaluates serving categories on monthly cost, cost per successful request, latency, utilization, and operations burden rather than ranking providers.

AI inference cost
AI Cost Per Token: When Token Price Helps and When It Misleads
Formula guide

AI cost per token is useful for API estimates, but it can mislead when output length, retries, multi-step workflows, failed calls, or fixed serving capacity dominate cost.
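
A small sketch of why the same token price can hide very different task costs. It assumes separate per-million-token prices for input and output, that each retry resends the full call, and that a multi-step workflow makes several similar calls; the prices and token counts are hypothetical.

```python
def api_cost_per_task(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float,
                      calls_per_task: int = 1, retry_rate: float = 0.0) -> float:
    """Estimated API cost of one user-facing task.

    input_price_per_m / output_price_per_m: price per million tokens (hypothetical).
    calls_per_task: model calls in a multi-step or tool-using workflow.
    retry_rate: average extra attempts per call, assuming each retry resends the call.
    """
    per_call = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_task * (1 + retry_rate)


# Same hypothetical token prices ($1/M input, $4/M output), different workload shapes.
single_call = api_cost_per_task(2_000, 500, 1.0, 4.0)
agent_flow = api_cost_per_task(2_000, 500, 1.0, 4.0, calls_per_task=5, retry_rate=0.10)
print(single_call, round(agent_flow, 4))  # 0.004 0.022
```

The token price is identical in both cases; the five-call workflow with 10% retries costs about 5.5 times as much per task, which a per-token comparison alone would not show.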

AI inference cost
AI Costs Increasing? A Triage Checklist Before You Migrate
Cost triage

When AI costs increase, first separate normal usage growth from waste: longer outputs, retries, failed calls, tool loops, poor routing, missing caching, and always-warm capacity.

AI inference cost
AI Cost Optimization: Practical Levers Before Rebuilding Inference
Optimization guide

AI cost optimization usually starts with usage shape: reduce avoidable output, retries, failed calls, over-large prompts, expensive routing, and low utilization before changing infrastructure.

GPU pricing

GPU pricing decisions

Quote review, useful GPU-hours, data movement, utilization, and provider tradeoffs.

GPU pricing
H100 Quote Checklist: What to Ask Before Choosing GPU Cloud
Commercial investigation

An H100 quote is worth comparing only after the provider exposes the GPU shape, minimum rental window, storage, data transfer, capacity model, retry risk, and support terms.

GPU pricing
GPU Cloud Idle Cost: How to Price Wasted Accelerator Time
Cost estimation

GPU cloud idle cost is the gap between paid accelerator time and useful workload progress. It matters most for training retries, batch queues, and inference fleets with low baseline utilization.
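
A minimal sketch of pricing that gap, assuming you can measure "useful" GPU-hours for your workload (for example, hours spent on training steps that were kept or on serving real requests); how you define and measure useful hours is your own assumption. The fleet size, rate, and 40% useful share below are illustrative.

```python
def idle_cost(hourly_rate: float, paid_hours: float, useful_hours: float) -> float:
    """Money spent on accelerator hours that did not advance the workload."""
    if not 0 <= useful_hours <= paid_hours:
        raise ValueError("useful_hours must be between 0 and paid_hours")
    return hourly_rate * (paid_hours - useful_hours)


def cost_per_useful_hour(hourly_rate: float, paid_hours: float, useful_hours: float) -> float:
    """Effective rate once idle time is spread over the hours that did useful work."""
    if useful_hours <= 0:
        raise ValueError("no useful hours recorded")
    return hourly_rate * paid_hours / useful_hours


# Illustrative fleet: 8 GPUs kept warm for 730 hours at $3.00/h.
paid = 8 * 730   # 5,840 paid GPU-hours
useful = 2_336   # 40% of paid hours did useful work (hypothetical measurement)
print(idle_cost(3.00, paid, useful))             # 10512.0
print(cost_per_useful_hour(3.00, paid, useful))  # 7.5
```

The second number is often the more useful one for quote comparisons: a $3.00/h GPU that is useful 40% of the time effectively costs $7.50 per useful hour.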

GPU pricing
RunPod vs Lambda GPU Cloud: How to Compare the Fit
Provider comparison

RunPod vs Lambda is less about one universal winner and more about workload fit. Compare GPU availability, storage behavior, operational model, support needs, and total job cost for your actual workload.

GPU pricing
CoreWeave vs AWS GPU Cloud: When Specialized GPU Cloud Fits
Provider comparison

CoreWeave vs AWS is a category decision first. Specialized GPU cloud can fit GPU-heavy work, while AWS can fit teams that need broader cloud services, existing controls, or tighter integration with current infrastructure.

AWS bill shock

AWS bill shock decisions

Line-item triage before assuming the whole cloud placement is wrong.

AWS bill shock
AWS NAT Gateway Bill Shock: What to Check First
Problem diagnosis

NAT Gateway bill shock usually means private subnet traffic is taking an expensive path. Start by finding which workload, route table, availability zone, or transfer pattern created the processed-data spike.
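
A sketch of the attribution step, assuming you have already exported traffic that passed through the NAT Gateway (for example from VPC Flow Logs) and mapped each record to a workload name. The `(workload, bytes)` record shape is hypothetical, the GB definition is an assumption to check against your bill, and `rate_per_gb` is a placeholder; take the real data-processing rate for your region from the AWS NAT Gateway pricing page.

```python
from collections import defaultdict
from typing import Iterable

BYTES_PER_GB = 1024 ** 3  # assumption: confirm how your bill defines a GB


def nat_cost_by_workload(records: Iterable[tuple[str, int]],
                         rate_per_gb: float) -> list[tuple[str, float]]:
    """Rank workloads by estimated NAT Gateway data-processing cost.

    records: hypothetical (workload_name, bytes_through_nat) pairs, already
    attributed from flow logs or similar; rate_per_gb: regional processing price.
    """
    totals: dict[str, int] = defaultdict(int)
    for workload, nbytes in records:
        totals[workload] += nbytes
    costs = [(w, b / BYTES_PER_GB * rate_per_gb) for w, b in totals.items()]
    return sorted(costs, key=lambda item: item[1], reverse=True)


GB = BYTES_PER_GB
sample = [("nightly-etl", 3 * GB), ("api", 1 * GB), ("nightly-etl", 1 * GB)]
for workload, cost in nat_cost_by_workload(sample, rate_per_gb=0.05):  # placeholder rate
    print(workload, round(cost, 2))
```

Ranking by workload first keeps the investigation on the one path that created the spike instead of on the whole account.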

AWS bill shock
AWS Pricing Calculator Alternative: What to Use for Placement Decisions
Tool evaluation

For placement decisions, the AWS Pricing Calculator is useful but incomplete. You also need workload shape, hidden bill drivers, migration cost, operational tolerance, and a clear read on whether the problem is AWS itself or one expensive line item.

AWS bill shock
Cloud Cost Tools for Startups: What to Use Before Hiring FinOps
Commercial investigation

Startups usually need three layers: native billing visibility, lightweight alerting or cleanup, and a decision worksheet for workload placement when the bill changes the infrastructure strategy.

Cloud migration

Cloud migration decisions

Exit costs, payback windows, portability, and partial move decisions.

Cloud migration
Cloud Egress and Exit Cost: What to Price Before Moving
Migration planning

Cloud egress is only one part of exit cost. A serious migration estimate also prices data export, recurring transfer, storage retrieval, rewrites, testing, downtime, rollback, and new operations.
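
A minimal payback sketch, assuming one-time exit costs can be summed and that the new monthly run rate already includes recurring transfer and new operations. The line items and amounts are hypothetical and only mirror the categories listed above.

```python
import math


def exit_payback_months(one_time_costs: dict[str, float],
                        current_monthly: float, new_monthly: float) -> float:
    """Months until monthly savings repay the one-time cost of leaving.

    new_monthly should already include recurring transfer and new operations cost.
    Returns math.inf when the move never pays back.
    """
    monthly_savings = current_monthly - new_monthly
    if monthly_savings <= 0:
        return math.inf
    return sum(one_time_costs.values()) / monthly_savings


# Hypothetical one-time line items.
one_time = {
    "data_export_egress": 18_000,
    "storage_retrieval": 2_000,
    "rewrites": 55_000,
    "testing": 15_000,
    "downtime_and_rollback": 10_000,
}
print(exit_payback_months(one_time, current_monthly=40_000, new_monthly=30_000))  # 10.0
```

Egress is only 18% of the one-time total in this example, which is the point of pricing the other line items before committing to a move.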

Cloud migration
Bare Metal vs Cloud Break-Even: When Dedicated Servers Win
Commercial comparison

Bare metal can win when a workload is steady, portable, highly utilized, and operationally owned. Cloud usually wins when flexibility, managed services, or variable demand matter more than unit cost.

Workload placement

Workload placement decisions

The baseline worksheet for choosing a placement category before comparing vendors.

Workload placement
Managed Platform vs Cloud: When Less Control Is the Better Placement
Commercial comparison

A managed platform can be the better placement when engineering focus and reliability matter more than infrastructure control. Direct cloud can be better when the team needs flexibility, deep customization, or lower unit cost at scale.

Resources

Reusable assets

The checklist pages support the decision pages and give people something practical to share.

GPU pricing
GPU Cloud Quote Checklist
Checklist / 7 sections / source-linked

A practical checklist and visual worksheet for comparing GPU cloud quotes beyond the advertised hourly rate.

AWS bill shock
AWS Bill Shock Triage Checklist
Checklist / 7 sections / source-linked

A first-pass checklist and visual triage flow for finding the AWS line items that usually make a bill jump.

AWS bill shock
AWS Bill Shock Evidence Checklist
Research checklist / 4 sections / source-linked

A source-backed checklist for collecting AWS Cost Explorer, NAT Gateway, transfer, CloudWatch, storage, and routing evidence before changing architecture.

Cloud migration
Cloud Exit Cost Checklist
Checklist / 7 sections / source-linked

A checklist and payback worksheet for pricing the real cost of leaving AWS, GCP, or Azure before migration starts.

Cloud migration
Cloud Exit Assumptions Index
Research index / 4 sections / source-linked

A source-backed index of the assumptions to collect before estimating cloud exit payback, partial migration, or workload re-placement.

Workload placement
Workload Placement Worksheet
Checklist / 7 sections / source-linked

A practical worksheet and decision map for deciding where a workload should run before provider choice hardens.

Workload placement
Workload Placement Assumptions Index
Research index / 4 sections / source-linked

A source-backed index of the assumptions to collect before choosing cloud, GPU cloud, bare metal, managed platform, or hybrid placement.

AI inference cost
AI Inference Cost Checklist
Checklist / 8 sections / source-linked

A practical checklist for estimating AI inference cost across APIs, managed inference, self-hosted GPUs, batch jobs, realtime endpoints, and hybrid routing.

AI inference cost
AI Inference Cost Assumptions Index
Research index / 4 sections / source-linked

A source-backed index of the workload assumptions to collect before estimating API, managed inference, batch, GPU cloud, or self-hosted GPU cost.

AI inference cost
Provider Pricing Page Field Audit
Research audit / 4 sections / source-linked

A provider-neutral audit of the fields to verify on official pricing and deployment pages before comparing AI inference serving options.

AI inference cost
Realtime vs Batch Inference Cost Research Guide
Research guide / 7 sections / source-linked

A source-backed guide to deciding when realtime, asynchronous, batch, or hybrid inference changes effective AI serving cost.