Decision library
Decision pages for messy infrastructure questions.
Start here when the question is specific: NAT Gateway bill shock, H100 quote review, GPU idle cost, cloud exit cost, managed inference, or API versus self-hosted inference. Each page gives a direct answer, rough math, red flags, sources, and a handoff into the RunPlacement quiz.
AI inference cost
AI inference cost decisions
API, managed inference, self-hosted GPU, batch, realtime, and hybrid serving cost decisions.
API inference usually wins for uncertain or low-volume workloads; self-hosted inference can win when volume, utilization, latency, or control needs justify GPU operations.
AI inference cost / Batch vs Realtime Inference Cost: How to Choose / Cost estimation
Batch inference is often cheaper when latency is flexible because work can be queued for higher utilization; realtime inference costs more when warm capacity and strict latency are required.
AI inference cost / Managed Inference vs GPU Cloud: Cost and Control Tradeoffs / Commercial comparison
Managed inference can cost more on paper but win when autoscaling, batching, reliability, and lower ops burden reduce effective inference cost.
AI inference cost / Self-Hosted LLM Inference Cost: What to Include / Cost estimation
The GPU hourly rate is only the starting point for self-hosted LLM inference cost; warm capacity, utilization, storage, networking, monitoring, reliability, upgrades, and team time all belong in the estimate.
AI inference cost / LLM API Bill Too High? What to Check First / Cost triage
A high LLM API bill is usually a triage problem first: check whether output size, retries, tool calls, caching gaps, routing, or batchable work are driving the increase.
AI inference cost / Inference Cost Per Request: Simple Formula / Formula
A useful inference cost per request starts with total monthly serving cost divided by successful inference requests, with failed calls and retries handled explicitly.
AI inference cost / GPU Utilization for Inference: Why Useful Hours Matter / Cost explanation
GPU utilization matters for inference because paid warm capacity can sit idle between requests, peaks, batches, deploys, or failures.
AI inference cost / Self-Hosted Inference Break-Even: Directional Framework / Break-even framework
Self-hosted inference reaches break-even only when optimized API or managed cost is higher than fully loaded GPU serving cost at realistic utilization.
AI inference cost / Batch Inference Cost Savings: When Queueing Helps / Cost optimization
Batch inference can reduce cost when the work can wait, queueing raises utilization, and the system avoids always-warm realtime capacity.
AI inference cost / AI Cost Comparison: API, Managed Inference, GPU Cloud, and Batch / Commercial comparison
A useful AI cost comparison compares serving categories by monthly cost, cost per successful request, latency, utilization, and operations burden, not by provider ranking.
AI inference cost / AI Cost Per Token: When Token Price Helps and When It Misleads / Formula guide
AI cost per token is useful for API estimates, but it can mislead when output length, retries, multi-step workflows, failed calls, or fixed serving capacity dominate cost.
AI inference cost / AI Costs Increasing? A Triage Checklist Before You Migrate / Cost triage
When AI costs increase, first separate normal usage growth from waste: longer outputs, retries, failed calls, tool loops, poor routing, missing caching, and always-warm capacity.
AI inference cost / AI Cost Optimization: Practical Levers Before Rebuilding Inference / Optimization guide
AI cost optimization usually starts with usage shape: reduce avoidable output, retries, failed calls, over-large prompts, expensive routing, and low utilization before changing infrastructure.
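The per-request formula and the break-even framing in the entries above can be sketched in a few lines of Python. This is a rough illustration rather than the method used on the linked pages: the function names and parameters are hypothetical, and "fully loaded" is reduced here to GPU rental plus a single monthly overhead figure for storage, networking, monitoring, and team time.

```python
def cost_per_successful_request(monthly_serving_cost, total_requests, failed_requests):
    """Total monthly serving cost divided by successful requests.

    Failed calls and retries still cost money, so they stay in the
    numerator (the bill) and are removed only from the denominator.
    """
    successful = total_requests - failed_requests
    if successful <= 0:
        raise ValueError("no successful requests to divide by")
    return monthly_serving_cost / successful


def fully_loaded_gpu_cost(gpu_hourly_rate, gpu_count, paid_hours, monthly_overhead):
    """Assumed simplification: rental for every paid hour plus one overhead figure."""
    return gpu_hourly_rate * gpu_count * paid_hours + monthly_overhead


def self_hosting_breaks_even(optimized_api_monthly_cost, gpu_monthly_cost):
    """True only when the already-optimized API bill exceeds the GPU serving cost."""
    return optimized_api_monthly_cost > gpu_monthly_cost
```

With made-up inputs, one GPU at 2.50 per hour for 720 paid hours plus 1,500 of monthly overhead comes to 3,300, so an optimized API bill of 3,000 would not yet justify self-hosting under these assumptions.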
GPU pricing
GPU pricing decisions
Quote review, useful GPU-hours, data movement, utilization, and provider tradeoffs.
An H100 quote is worth comparing only after the provider exposes the GPU shape, minimum rental window, storage, data transfer, capacity model, retry risk, and support terms.
GPU pricing / GPU Cloud Idle Cost: How to Price Wasted Accelerator Time / Cost estimation
GPU cloud idle cost is the gap between paid accelerator time and useful workload progress. It matters most for training retries, batch queues, and inference fleets with low baseline utilization.
GPU pricing / RunPod vs Lambda GPU Cloud: How to Compare the Fit / Provider comparison
RunPod vs Lambda is less about one universal winner and more about workload fit. Compare GPU availability, storage behavior, operational model, support needs, and total job cost for your actual workload.
GPU pricing / CoreWeave vs AWS GPU Cloud: When Specialized GPU Cloud Fits / Provider comparison
CoreWeave vs AWS is a category decision first. Specialized GPU cloud can fit GPU-heavy work, while AWS can fit teams that need broader cloud services, existing controls, or tighter integration with current infrastructure.
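The idle-cost idea above (paid accelerator time minus useful workload progress) can be put into numbers with a small sketch. The function names are hypothetical, and "useful hours" is assumed to be something you measure yourself, for example from job logs; no provider API is implied.

```python
def gpu_idle_cost(paid_gpu_hours, useful_gpu_hours, hourly_rate):
    """Cost of the hours that were billed but did no useful work."""
    if useful_gpu_hours > paid_gpu_hours:
        raise ValueError("useful hours cannot exceed paid hours")
    return (paid_gpu_hours - useful_gpu_hours) * hourly_rate


def effective_rate_per_useful_hour(paid_gpu_hours, useful_gpu_hours, hourly_rate):
    """What each useful GPU-hour really costs once idle time is included."""
    if useful_gpu_hours <= 0:
        raise ValueError("need at least some useful hours")
    return paid_gpu_hours * hourly_rate / useful_gpu_hours
```

As an illustration with invented figures, 720 paid hours at 2.00 per hour with 300 useful hours means 840 spent on idle time, and the effective rate per useful hour is 4.80 rather than the quoted 2.00.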
AWS bill shock
AWS bill shock decisions
Line-item triage before assuming the whole cloud placement is wrong.
NAT Gateway bill shock usually means private subnet traffic is taking an expensive path. Start by finding which workload, route table, availability zone, or transfer pattern created the processed-data spike.
AWS bill shock / AWS Pricing Calculator Alternative: What to Use for Placement Decisions / Tool evaluation
For placement decisions, an AWS pricing calculator is useful but incomplete. You also need workload shape, hidden bill drivers, migration cost, operational tolerance, and whether the problem is AWS itself or one expensive line item.
AWS bill shock / Cloud Cost Tools for Startups: What to Use Before Hiring FinOps / Commercial investigation
Startups usually need three layers: native billing visibility, lightweight alerting or cleanup, and a decision worksheet for workload placement when the bill changes the infrastructure strategy.
Cloud migration
Cloud migration decisions
Exit costs, payback windows, portability, and partial move decisions.
Cloud egress is only one part of exit cost. A serious migration estimate also prices data export, recurring transfer, storage retrieval, rewrites, testing, downtime, rollback, and new operations.
Cloud migration / Bare Metal vs Cloud Break-Even: When Dedicated Servers Win / Commercial comparison
Bare metal can win when a workload is steady, portable, highly utilized, and operationally owned. Cloud usually wins when flexibility, managed services, or variable demand matter more than unit cost.
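The exit-cost and payback-window framing in this section can be sketched as simple arithmetic. This is an illustration under stated assumptions, not the worksheet from the linked pages: the function names and cost labels are hypothetical, one-time items (export, rewrites, testing, downtime, rollback) are summed as the exit cost, and recurring items (ongoing transfer, new operations) are assumed to be folded into the new monthly cost.

```python
def one_time_exit_cost(items):
    """Sum one-time migration items given as a {label: amount} mapping."""
    return sum(items.values())


def payback_months(exit_cost, current_monthly_cost, new_monthly_cost):
    """Months until monthly savings repay the one-time exit cost.

    Returns None when the move saves nothing per month, because the
    exit cost is then never repaid.
    """
    monthly_savings = current_monthly_cost - new_monthly_cost
    if monthly_savings <= 0:
        return None
    return exit_cost / monthly_savings
```

With invented figures, an exit cost of 30,000 and a drop from 12,000 to 9,500 per month gives a 12-month payback; if the new placement costs as much or more each month, the function returns None and the move does not pay back on cost alone.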
Workload placement
Workload placement decisions
The baseline worksheet for choosing a placement category before comparing vendors.
Resources
Reusable assets
The checklist pages support the decision pages and give people something practical to share.
A practical checklist and visual worksheet for comparing GPU cloud quotes beyond the advertised hourly rate.
AWS bill shock / AWS Bill Shock Triage Checklist / Checklist / 7 sections / source-linked
A first-pass checklist and visual triage flow for finding the AWS line items that usually make a bill jump.
AWS bill shock / AWS Bill Shock Evidence Checklist / Research checklist / 4 sections / source-linked
A source-backed checklist for collecting AWS Cost Explorer, NAT Gateway, transfer, CloudWatch, storage, and routing evidence before changing architecture.
Cloud migration / Cloud Exit Cost Checklist / Checklist / 7 sections / source-linked
A checklist and payback worksheet for pricing the real cost of leaving AWS, GCP, or Azure before migration starts.
Cloud migration / Cloud Exit Assumptions Index / Research index / 4 sections / source-linked
A source-backed index of the assumptions to collect before estimating cloud exit payback, partial migration, or workload re-placement.
Workload placement / Workload Placement Worksheet / Checklist / 7 sections / source-linked
A practical worksheet and decision map for deciding where a workload should run before provider choice hardens.
Workload placement / Workload Placement Assumptions Index / Research index / 4 sections / source-linked
A source-backed index of the assumptions to collect before choosing cloud, GPU cloud, bare metal, managed platform, or hybrid placement.
AI inference cost / AI Inference Cost Checklist / Checklist / 8 sections / source-linked
A practical checklist for estimating AI inference cost across APIs, managed inference, self-hosted GPUs, batch jobs, realtime endpoints, and hybrid routing.
AI inference cost / AI Inference Cost Assumptions Index / Research index / 4 sections / source-linked
A source-backed index of the workload assumptions to collect before estimating API, managed inference, batch, GPU cloud, or self-hosted GPU cost.
AI inference cost / Provider Pricing Page Field Audit / Research audit / 4 sections / source-linked
A provider-neutral audit of the fields to verify on official pricing and deployment pages before comparing AI inference serving options.
AI inference cost / Realtime vs Batch Inference Cost Research Guide / Research guide / 7 sections / source-linked
A source-backed guide to deciding when realtime, asynchronous, batch, or hybrid inference changes effective AI serving cost.