Decision library
Workload placement research, grouped by the decision that triggered it.
RunPlacement pages are built for quick scanning: short answers, rough math, tradeoffs, sources, and a decision rule. Start with the confusion closest to the workload.
Decision pages
Start with the durable decision library.
Admin-approved drafts can still publish here, but the main problem pages live at the decision index.
An H100 quote is worth comparing only after the provider exposes the GPU shape, minimum rental window, storage, data transfer, capacity model, retry risk, and support terms.
AWS bill shock / AWS NAT Gateway Bill Shock: What to Check First / Problem diagnosis
NAT Gateway bill shock usually means private subnet traffic is taking an expensive path. Start by finding which workload, route table, availability zone, or transfer pattern created the processed-data spike.
GPU pricing / GPU Cloud Idle Cost: How to Price Wasted Accelerator Time / Cost estimation
GPU cloud idle cost is the gap between paid accelerator time and useful workload progress. It matters most for training retries, batch queues, and inference fleets with low baseline utilization.
Cloud migration / Cloud Egress and Exit Cost: What to Price Before Moving / Migration planning
Cloud egress is only one part of exit cost. A serious migration estimate also prices data export, recurring transfer, storage retrieval, rewrites, testing, downtime, rollback, and new operations.
Cloud migration / Bare Metal vs Cloud Break-Even: When Dedicated Servers Win / Commercial comparison
Bare metal can win when a workload is steady, portable, highly utilized, and operationally owned. Cloud usually wins when flexibility, managed services, or variable demand matter more than unit cost.
GPU pricing / RunPod vs Lambda GPU Cloud: How to Compare the Fit / Provider comparison
RunPod vs Lambda is less about one universal winner and more about workload fit. Compare GPU availability, storage behavior, operational model, support needs, and total job cost for your actual workload.
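The idle-cost definition in the GPU Cloud Idle Cost entry above (paid accelerator time minus useful workload progress) can be turned into rough math. The sketch below is illustrative only: the function names and the sample numbers are assumptions, not figures from any provider or from the linked page.

```python
def idle_cost(paid_gpu_hours: float, useful_gpu_hours: float, hourly_rate: float) -> float:
    """Money spent on accelerator time that produced no useful progress."""
    if not 0 <= useful_gpu_hours <= paid_gpu_hours:
        raise ValueError("useful hours must be between 0 and paid hours")
    return (paid_gpu_hours - useful_gpu_hours) * hourly_rate


def effective_rate(paid_gpu_hours: float, useful_gpu_hours: float, hourly_rate: float) -> float:
    """Price actually paid per useful GPU-hour."""
    if useful_gpu_hours <= 0:
        raise ValueError("no useful hours, so the effective rate is undefined")
    return paid_gpu_hours * hourly_rate / useful_gpu_hours


# Hypothetical numbers: 100 paid hours, 60 useful hours, $2.50 per GPU-hour.
print(idle_cost(100, 60, 2.50))                 # 100.0
print(round(effective_rate(100, 60, 2.50), 2))  # 4.17
```

Comparing providers on the effective rate rather than the listed rate is one way to let retries and low utilization show up in the quote.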
AI inference cost
API, managed inference, self-hosted GPU, batch, realtime, and hybrid serving decisions.
A practical breakdown of GPU inference cost drivers, including useful GPU-hours, batching, idle time, traffic shape, storage, and data movement.
AI inference cost / RunPod vs Lambda vs AWS: Which Fits GPU Inference? / Comparison
Compare RunPod, Lambda, and AWS for GPU inference by cost sensitivity, data gravity, reliability, operations, and production requirements.
AI inference cost / A100 vs H100: When the Cheaper GPU Is the Better Placement / Decision
A practical decision page for choosing A100 or H100 based on workload shape, memory, throughput, price, and availability.
AI inference cost / H100 On-Demand vs Reserved Capacity vs Spot: Which Should You Use? / Decision
A decision page for choosing between on-demand H100, reserved GPU capacity, and spot or marketplace GPUs.
AI inference cost / AWS vs Specialized GPU Cloud for H100 Inference / Comparison
A practical decision page for comparing AWS H100 capacity against specialized GPU clouds for inference workloads.
AWS bill shock
Start here when the bill jumped and the expensive line item is not obvious.
A practical decision page for deciding when a simple workload should stay on AWS or move to a smaller cloud, managed platform, or bare metal.
AWS bill shock / S3 Cost Surprise: Storage Is Only Part Of The AWS Bill / Cost breakdown
A practical S3 cost breakdown covering storage, requests, retrieval, replication, lifecycle rules, and data transfer surprises.
AWS bill shock / CloudWatch Cost Surprise: Logs, Metrics, And The Observability Tax / Cost breakdown
A practical AWS CloudWatch cost breakdown for logs, metrics, retention, dashboards, and workload observability tradeoffs.
AWS bill shock / AWS Data Transfer Cost Confusion: Egress, Cross-AZ, And Region Mistakes / Cost breakdown
A practical page for understanding AWS data transfer cost surprises across egress, cross-AZ traffic, regions, and workload placement.
AWS bill shock / AWS NAT Gateway Surprise Bills: When Private Subnet Traffic Gets Expensive / Cost breakdown
A practical decision page for understanding AWS NAT Gateway cost surprises and when architecture, endpoints, or placement may need review.
AWS bill shock / Why Is My AWS Bill So High? The Usual Places To Look First / Cost breakdown
A practical AWS bill shock checklist for finding common cost drivers before moving workloads or blaming EC2.
Provider comparisons
When the hard part is choosing between provider categories, not reading another pricing page.
A practical checklist for comparing cloud GPU quotes across hourly rate, billing unit, storage, bandwidth, availability, support, and commitments.
Provider comparisons / Cheapest H100 Cloud: Why The Lowest Price Can Be The Wrong Answer / Decision
A practical decision page explaining why the cheapest H100 cloud listing may not be the cheapest workload placement.
Provider comparisons / Vast.ai vs Managed GPU Cloud: When Marketplace Pricing Is Worth It / Comparison
A practical decision page for comparing Vast.ai marketplace GPU pricing with managed GPU clouds for experiments, inference, and training.
Provider comparisons / H100 Cloud Pricing Comparison: What To Compare Before The Hourly Rate / Comparison
A practical H100 cloud pricing comparison checklist focused on useful GPU-hours, availability, storage, bandwidth, and operational tradeoffs.
Provider comparisons / RunPod vs Lambda vs Vast.ai: Which GPU Cloud Fits Your Workload? / Comparison
Compare RunPod, Lambda, and Vast.ai by workload shape, reliability needs, pricing model, and operational tolerance.
Provider comparisons / How to Systematically Compare Cloud GPU Prices Across 20+ Providers / Comparison
How to compare GPU prices across AWS, Google, Oracle, and 20+ providers when spot versus on-demand pricing, regions, and volatility can drive 2x–8x price swings monthly. Covers shortcuts, tradeoffs, and decision tools.
Capacity decisions
Commitment, reservation, on-demand, and utilization tradeoffs.
A practical GPU utilization break-even page for deciding when lower hourly rates outweigh idle time, retries, and operational overhead.
Capacity decisions / GPU Training Cost Breakdown: Before You Rent The Biggest GPU / Cost breakdown
A practical breakdown of GPU training cost drivers, including runtime, checkpointing, failed runs, storage, data movement, and capacity planning.
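The utilization break-even named in this section can be sketched with one ratio. The model below is a simplifying assumption, not the method of the linked page: committed capacity is billed for every hour whether used or not, on-demand capacity is billed only for used hours, and `overhead_per_hour` is a hypothetical catch-all for retries and operational cost on the committed side. All rates are made up for illustration.

```python
def break_even_utilization(committed_rate: float, on_demand_rate: float,
                           overhead_per_hour: float = 0.0) -> float:
    """Utilization above which committed capacity beats on-demand.

    Committed cost per wall-clock hour: committed_rate + overhead_per_hour.
    On-demand cost per wall-clock hour: on_demand_rate * utilization.
    Setting the two equal gives the threshold returned here.
    """
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return (committed_rate + overhead_per_hour) / on_demand_rate


def cheaper_option(utilization: float, committed_rate: float, on_demand_rate: float,
                   overhead_per_hour: float = 0.0) -> str:
    """Name the cheaper option for a given utilization between 0 and 1."""
    threshold = break_even_utilization(committed_rate, on_demand_rate, overhead_per_hour)
    return "committed" if utilization > threshold else "on-demand"


# Hypothetical rates: $2.00/hour committed, $4.00/hour on-demand.
print(break_even_utilization(2.0, 4.0))  # 0.5
print(cheaper_option(0.35, 2.0, 4.0))    # on-demand
print(cheaper_option(0.80, 2.0, 4.0))    # committed
```

Adding overhead to the committed side raises the threshold, which is why a lower hourly rate alone does not settle the decision.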
Cost breakdowns
Line-item checklists for finding what the hourly rate leaves out.
A checklist of GPU cloud costs that are easy to miss, including storage, bandwidth, idle time, retries, support, and commitment waste.
Cost breakdowns / GPU Cloud Pricing Checklist: What the Hourly Rate Leaves Out / Cost breakdown
A checklist for comparing GPU cloud quotes beyond the hourly GPU price, including storage, bandwidth, idle time, availability, and ops.
Cloud migration
AWS exit and workload portability decisions.
A practical checklist for cloud exit costs, including data transfer, rewrites, managed service replacement, downtime risk, and operations.
Cloud migration / AWS vs Bare Metal: When Owning The Machine Makes Sense / Comparison
A practical comparison of AWS and bare metal for steady workloads, predictable utilization, operations, and cost control.
Cloud migration / When Not To Leave AWS Even If The Bill Looks High / Decision
A practical decision page for knowing when a high AWS bill should be optimized inside AWS instead of triggering a migration.
Cloud migration / Should You Move From AWS To A Cheaper Cloud? / Decision
A practical decision page for deciding whether AWS savings justify migration work, data movement, and operational risk.
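Whether savings justify migration work, as the entries in this section ask, often comes down to a payback period. The sketch below is a rough first pass under stated assumptions: `one_time_exit_cost` is a hypothetical total covering data transfer, rewrites, testing, and downtime, and the monthly figures are invented for illustration. It ignores risk and the time value of money.

```python
from typing import Optional


def payback_months(one_time_exit_cost: float, current_monthly_cost: float,
                   new_monthly_cost: float) -> Optional[float]:
    """Months of savings needed to recover the one-time cost of moving.

    Returns None when the new placement is not cheaper per month,
    because the move then never pays for itself.
    """
    monthly_savings = current_monthly_cost - new_monthly_cost
    if monthly_savings <= 0:
        return None
    return one_time_exit_cost / monthly_savings


# Hypothetical figures: $60,000 to migrate, $12,000/month now, $7,000/month after.
print(payback_months(60_000, 12_000, 7_000))  # 12.0
print(payback_months(1_000, 500, 600))        # None
```

A payback period longer than the expected life of the workload is one signal to optimize in place instead of moving.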
Resources
Linkable assets
Checklists and worksheets built for practical sharing, not promotional posting.
A practical checklist and visual worksheet for comparing GPU cloud quotes beyond the advertised hourly rate.
AWS bill shock / AWS Bill Shock Triage Checklist
Checklist / 7 sections / source-linked
A first-pass checklist and visual triage flow for finding the AWS line items that usually make a bill jump.
Cloud migration / Cloud Exit Cost Checklist
Checklist / 7 sections / source-linked
A checklist and payback worksheet for pricing the real cost of leaving AWS, GCP, or Azure before migration starts.
Workload placement / Workload Placement Worksheet
Checklist / 7 sections / source-linked
A practical worksheet and decision map for deciding where a workload should run before provider choice hardens.
AI inference cost / AI Inference Cost Checklist
Checklist / 8 sections / source-linked
A practical checklist for estimating AI inference cost across APIs, managed inference, self-hosted GPUs, batch jobs, realtime endpoints, and hybrid routing.