GPU Cloud Quote Checklist

Short answer: Use this when an H100, A100, or L40S quote looks cheap but the full workload cost is still unclear.

Estimate only

RunPlacement quiz

Pressure-test this workload

Compare useful GPU-hours, data movement, storage, support, capacity risk, and ops burden before choosing the lowest hourly rate.

The quiz uses workload type, budget, GPU need, data movement, priority, and ops tolerance.

GPU quote anatomy

A GPU quote is worth comparing only after each layer is visible.

01 GPU rate

Hourly price, GPU model, memory, and minimum rental window.

02 Useful runtime

Expected hours, utilization, failed jobs, retries, and idle buffer.

03 Data gravity

Storage, ingress, egress, region movement, and dataset staging.

04 Operating burden

Provisioning, monitoring, support, SLA, queues, and incident handling.

Do not compare GPU providers on hourly rate alone.
Ask what has to be paid, moved, stored, retried, reserved, and operated for the workload to finish.
The cheapest listed H100 can be the wrong placement if capacity is unreliable or data movement dominates the job.

GPU model, memory, interconnect, and number of GPUs per node.
Minimum rental window, reservation length, commitment, or deposit.
Storage included versus billed separately, including snapshots and persistent volumes.
Ingress, egress, inter-region, and private network transfer charges.
Idle capacity, queue time, failed job, and retry assumptions.
Support level, SLA, and any managed Kubernetes or inference service fees.
Whether pricing changes for spot, reserved, marketplace, or dedicated capacity.
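One way to keep the checklist above honest is to record each quote in a small structure and list what the provider has not yet answered. The sketch below is illustrative only: the `GpuQuote` class, its field names, and the sample values are hypothetical and do not come from any provider's API or from RunPlacement.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GpuQuote:
    """Hypothetical record of one provider quote. None means 'not answered yet'."""
    gpu_model: Optional[str] = None
    gpu_memory_gb: Optional[int] = None
    interconnect: Optional[str] = None
    gpus_per_node: Optional[int] = None
    minimum_commitment: Optional[str] = None          # rental window, reservation, deposit
    storage_terms: Optional[str] = None               # included vs billed, snapshots, volumes
    transfer_terms: Optional[str] = None              # ingress, egress, inter-region, private network
    idle_and_retry_assumptions: Optional[str] = None  # idle capacity, queue time, failed jobs
    support_and_managed_fees: Optional[str] = None    # support level, SLA, managed services
    capacity_type_pricing: Optional[str] = None       # spot, reserved, marketplace, dedicated


def missing_items(quote: GpuQuote) -> List[str]:
    """Return the names of checklist items the quote leaves unanswered."""
    return [f.name for f in fields(quote) if getattr(quote, f.name) is None]


# Illustrative values only, not a real quote.
quote = GpuQuote(gpu_model="H100", gpu_memory_gb=80, gpus_per_node=8)
for item in missing_items(quote):
    print("still unknown:", item)
```

A quote with a long list of unknowns is probably not ready to be compared on price.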

Sticker GPU rate: useful for screening, weak for final decisions.
Useful GPU-hour: better for training and batch jobs because utilization is visible.
Total job cost: best for one-off training runs because retries and data movement are included.
Monthly serving cost: best for inference because idle baseline and traffic variance matter.
Ops-adjusted cost: best when the team has limited infrastructure tolerance.

Estimated job cost = GPU hourly rate x GPU count x runtime hours + storage + transfer + managed fees + idle/retry allowance.
Estimated inference month = (baseline GPU hours + burst GPU hours) x GPU hourly rate + storage + transfer + observability + support.
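Utilization-adjusted GPU rate = listed hourly rate / expected utilization, with utilization expressed as a fraction between 0 and 1. For example, a $2.00 listed rate at 50% utilization works out to $4.00 per useful GPU-hour.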
Label every number as provider quote, observed bill, benchmark result, or estimate.
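The formulas above can be sketched as a few small functions. This is a minimal illustration, not a pricing tool: the function names, parameter names, and every dollar figure are assumptions made up for the example, and the monthly estimate assumes the GPU-hour terms are converted to cost by multiplying by the hourly rate.

```python
def job_cost(rate, gpu_count, runtime_hours, storage=0.0, transfer=0.0,
             managed_fees=0.0, idle_retry_allowance=0.0):
    """Estimated cost of a one-off job: GPU spend plus the non-GPU line items."""
    gpu_spend = rate * gpu_count * runtime_hours
    return gpu_spend + storage + transfer + managed_fees + idle_retry_allowance


def inference_month(rate, baseline_gpu_hours, burst_gpu_hours, storage=0.0,
                    transfer=0.0, observability=0.0, support=0.0):
    """Estimated monthly serving cost; GPU hours are priced at the hourly rate."""
    gpu_spend = (baseline_gpu_hours + burst_gpu_hours) * rate
    return gpu_spend + storage + transfer + observability + support


def utilization_adjusted_rate(listed_rate, utilization):
    """Cost per useful GPU-hour; utilization is a fraction in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction greater than 0 and at most 1")
    return listed_rate / utilization


# Hypothetical inputs; label real ones as quote, bill, benchmark, or estimate.
print(job_cost(rate=2.50, gpu_count=8, runtime_hours=40,
               storage=60.0, transfer=45.0, idle_retry_allowance=120.0))
print(inference_month(rate=2.00, baseline_gpu_hours=720, burst_gpu_hours=100,
                      storage=50.0, transfer=30.0, observability=20.0, support=100.0))
print(utilization_adjusted_rate(listed_rate=2.50, utilization=0.5))
```

Keeping each non-GPU term as its own argument makes it obvious when a quote leaves one at zero because it is unknown rather than because it is free.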

Use the RunPlacement quiz after collecting at least one real quote or bill line.
The quiz is most useful when you can state workload type, GPU need, data movement, priority, budget band, and ops tolerance.
