GPU Cloud Quote Checklist
Short answer: Use this when an H100, A100, or L40S quote looks cheap but the full workload cost is still unclear.
- This is a decision checklist, not a final price quote.
- Verify final numbers against provider pricing pages and your own bill or quote.
RunPlacement quiz: Pressure-test this workload. Compare useful GPU-hours, data movement, storage, support, capacity risk, and ops burden before choosing the lowest hourly rate. Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.
GPU Quote Anatomy
The hourly rate is only one part of the quote.
A GPU quote is worth comparing only after each layer is visible.
- Rate terms: hourly price, GPU model, memory, and minimum rental window.
- Usage reality: expected hours, utilization, failed jobs, retries, and idle buffer.
- Data costs: storage, ingress, egress, region movement, and dataset staging.
- Operations: provisioning, monitoring, support, SLA, queues, and incident handling.
Short Answer
- Do not compare GPU providers on hourly rate alone.
- Ask what has to be paid, moved, stored, retried, reserved, and operated for the workload to finish.
- The cheapest listed H100 can be the wrong placement if capacity is unreliable or data movement dominates the job.
Quote Fields To Request
- GPU model, memory, interconnect, and number of GPUs per node.
- Minimum rental window, reservation length, commitment, or deposit.
- Storage included versus billed separately, including snapshots and persistent volumes.
- Ingress, egress, inter-region, and private network transfer charges.
- Idle capacity, queue time, failed job, and retry assumptions.
- Support level, SLA, managed Kubernetes or inference service fees.
- Whether pricing changes for spot, reserved, marketplace, or dedicated capacity.
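Quotes are easier to compare when every provider's answers land in the same shape. A minimal sketch, assuming a hypothetical record type (field names are illustrative, not any provider's API):

```python
from dataclasses import dataclass, field

@dataclass
class GpuQuote:
    """Normalized capture of the quote fields above, one record per provider."""
    provider: str
    gpu_model: str                    # e.g. "H100 80GB"
    gpus_per_node: int
    hourly_rate_usd: float            # per GPU
    min_rental_hours: float
    storage_usd_per_gb_month: float
    egress_usd_per_gb: float
    capacity_class: str               # "spot", "reserved", "marketplace", "dedicated"
    support_fees_usd_month: float = 0.0
    open_questions: list = field(default_factory=list)  # anything the provider left unanswered

quote = GpuQuote(
    provider="example-cloud",         # made-up provider and numbers
    gpu_model="H100 80GB",
    gpus_per_node=8,
    hourly_rate_usd=2.49,
    min_rental_hours=1,
    storage_usd_per_gb_month=0.10,
    egress_usd_per_gb=0.09,
    capacity_class="spot",
    open_questions=["retry policy unconfirmed"],
)
```

A quote with a non-empty `open_questions` list is not yet ready for the comparison table.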
Comparison Table
- Sticker GPU rate: useful for screening, weak for final decisions.
- Useful GPU-hour: better for training and batch jobs because utilization is visible.
- Total job cost: best for one-off training runs because retries and data movement are included.
- Monthly serving cost: best for inference because idle baseline and traffic variance matter.
- Ops-adjusted cost: best when the team has limited infrastructure tolerance.
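The gap between sticker rate and useful GPU-hour can be shown in a few lines. The rates and utilization figures below are made-up examples, not real quotes:

```python
def utilization_adjusted_rate(listed_hourly: float, expected_utilization: float) -> float:
    """Useful GPU-hour cost: listed hourly rate / expected utilization."""
    return listed_hourly / expected_utilization

# Provider A: cheaper sticker rate, but flaky capacity and retries cut utilization.
provider_a = utilization_adjusted_rate(1.99, 0.60)  # ~3.32 per useful GPU-hour
# Provider B: pricier sticker rate, reliable capacity.
provider_b = utilization_adjusted_rate(2.49, 0.90)  # ~2.77 per useful GPU-hour
```

Here the cheaper listed rate loses once utilization is visible, which is exactly why the sticker rate is a screening tool, not a decision tool.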
Rough Math
- Estimated job cost = GPU hourly rate x GPU count x runtime hours + storage + transfer + managed fees + idle/retry allowance.
- Estimated inference month = (baseline GPU hours + burst GPU hours) x hourly rate + storage + transfer + observability + support.
- Utilization-adjusted GPU rate = listed hourly rate / expected utilization.
- Label every number as provider quote, observed bill, benchmark result, or estimate.
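The job-cost formula above can be sketched as a function. The inputs below are illustrative estimates, not real prices:

```python
def estimated_job_cost(hourly_rate: float, gpu_count: int, runtime_hours: float,
                       storage: float = 0.0, transfer: float = 0.0,
                       managed_fees: float = 0.0, idle_retry_allowance: float = 0.0) -> float:
    """GPU hourly rate x GPU count x runtime hours, plus the non-rate layers."""
    return (hourly_rate * gpu_count * runtime_hours
            + storage + transfer + managed_fees + idle_retry_allowance)

# Made-up example: an 8-GPU run at $2.49/hr for 72 hours,
# with a 15% idle/retry allowance on the raw GPU-hour cost.
raw_gpu_hours = 2.49 * 8 * 72
cost = estimated_job_cost(
    hourly_rate=2.49, gpu_count=8, runtime_hours=72,
    storage=40.0, transfer=25.0,
    idle_retry_allowance=0.15 * raw_gpu_hours,
)
```

Note how the add-on layers move the total well above the raw GPU-hour figure a sticker-rate comparison would show; label each input as a provider quote, an observed bill, or an estimate before trusting the output.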
Questions To Send Providers
- What exactly is included in the hourly GPU rate?
- What happens if capacity is unavailable when the job starts?
- Are storage and data transfer billed as separate products?
- Do failed jobs, checkpoint restores, or retries create extra billable hours?
- Can I cap spend or set an alert before the workload runs?
- What is the lowest-friction way to leave with my data and artifacts?
Red Flags
- A quote that does not mention egress or persistent storage.
- A low hourly rate with unclear capacity reliability.
- A managed inference price that hides utilization assumptions.
- A provider comparison that ignores engineer time.
- A benchmark that does not match your batch size, model size, or data path.
When To Use The Quiz
- Use the RunPlacement quiz after collecting at least one real quote or bill line.
- The quiz is most useful when you can state workload type, GPU need, data movement, priority, budget band, and ops tolerance.