GPU Cloud Quote Checklist

Short answer: Use this when an H100, A100, or L40S quote looks cheap but the full workload cost is still unclear.

Estimate only
  • This is a decision checklist, not a final price quote.
  • Verify final numbers against provider pricing pages and your own bill or quote.

First pass

Use This Before You Ask For Quotes

The fastest way to compare GPU providers is to make every quote answer the same questions.

1. Copy the provider email

Replace the bracketed workload note with training, inference, batch, experimentation, or your actual workload.

2. Ask for the hidden fields

Make storage, transfer, support, queue behavior, commitments, and failure handling explicit.

3. Compare useful GPU-hours

Normalize around completed work, not listed hourly GPU rate alone.

Filled example

Example: Incomplete Quote Check

Hypothetical quote review, not a provider ranking.

Input | Hypothetical value
Looks complete | GPU model, node shape, region, and hourly rate are listed.
Still missing | Storage, egress, queue behavior, retry handling, support tier, and cancellation terms.
Next move | Send the provider email block before treating the quote as comparable.

What it flags: A low hourly rate is not enough if the quote omits the costs and terms that determine useful GPU-hours.
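
The check in this example can be sketched in a few lines of Python. The field names below are hypothetical labels chosen for illustration, not a provider schema, and the quote values are placeholders rather than real prices.

```python
# Hypothetical field names for illustration; not a provider schema.
REQUIRED_FIELDS = [
    "gpu_model", "node_shape", "region", "hourly_rate",
    "storage", "egress", "queue_behavior", "retry_handling",
    "support_tier", "cancellation_terms",
]


def missing_fields(quote: dict) -> list[str]:
    """Return required fields that are absent or left blank in a quote."""
    return [f for f in REQUIRED_FIELDS if quote.get(f) in (None, "")]


# Hypothetical quote mirroring the example: the GPU line looks complete,
# but the cost and terms fields are not stated.
quote = {
    "gpu_model": "H100",
    "node_shape": "8 GPUs per node",
    "region": "example-region",
    "hourly_rate": 2.00,  # placeholder value, not a real price
}

gaps = missing_fields(quote)
if gaps:
    print("Still missing: " + ", ".join(gaps))
else:
    print("Quote is comparable on the required fields.")
```

A quote with an empty `gaps` list is not necessarily cheaper; it is only ready to be compared.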

By Andrew Cooper, Founder of RunPlacement. Updated May 2026.
Provider-neutral, estimate-labeled guidance. Verify current provider pricing.

Use this when

  • You have two or more GPU cloud quotes that are not directly comparable.
  • The provider headline rate looks cheap but storage, egress, support, or capacity is unclear.
  • A training, batch, or inference workload may spend meaningful time idle, queued, or retrying.

Not for

  • Final procurement approval without provider quotes.
  • Benchmarking model quality or throughput; this is a cost-decision worksheet.
  • Picking a provider from brand preference alone.

Figure: Infographic checklist of GPU quote fields: model, interconnect, storage, network, waste, support, and commitments. A GPU quote is incomplete until storage, network, utilization, support, and commitment terms are visible.

GPU quote anatomy

The hourly rate is only one part of the quote.

A GPU quote is worth comparing only after each layer is visible.

01 GPU rate

Hourly price, GPU model, memory, and minimum rental window.

02 Useful runtime

Expected hours, utilization, failed jobs, retries, and idle buffer.

03 Data gravity

Storage, ingress, egress, region movement, and dataset staging.

04 Operating burden

Provisioning, monitoring, support, SLA, queues, and incident handling.

Worksheet Fields

Use this as the working version before copying the decision into a doc, ticket, or vendor email.

Field | Capture | Why it matters
GPU shape | Model, memory, interconnect, GPUs per node, region, capacity type. | Prevents false equivalence between unlike quotes.
Runtime reality | Expected hours, utilization, queue time, failed jobs, retries, checkpoint restores. | Turns sticker rate into useful GPU-hour cost.
Data path | Dataset size, storage duration, snapshots, ingress, egress, cross-region movement. | Finds the costs that often sit outside the GPU line.
Ops burden | Provisioning, monitoring, support, SLA, incident owner, exit path. | Shows whether the cheaper quote creates work elsewhere.

Provider-ready

Copy Into A Provider Email

Use this as a neutral request for comparable GPU cloud quote details. Replace bracketed notes with your workload details before sending.

Subject: GPU cloud quote details for comparison

Hi [provider/team],

I am comparing GPU cloud options for [training / inference / batch / experimentation]. To make quotes comparable, could you please include:

- GPU model, memory, GPU count, node shape, interconnect, and region.
- Capacity type: on-demand, reserved, spot-like, dedicated, queued, or marketplace.
- Minimum rental window, reservation length, deposit, commitment, or cancellation terms.
- Storage included in the GPU rate versus billed separately, including persistent storage and snapshots.
- Ingress, egress, private transfer, cross-region, and cross-zone network costs.
- Queue behavior, capacity availability, failed job handling, retry behavior, and checkpoint restore assumptions.
- Support tier, SLA, incident path, and response expectations.
- Any managed Kubernetes, managed inference, observability, or platform fees.
- Spend caps, alerts, and the lowest-friction path to leave with data and artifacts.

Please label which numbers are current public pricing, quote-specific pricing, usage estimates, or contractual terms.

Thanks,
[name]
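
If you send this email to several providers, a small script can fill in the bracketed notes and warn about any that were left in place. This is a minimal sketch: the `fill` and `unfilled` helpers, the shortened template, and the provider name are all hypothetical, and the template is only an excerpt of the email above.

```python
import re

# Matches a bracketed note such as "[name]" (no nested brackets).
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each bracketed note that has a value; leave the rest untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(0), m.group(0)), template)


def unfilled(text: str) -> list[str]:
    """List bracketed notes that still need replacing before sending."""
    return PLACEHOLDER.findall(text)


# Shortened excerpt of the email, for illustration only.
template = (
    "Hi [provider/team],\n\n"
    "I am comparing GPU cloud options for "
    "[training / inference / batch / experimentation].\n\n"
    "Thanks,\n[name]"
)

draft = fill(template, {
    "[provider/team]": "Example Cloud sales team",  # hypothetical provider
    "[training / inference / batch / experimentation]": "training",
})

leftover = unfilled(draft)
if leftover:
    print("Replace before sending:", ", ".join(leftover))
```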

AI prompt

Prompt To Compare GPU Quotes

Paste provider replies into your AI tool with this prompt. The goal is to find missing fields and normalize around useful GPU-hours, not to rank providers by brand.

You are helping me compare GPU cloud quotes. Do not assume current provider pricing, capacity, or performance unless I provide it. Do not rank providers by brand.

Here are the quote details:
[Paste provider quote details here]

Please:
1. Identify missing quote fields across GPU shape, node shape, storage, data transfer, support, SLA, queue behavior, failure/retry behavior, and commitment terms.
2. Normalize the comparison around useful GPU-hours and total workload completion cost, not listed hourly GPU rate alone.
3. Separate public pricing, provider quote terms, workload assumptions, and unknowns.
4. Flag cost drivers that could make a cheaper listed rate more expensive in practice.
5. List follow-up questions to send each provider before choosing.
6. Avoid benchmark, provider-ranking, or current-pricing claims unless they are directly supplied in the quote.

Short Answer

  • Do not compare GPU providers on hourly rate alone.
  • Ask what has to be paid, moved, stored, retried, reserved, and operated for the workload to finish.
  • The cheapest listed H100 can be the wrong placement if capacity is unreliable or data movement dominates the job.

Quote Fields To Request

  • GPU model, memory, interconnect, and number of GPUs per node.
  • Minimum rental window, reservation length, commitment, or deposit.
  • Storage included versus billed separately, including snapshots and persistent volumes.
  • Ingress, egress, inter-region, and private network transfer charges.
  • Idle capacity, queue time, failed job, and retry assumptions.
  • Support level, SLA, and any managed Kubernetes or inference service fees.
  • Whether pricing changes for spot, reserved, marketplace, or dedicated capacity.

Comparison Table

  • Sticker GPU rate: useful for screening, weak for final decisions.
  • Useful GPU-hour: better for training and batch jobs because utilization is visible.
  • Total job cost: best for one-off training runs because retries and data movement are included.
  • Monthly serving cost: best for inference because idle baseline and traffic variance matter.
  • Ops-adjusted cost: best when the team has limited tolerance for infrastructure work.

Rough Math

  • Estimated job cost = (GPU hourly rate x GPU count x runtime hours) + storage + transfer + managed fees + idle/retry allowance.
  • Estimated inference month = (baseline GPU hours + burst GPU hours) x GPU hourly rate + storage + transfer + observability + support.
  • Utilization-adjusted GPU rate = listed hourly rate / expected utilization, with utilization expressed as a fraction (for example, 0.8 for 80%).
  • Label every number as provider quote, observed bill, benchmark result, or estimate.
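
The rough math can be written as three small Python functions. This is a sketch under stated assumptions: the function and parameter names are hypothetical, baseline and burst GPU hours are multiplied by the hourly rate to turn them into cost, utilization is a fraction between 0 and 1, and every number in the example is a made-up placeholder rather than a provider price.

```python
def job_cost(rate, gpu_count, runtime_hours, storage=0.0, transfer=0.0,
             managed_fees=0.0, idle_retry_allowance=0.0):
    """Estimated job cost: GPU spend plus the lines that sit outside it."""
    gpu_spend = rate * gpu_count * runtime_hours
    return gpu_spend + storage + transfer + managed_fees + idle_retry_allowance


def inference_month(rate, baseline_gpu_hours, burst_gpu_hours, storage=0.0,
                    transfer=0.0, observability=0.0, support=0.0):
    """Estimated monthly serving cost (assumes hours are priced at `rate`)."""
    gpu_spend = rate * (baseline_gpu_hours + burst_gpu_hours)
    return gpu_spend + storage + transfer + observability + support


def utilization_adjusted_rate(rate, utilization):
    """Listed hourly rate divided by expected utilization (0 < u <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return rate / utilization


# All inputs below are estimates invented for illustration.
training = job_cost(rate=2.0, gpu_count=8, runtime_hours=100,
                    storage=50.0, transfer=30.0, idle_retry_allowance=160.0)
serving = inference_month(rate=2.0, baseline_gpu_hours=720, burst_gpu_hours=80,
                          storage=40.0, transfer=60.0,
                          observability=25.0, support=100.0)
effective = utilization_adjusted_rate(2.0, 0.8)

print(f"job: {training:.2f}  month: {serving:.2f}  adjusted rate: {effective:.2f}")
```

Keeping each cost line as a separate argument makes it easy to label which inputs came from a provider quote, an observed bill, a benchmark result, or an estimate.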

Questions To Send Providers

  • What exactly is included in the hourly GPU rate?
  • What happens if capacity is unavailable when the job starts?
  • Are storage and data transfer billed as separate products?
  • Do failed jobs, checkpoint restores, or retries create extra billable hours?
  • Can I cap spend or set an alert before the workload runs?
  • What is the lowest-friction way to leave with my data and artifacts?

Red Flags

  • A quote that does not mention egress or persistent storage.
  • A low hourly rate with unclear capacity reliability.
  • A managed inference price that hides utilization assumptions.
  • A provider comparison that ignores engineer time.
  • A benchmark that does not match your batch size, model size, or data path.

When To Use The Quiz

  • Use the RunPlacement quiz after collecting at least one real quote or bill line.
  • The quiz is most useful when you can state workload type, GPU need, data movement, priority, budget band, and ops tolerance.

FAQ

What is a useful GPU-hour?

A useful GPU-hour is paid accelerator time that actually advances the workload. It excludes idle time, queue time, failed jobs, retries, and time blocked by data staging. Comparing useful GPU-hours is often better than comparing listed hourly rates when providers differ in reliability, storage, or operations.
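
As a rough illustration, useful GPU-hours can be computed by subtracting the excluded categories from billed hours. The `BilledRun` class, its field names, and all numbers are hypothetical; in practice the breakdown would come from your own job logs and bill, and only time you were actually billed for should be counted.

```python
from dataclasses import dataclass


@dataclass
class BilledRun:
    """Hypothetical breakdown of billed GPU-hours for one workload."""
    billed_gpu_hours: float
    idle_hours: float = 0.0
    queue_hours: float = 0.0    # count only queue time that was billed
    failed_hours: float = 0.0
    retry_hours: float = 0.0
    staging_hours: float = 0.0  # time blocked by data staging

    def useful_gpu_hours(self) -> float:
        """Billed hours minus time that did not advance the workload."""
        wasted = (self.idle_hours + self.queue_hours + self.failed_hours
                  + self.retry_hours + self.staging_hours)
        return max(self.billed_gpu_hours - wasted, 0.0)

    def cost_per_useful_hour(self, hourly_rate: float) -> float:
        """Total GPU spend divided by useful GPU-hours."""
        useful = self.useful_gpu_hours()
        if useful == 0:
            raise ValueError("no useful GPU-hours in this run")
        return hourly_rate * self.billed_gpu_hours / useful


# Made-up numbers: quote A has the lower listed rate but more wasted time.
run_a = BilledRun(billed_gpu_hours=1000, idle_hours=100, failed_hours=50,
                  retry_hours=50, staging_hours=50)
run_b = BilledRun(billed_gpu_hours=800, idle_hours=20, failed_hours=10,
                  retry_hours=10)

cost_a = run_a.cost_per_useful_hour(hourly_rate=2.00)
cost_b = run_b.cost_per_useful_hour(hourly_rate=2.40)
print(f"A: {cost_a:.2f} per useful GPU-hour, B: {cost_b:.2f} per useful GPU-hour")
```

In this invented case the quote with the higher listed rate ends up cheaper per useful GPU-hour, which is the situation the definition is meant to expose.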

Should I choose the cheapest H100 cloud?

Do not choose the cheapest H100 cloud by listed rate alone. A low rate can be a poor fit if capacity is unreliable, storage or transfer is expensive, support is thin, or failed jobs create extra billable time. Compare total job cost and workload completion risk.

What should I ask before accepting a GPU cloud quote?

Before accepting a GPU cloud quote, ask what is included in the GPU rate, how storage and transfer are billed, what happens during capacity shortages, and whether failed jobs create billable time. Also ask about support, minimums, commitments, billing controls, and data exit.

RunPlacement quiz

Pressure-test this workload

Compare useful GPU-hours, data movement, storage, support, capacity risk, and ops burden before choosing the lowest hourly rate.

Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.