H100 On-Demand vs Reserved Capacity vs Spot: Which Should You Use?

Short answer: Use on-demand for uncertain workloads, reserved capacity for predictable GPU demand, and spot or marketplace GPUs only when interruption, variability, and debugging are acceptable.

RunPlacement quiz

Pressure-test this workload

Match the H100 buying model to workload predictability and interruption tolerance before comparing hourly rates.

Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.
Use the quiz

Short Answer

The buying model should follow the workload shape.

Use on-demand when you are still learning. Use reserved or committed capacity when the workload is predictable. Use spot or marketplace capacity when interruption is tolerable and the team can absorb operational variance.

The Mistake

Teams often compare GPU prices as if every H100 hour is the same. It is not.

An H100 hour behind a weekend experiment, a production inference endpoint, a checkpointed batch job, or a multi-node training run carries a different failure cost.

Buying Model Table

Directional only. Provider terms change.

On-demand
  Best for: experiments, uncertain usage, early production
  Avoid when: the workload is steady and expensive
  Main hidden cost: paying for flexibility you do not need

Reserved or committed
  Best for: steady inference, planned training, known baseline
  Avoid when: usage may disappear soon
  Main hidden cost: lock-in and unused capacity

Capacity blocks
  Best for: planned training windows or guaranteed capacity needs
  Avoid when: jobs are unpredictable or flexible
  Main hidden cost: scheduling mismatch

Spot or preemptible
  Best for: checkpointed batch, experiments, tolerant jobs
  Avoid when: production SLA or fragile training
  Main hidden cost: interruption and recovery work

Marketplace GPUs
  Best for: low-cost experiments, flexible jobs
  Avoid when: sensitive data or strict reliability
  Main hidden cost: node quality and availability variance

Rough Math

Estimate only:

monthly GPU cost = committed baseline + burst capacity + idle time + recovery cost + ops time

The cheapest hourly rate can lose if it creates enough failed jobs, restarts, cold starts, or engineer babysitting.
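The estimate above can be sketched as a small calculation. All rates, hours, and overhead figures below are made-up placeholders, not provider quotes; the point is only that the lowest hourly rate does not always produce the lowest total.

```python
# Illustrative sketch of the rough-math formula above.
# Every number here is a made-up placeholder, not a real provider rate.

def monthly_gpu_cost(committed_baseline, burst_capacity, idle_time,
                     recovery_cost, ops_time):
    """Sum the five terms of the rough-math formula (all in dollars)."""
    return committed_baseline + burst_capacity + idle_time + recovery_cost + ops_time

# "Cheap" spot plan: low hourly rate, real recovery and ops overhead.
spot_plan = monthly_gpu_cost(
    committed_baseline=0,
    burst_capacity=2.00 * 400,   # hypothetical $2.00/hr spot rate * 400 GPU-hours
    idle_time=0,
    recovery_cost=600,           # failed jobs, restarts, re-runs
    ops_time=800,                # engineer time babysitting preemptions
)

# On-demand plan: higher hourly rate, little overhead.
on_demand_plan = monthly_gpu_cost(
    committed_baseline=0,
    burst_capacity=4.50 * 400,   # hypothetical $4.50/hr on-demand rate * 400 GPU-hours
    idle_time=0,
    recovery_cost=0,
    ops_time=100,
)

print(f"spot total:      ${spot_plan:,.0f}")       # 800 + 600 + 800 = 2200
print(f"on-demand total: ${on_demand_plan:,.0f}")  # 1800 + 100 = 1900
```

With these assumed numbers the "cheaper" spot plan loses by $300 a month once recovery and ops time are priced in.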

When On-Demand Makes Sense

On-demand is useful when:

  • demand is unknown
  • model choice may change
  • the project may stop next week
  • the team needs quick iteration
  • a higher hourly rate is acceptable while learning

Do not optimize too early. If the workload is not stable, a commitment can be more expensive than a high on-demand rate.

When Reserved Capacity Makes Sense

Reserved or committed capacity becomes rational when:

  • the baseline is stable
  • the model and GPU type are unlikely to change
  • utilization is high
  • procurement can tolerate commitment
  • the team has enough demand to avoid idle capacity

The key question is not "is the committed rate lower?" It is "will we actually use the committed capacity?"
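One way to answer "will we actually use it" is a break-even utilization check: committed capacity is billed whether it runs or not, so it only beats on-demand above a certain usage fraction. The rates below are hypothetical.

```python
# Break-even utilization for a commitment. Below this fraction of the
# committed hours actually used, on-demand would have been cheaper.
# Rates are hypothetical, not provider quotes.

def break_even_utilization(committed_rate, on_demand_rate):
    """Fraction of committed hours you must use for the commitment to win.

    Committed cost is fixed (committed_rate * all hours); the on-demand
    alternative only bills the hours actually used, so the commitment wins
    when committed_rate * H <= on_demand_rate * (utilization * H).
    """
    return committed_rate / on_demand_rate

# e.g. a hypothetical $2.80/hr committed rate vs $4.00/hr on-demand
u = break_even_utilization(2.80, 4.00)
print(f"break-even utilization: {u:.0%}")  # 70%
```

If the team cannot confidently forecast utilization above that line, the commitment is a bet, not a discount.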

When Spot Or Marketplace GPUs Make Sense

Spot, preemptible, and marketplace GPUs can be a good fit for:

  • batch jobs with checkpointing
  • fine-tuning experiments
  • research workloads
  • offline evaluation
  • jobs that can retry automatically

They are weaker for latency-sensitive inference and fragile multi-node jobs unless the architecture expects churn.
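The interruption-and-recovery cost can be folded back into an effective hourly rate under a simple model: each preemption loses the work since the last checkpoint (on average half a checkpoint interval) plus a fixed restart delay. All parameters below are illustrative assumptions, not measured provider behavior.

```python
# Effective spot cost per *useful* GPU-hour under a simple interruption
# model. All parameters are illustrative assumptions.

def effective_spot_rate(spot_rate, interruptions_per_hour,
                        checkpoint_interval_hours, restart_hours):
    # Hours wasted per wall-clock hour: lost work since the last
    # checkpoint (half an interval on average) plus restart time.
    wasted = interruptions_per_hour * (checkpoint_interval_hours / 2 + restart_hours)
    useful_fraction = max(1.0 - wasted, 1e-9)
    return spot_rate / useful_fraction

rate = effective_spot_rate(
    spot_rate=2.00,                 # hypothetical $/hr
    interruptions_per_hour=0.5,     # one preemption every two hours
    checkpoint_interval_hours=0.5,
    restart_hours=0.25,
)
print(f"effective rate per useful hour: ${rate:.2f}")  # $2.67
```

Tighter checkpointing and faster restarts pull the effective rate back toward the sticker price; fragile multi-node jobs push it the other way.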

Decision Rule

Use the most flexible buying model until the workload shape is clear. Move to commitment only when utilization is predictable enough that unused capacity is unlikely.
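The decision rule can be written as a tiny triage function. The categories mirror the buying-model table above; the utilization threshold is an illustrative assumption, not a universal constant.

```python
def buying_model(predictable: bool, interruption_ok: bool,
                 expected_utilization: float) -> str:
    """Toy triage for the decision rule above; threshold is illustrative."""
    if not predictable:
        # Workload shape still unclear: stay flexible.
        return "spot/marketplace" if interruption_ok else "on-demand"
    if expected_utilization >= 0.7:  # assumed break-even utilization
        return "reserved/committed"
    return "on-demand"

print(buying_model(predictable=False, interruption_ok=True, expected_utilization=0.2))
# → spot/marketplace
print(buying_model(predictable=True, interruption_ok=False, expected_utilization=0.85))
# → reserved/committed
```

A real decision also weighs data movement, compliance, and procurement constraints, which is why the quiz asks about more than these three inputs.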

Use RunPlacement

Run the quiz before buying capacity. The answer changes if the workload is bursty, steady, data-heavy, or operationally fragile.

How To Use This Page

Treat this page as a placement filter, not a provider ranking. The goal is to narrow the next quote or benchmark you should run.

Use it in this order:

  1. Identify whether the workload is experimental, bursty, steady, or production-critical.
  2. Estimate useful compute time rather than provisioned time.
  3. Write down the data movement and storage around the compute.
  4. Decide how much operational variance the team can tolerate.
  5. Compare providers only after the workload shape is clear.

This matters because two teams can look at the same pricing page and need opposite answers. A research team running checkpointed experiments can accept interruptions and provider variance. A production inference team with strict latency and support requirements may rationally pay more for what looks on paper like the same GPU.

What Would Change The Answer

The recommendation changes quickly when one of these inputs changes:

  • the model no longer fits on the cheaper GPU
  • latency or throughput becomes the business constraint
  • training time affects a launch date or customer commitment
  • data already lives inside one cloud and is expensive to move
  • compliance or procurement rules exclude smaller providers
  • the workload becomes steady enough to justify committed capacity
  • the team cannot absorb extra monitoring, restarts, or provider debugging

This is why RunPlacement asks about priority, GPU need, data movement, and ops tolerance. The placement decision is usually hiding in those tradeoffs, not in the headline hourly price.

Evidence And Sources

This draft draws on public pricing and provider documentation, supplemented by real-world confusion signals where available:

  • https://aws.amazon.com/ec2/capacityblocks/pricing/
  • https://cloud.google.com/compute/gpus-pricing
  • https://docs.vast.ai/documentation/instances/pricing
  • https://www.runpod.io/pricing/

Target queries for this page:

H100 on demand vs reserved capacity, H100 spot vs on demand, GPU capacity blocks vs on demand, when to reserve H100 GPUs

Assumptions

  • The workload can be described as bursty, steady, planned, or experimental.
  • The team can estimate monthly GPU-hours and interruption tolerance.

FAQs

Q: When should I reserve H100 capacity?
A: When baseline usage is predictable and high enough to avoid paying for idle committed GPUs.

Q: Is spot good for inference?
A: Usually not for strict production inference unless the system is designed for interruption.

Q: What should I compare first?
A: Utilization and failure cost before hourly rate.

Final Placement Rule

Match the H100 buying model to workload predictability and interruption tolerance before comparing hourly rates.

Pressure-Test It

Before you buy capacity or migrate the workload, run the RunPlacement quiz with the actual workload shape. A rough answer with the right missing variables is more useful than a precise-looking quote for the wrong comparison.
