AWS Bill Shock Triage Checklist

Short answer: Run this checklist before concluding that AWS itself is the wrong placement.

Estimate only

This is a decision checklist, not a final price quote.
Verify final numbers against provider pricing pages and your own bill or quote.

First 30 minutes

Use This Before Debating Migration

A bill spike needs triage before architecture conclusions. Find the recurring driver first.

Copy the triage rows

Paste service, region, account, current spend, baseline spend, and owner into a sheet.

Rank the deltas

Start with the largest recurring month-over-month change, not the loudest complaint.

Classify the driver

Separate compute, network, storage, observability, managed service, support, marketplace, and one-time events.

Filled example

Example: Network Spike Triage

Hypothetical first-pass triage, not a pricing claim.

Input	Hypothetical value
Largest delta	NAT Gateway line item increased from a normal baseline.
Likely driver	Routing or traffic path changed after a deployment.
Next move	Check route tables, processed bytes, deployment history, and workload owner before migration planning.

What it flags: The useful first answer is the cost driver and owner, not whether the whole cloud account should move.

By Andrew Cooper, Founder of RunPlacement. Updated May 2026. Provider-neutral, estimate-labeled guidance. Verify current provider pricing.

Use this when

An AWS bill jumped and the team does not know which line item caused it.
People are arguing about migration before the recurring driver is isolated.
Networking, logs, storage, managed databases, or idle resources may explain the surprise.

Not for

A full FinOps program or chargeback model.
Replacing Cost Explorer, CUR, or detailed account-level analysis.
Declaring AWS too expensive before the bill delta is understood.

Bill shock triage

Do not start with migration. Start with the delta.

Most surprise bills become less mysterious when the biggest month-over-month change is isolated.

01 Find the jump

Compare this month to the last normal month by service and region.

02 Group the drivers

Separate compute, networking, storage, observability, and managed services.

03 Explain the cause

Look for traffic change, architecture change, retention change, or idle resource.

04 Choose the fix

Delete, resize, re-architect, commit, or migrate only after the driver is known.
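
The first step above, comparing this month to the last normal month by service and region, can be sketched in code. This is a minimal sketch, assuming the boto3 Cost Explorer client (`boto3.client("ce")`) and its `get_cost_and_usage` call. The function names, the example dates, and the choice of the `UnblendedCost` metric are illustrative assumptions, not part of the checklist. Cost Explorer API requests may themselves be billed, so check current pricing before scripting this.

```python
def costs_by_service_region(ce_client, start, end):
    """Return {(service, region): cost} for the period [start, end).

    ce_client is assumed to be boto3.client("ce"). Dates are "YYYY-MM-DD"
    strings and the end date is exclusive.
    """
    request = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [
            {"Type": "DIMENSION", "Key": "SERVICE"},
            {"Type": "DIMENSION", "Key": "REGION"},
        ],
    }
    costs = {}
    while True:
        response = ce_client.get_cost_and_usage(**request)
        for period in response["ResultsByTime"]:
            for group in period["Groups"]:
                key = tuple(group["Keys"])  # (service, region)
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                costs[key] = costs.get(key, 0.0) + amount
        # Follow pagination until no further page token is returned.
        token = response.get("NextPageToken")
        if not token:
            return costs
        request["NextPageToken"] = token


def rank_deltas(baseline, current):
    """Return [(key, delta)] sorted from largest increase to largest decrease."""
    keys = set(baseline) | set(current)
    deltas = [(k, current.get(k, 0.0) - baseline.get(k, 0.0)) for k in keys]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   ce = boto3.client("ce")
#   baseline = costs_by_service_region(ce, "2026-03-01", "2026-04-01")
#   current = costs_by_service_region(ce, "2026-04-01", "2026-05-01")
#   for key, delta in rank_deltas(baseline, current)[:5]:
#       print(key, round(delta, 2))
```

The top of the ranked list is the candidate for step 02. Whether it is recurring still has to be judged by a person who knows what changed.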

Worksheet Fields

Use this as the working version before copying the decision into a doc, ticket, or vendor email.

Field	Capture	Why it matters
Baseline	Last normal month, current month, service, region, account, owner.	Separates a real trend from a one-off event.
Driver class	Compute, network, storage, observability, managed service, support, marketplace.	Keeps the triage from becoming vague cloud blame.
Change event	Traffic, logging, retention, deployment, data path, backup, scale setting.	Connects the bill to something that actually changed.
Action	Delete, resize, cap, alert, add endpoint, change retention, commit, migrate.	Turns surprise into a concrete next move.

Triage-ready

Copy Into A Triage Sheet

Paste this tab-separated block into a sheet when an AWS bill jumps. The goal is to isolate the recurring driver before debating migration.

Field	What to enter	Hypothetical example	Why it matters
Service	AWS service with the biggest month-over-month delta	NAT Gateway	Starts with the line item, not general cloud blame.
Region	Region where the spend changed	us-east-1	Finds regional routing or workload changes.
Account or project	Account, environment, or owner group	production account	Helps find who can explain the change.
Tag or workload	Tag, service name, app, or workload tied to the cost	batch workers	Connects spend to ownership.
Current spend	Current month spend for the line item	$4,200 placeholder	Shows the size of the issue.
Baseline spend	Last normal month for the same line item	$900 placeholder	Separates a spike from normal run rate.
Delta	Current spend minus baseline spend	$3,300 placeholder	Ranks what to investigate first.
Likely driver	Compute, network, storage, observability, managed service, support, or marketplace	Network path change	Keeps triage specific.
Change event	Deployment, traffic, logging, retention, backup, route, or scale change	Private subnet routing changed	Links bill movement to a cause.
Owner	Person or team responsible for the workload or setting	Platform team	Makes the next check actionable.
Next check	Delete, resize, cap, alert, route change, retention change, or deeper analysis	Check route tables and NAT processed bytes	Turns surprise into a concrete step.
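
Once several line items are in the sheet, ranking them by delta can be scripted. A minimal sketch: `TriageRow`, `parse_money`, and `rank_by_delta` are hypothetical names, and every row except the NAT Gateway placeholder from the table above is invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TriageRow:
    service: str
    region: str
    owner: str
    current_spend: float
    baseline_spend: float

    @property
    def delta(self) -> float:
        """Current spend minus baseline spend."""
        return self.current_spend - self.baseline_spend


def parse_money(text: str) -> float:
    """Pull the first number out of a cell such as '$4,200 placeholder'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no amount found in {text!r}")
    return float(match.group().replace(",", ""))


def rank_by_delta(rows):
    """Largest month-over-month increase first."""
    return sorted(rows, key=lambda row: row.delta, reverse=True)


# Hypothetical rows; only the NAT Gateway figures come from the table above.
rows = [
    TriageRow("CloudWatch Logs", "us-east-1", "App team",
              parse_money("$650"), parse_money("$500")),
    TriageRow("NAT Gateway", "us-east-1", "Platform team",
              parse_money("$4,200 placeholder"), parse_money("$900 placeholder")),
    TriageRow("S3", "us-west-2", "Data team",
              parse_money("$300"), parse_money("$310")),
]

for row in rank_by_delta(rows):
    print(f"{row.service}\t{row.region}\t{row.owner}\t{row.delta:+,.0f}")
```

Sorting by delta rather than by current spend keeps a large but stable line item from crowding out the one that actually moved.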


AI prompt

Prompt To Triage A Cloud Bill Spike

Use this with bill exports, Cost Explorer notes, or manually copied line items. It should classify the driver before recommending migration.

You are helping me triage a cloud bill spike. Do not assume provider pricing beyond the line items I provide. Do not recommend migration until the recurring cost driver is classified.

Here are the bill details:
[Paste service, region, account, current spend, baseline spend, tags, and known changes here]

Please:
1. Identify the largest recurring month-over-month delta.
2. Classify the likely driver as compute, network, storage, observability, managed service, support, marketplace, or one-time event.
3. List the most likely change events that could explain the delta.
4. Recommend the next checks before changing providers.
5. Separate quick configuration fixes from architecture changes and migration questions.
6. Label unknowns and avoid unsupported pricing, benchmark, or provider-ranking claims.


Short Answer

Most AWS bill shock needs a line-item triage before a migration decision.
Start with the biggest delta from the previous normal month.
A high bill is not enough evidence that AWS is the wrong placement; it may be one architecture decision, one logging change, or one idle resource class.

Line Items To Check First

NAT Gateway processing and hourly charges.
Cross-AZ, inter-region, and internet data transfer.
CloudWatch logs, metrics, retention, and custom metrics.
S3 storage class, requests, lifecycle gaps, retrieval, replication, and transfer.
Idle EC2, overprovisioned instances, unattached EBS volumes, and forgotten load balancers.
Managed databases, snapshots, backups, and provisioned throughput.
Support plan changes, marketplace products, and one-off service usage.

Triage Table

If networking jumped: inspect NAT, cross-AZ paths, egress, region movement, and load balancer traffic.
If observability jumped: inspect log volume, retention, custom metrics, and debug logging.
If storage jumped: inspect request volume, lifecycle rules, replication, retrieval, and snapshots.
If compute jumped: inspect idle capacity, autoscaling, instance families, commitments, and GPU usage.
If managed services jumped: inspect provisioned capacity, backups, replicas, and default settings.
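
The triage table above is effectively a lookup from driver class to inspection list. A minimal sketch of that lookup; the names `NEXT_CHECKS`, `ALIASES`, and `next_checks` are hypothetical, and classes the table does not cover (support, marketplace, one-time events) return an empty list rather than guessed checks.

```python
# Inspection lists copied from the triage table above.
NEXT_CHECKS = {
    "network": ["NAT", "cross-AZ paths", "egress", "region movement",
                "load balancer traffic"],
    "observability": ["log volume", "retention", "custom metrics",
                      "debug logging"],
    "storage": ["request volume", "lifecycle rules", "replication",
                "retrieval", "snapshots"],
    "compute": ["idle capacity", "autoscaling", "instance families",
                "commitments", "GPU usage"],
    "managed service": ["provisioned capacity", "backups", "replicas",
                        "default settings"],
}

# Alternate spellings used elsewhere in the checklist.
ALIASES = {"networking": "network", "managed services": "managed service"}


def next_checks(driver_class):
    """Return the inspection list for a driver class, or [] if none is listed."""
    key = driver_class.strip().lower()
    key = ALIASES.get(key, key)
    return list(NEXT_CHECKS.get(key, []))


print(next_checks("Networking"))
```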

Rough Math

Monthly surprise = current monthly line item - previous normal baseline.
Repeatable surprise = line-item delta expected to recur next month.
Fix payback in months = engineering time cost / monthly savings.
If one repeatable line item explains most of the jump, fix that before changing providers.
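
The rough math above can be worked through in a few lines. A minimal sketch using the placeholder NAT Gateway figures from the triage sheet. The one-time portion, the engineering cost, the total bill jump, and the 50% threshold for "most" are hypothetical values chosen for illustration, and subtracting a one-time portion is one assumed way to estimate the repeatable part.

```python
def monthly_surprise(current, baseline):
    """Current monthly line item minus the previous normal baseline."""
    return current - baseline


def repeatable_surprise(line_item_delta, one_time_portion=0.0):
    """Part of the delta expected to recur next month (never below zero)."""
    return max(line_item_delta - one_time_portion, 0.0)


def fix_payback_months(engineering_cost, monthly_savings):
    """Months until the engineering time spent on the fix is recovered."""
    if monthly_savings <= 0:
        raise ValueError("monthly savings must be positive")
    return engineering_cost / monthly_savings


def explains_most(line_item_repeatable, total_surprise, threshold=0.5):
    """True if one repeatable line item exceeds the share threshold of the jump."""
    if total_surprise <= 0:
        return False
    return line_item_repeatable / total_surprise > threshold


surprise = monthly_surprise(current=4200, baseline=900)       # placeholder figures
repeatable = repeatable_surprise(surprise, one_time_portion=300)  # hypothetical
payback = fix_payback_months(engineering_cost=6000, monthly_savings=repeatable)
fix_first = explains_most(repeatable, total_surprise=4000)    # hypothetical total

print(surprise, repeatable, payback, fix_first)
```

With these inputs the NAT line item accounts for three quarters of the jump and the fix pays back in two months, which is the case where fixing in place comes before any provider change.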

Questions To Ask Internally

What changed in traffic, logging, data volume, architecture, or retention?
Did a private subnet path start routing through NAT unexpectedly?
Did a debug flag or log level stay on?
Did data start crossing zones or regions?
Did a test workload become always-on?
Can this be capped, alerted, or deleted today?

Red Flags

Private subnet traffic going through NAT by default.
Debug logs retained like production audit logs.
Data movement costs estimated only after the architecture was built, not before.
No owner for old resources.
Dashboards that show total spend but not the delta driver.

When To Use The Quiz

Use the RunPlacement quiz after identifying whether the bill is mostly compute, networking, storage, observability, managed services, or GPU.
The quiz helps decide whether to optimize AWS, move the workload, or choose a simpler category.

FAQ

What should I check first after an AWS bill spike?

After an AWS bill spike, check the largest month-over-month delta by service, region, and account. Then classify the driver as compute, network, storage, observability, managed service, support, or marketplace. The first fix should target the recurring driver, not the total bill in isolation.

Can NAT Gateway cause AWS bill shock?

Yes. NAT Gateway is billed both for each hour it runs and for each gigabyte it processes, so the bill can jump when private subnet traffic takes an unexpected path. Verify current AWS pricing pages before estimating the amount, then inspect routes, endpoints, cross-AZ paths, and high-volume workloads.

Should I migrate away from AWS after one bad bill?

Usually no. Do not migrate until the increase is understood. First decide whether the spike is recurring, fixable in place, or caused by one architecture, retention, logging, routing, or idle-resource decision. Treat migration as a payback decision: weigh the cost of moving against the recurring savings it would produce.

RunPlacement quiz

Pressure-test this workload

Find the top bill drivers first, then decide whether to optimize, re-architect, or migrate.

Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.
