AWS Bill Shock Evidence Checklist

Short answer: AWS bill shock should be investigated as an evidence trail before it becomes a migration or provider decision.

Estimate only

This is a decision checklist, not a final price quote.
Verify final numbers against provider pricing pages and your own bill or quote.

By Andrew Cooper, Founder of RunPlacement
Updated May 2026
Provider-neutral, estimate-labeled guidance. Verify current provider pricing.

Use this when

An AWS bill jumped and the team needs evidence before proposing a fix.
NAT Gateway, transfer, CloudWatch Logs, S3, idle compute, or managed-service usage may be the driver.
People are discussing migration before isolating the recurring line item.

Not for

Replacing Cost Explorer, Cost and Usage Reports (CUR), or a full FinOps process.
Final cloud migration approval.
Declaring AWS too expensive before the evidence trail is complete.

Evidence trail

Trace the bill before changing the workload.

AWS bill shock triage starts with the largest recurring delta and the usage path behind it.

01 Find delta

Service, region, account, tag, and last normal baseline.

02 Find usage

GB processed, logs ingested, transfer path, storage growth, idle hours.

03 Find change

Deployment, route, retention, scale, backup, or traffic shift.

04 Pick action

Optimize, cap, alert, reroute, re-architect, or price migration later.
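
Step 01 can be sketched as a small ranking over two monthly totals per service. This is a minimal sketch, not a replacement for Cost Explorer: the function name `rank_deltas` and all dollar figures are hypothetical placeholders, and the inputs would come from your own billing data.

```python
from typing import Dict, List, Tuple


def rank_deltas(
    baseline: Dict[str, float], current: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (service, delta) pairs, largest increase first."""
    services = set(baseline) | set(current)
    deltas = [
        (name, round(current.get(name, 0.0) - baseline.get(name, 0.0), 2))
        for name in services
    ]
    # Sort by delta descending, then by name so ties are stable.
    return sorted(deltas, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical monthly totals in USD (last normal month vs. current month).
baseline = {"EC2": 4200.0, "NAT Gateway": 310.0, "CloudWatch": 150.0, "S3": 800.0}
current = {"EC2": 4350.0, "NAT Gateway": 1890.0, "CloudWatch": 420.0, "S3": 790.0}

ranked = rank_deltas(baseline, current)
for service, delta in ranked:
    print(f"{service}: {delta:+.2f}")
```

The first entry in `ranked` is the investigation target for step 02. Services that appear in only one of the two months are treated as starting from zero.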

Worksheet Fields

Use this as the working version before copying the decision into a doc, ticket, or vendor email.

Field	Capture	Why it matters
Cost delta	Current month, last normal month, service, region, account, and tag.	Ranks what to investigate first.
Usage driver	Processed GB, log ingestion, request count, storage growth, idle hours, transfer path.	Connects spend to a measurable workload behavior.
Architecture path	Route tables, NAT gateways, VPC endpoints, cross-AZ paths, public/private subnet paths.	Finds whether data is taking an expensive path.
Change event	Deployment, traffic change, log setting, retention change, backup, batch job, dependency pull.	Links the bill movement to something that changed.
Owner and next check	Team owner, service owner, next AWS console/report/doc to inspect.	Turns surprise into a concrete triage action.
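
One way to connect the "Cost delta" and "Change event" fields is to find the first day the daily cost left its baseline and then list the changes logged shortly before it. The sketch below assumes you already have daily cost figures and a dated change log. The function names, the threshold factor, the look-back window, and all sample data are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple


def first_spike_day(
    daily_cost: Dict[date, float], baseline_days: int, factor: float
) -> Optional[date]:
    """First day whose cost exceeds factor x the mean of the baseline days."""
    days = sorted(daily_cost)
    base = mean(daily_cost[d] for d in days[:baseline_days])
    for d in days[baseline_days:]:
        if daily_cost[d] > factor * base:
            return d
    return None


def changes_before(
    spike: date, changes: List[Tuple[date, str]], window_days: int
) -> List[str]:
    """Change events logged within window_days before (or on) the spike day."""
    start = spike - timedelta(days=window_days)
    return [label for when, label in changes if start <= when <= spike]


# Hypothetical daily cost for one service, in USD.
first_day = date(2026, 5, 1)
costs = [10, 11, 10, 9, 10, 10, 31, 33, 32, 30]
daily_cost = {first_day + timedelta(days=i): float(c) for i, c in enumerate(costs)}

# Hypothetical change log.
changes = [
    (date(2026, 5, 2), "log retention change"),
    (date(2026, 5, 6), "deploy: debug logging enabled"),
    (date(2026, 5, 9), "batch job added"),
]

spike = first_spike_day(daily_cost, baseline_days=5, factor=2.0)
suspects = changes_before(spike, changes, window_days=3) if spike else []
print(spike, suspects)
```

A change that lands just before the spike is a lead, not a proven cause. It still needs the usage metric from the worksheet to confirm it.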

Evidence-ready

Copy The AWS Evidence Table

Paste this into a triage doc before proposing architecture changes. Keep every unknown visible until it is checked in AWS source data.

Evidence field	What to collect	Where to check	Why it matters
Largest delta	Current month versus last normal month by service	AWS Cost Explorer service view	Ranks the first investigation target
Region and account	Region, linked account, environment, and owner	Cost Explorer filters, account tags, billing reports	Finds whether the spike is localized
Usage metric	Processed GB, log GB ingested, requests, storage GB, transfer GB, idle hours	Service usage reports, CloudWatch, VPC, S3, EC2, billing details	Connects spend to behavior
NAT evidence	NAT gateway hours, processed data, private subnet routes, cross-AZ paths	VPC NAT Gateway docs, route tables, Cost Explorer	Finds routing and data-processing surprises
Endpoint evidence	S3/DynamoDB gateway endpoints or interface endpoints present or missing	VPC endpoint configuration	Shows whether private traffic can avoid NAT paths
Log evidence	Log ingestion, retention, debug logging, high-volume writers	CloudWatch Logs billing details and log groups	Finds observability-driven cost spikes
Transfer evidence	Internet egress, cross-region transfer, cross-AZ movement, replication	VPC, S3, data transfer line items, architecture notes	Separates networking from compute pain
Change event	Deployment, route change, retention change, backup, traffic growth, dependency pull	Deploy history, infra changes, CI/CD, incident notes	Connects the bill movement to a cause
Owner	Service owner, platform owner, finance contact, incident owner	Tags, repo ownership, on-call, account map	Makes the next check actionable
Next action	Delete, resize, cap, alert, reroute, add endpoint, change retention, or deeper analysis	Triage notes	Turns bill surprise into a reviewable decision
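
If the table is tracked in a script rather than a doc, the "keep every unknown visible" rule can be sketched as a record whose empty fields are listed explicitly. The class name `EvidenceRow`, its field names, and the sample values are assumptions made for illustration. They mirror the table rows above but are not part of any AWS API.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidenceRow:
    # One attribute per evidence field in the table; None means "not checked yet".
    largest_delta: Optional[str] = None
    region_and_account: Optional[str] = None
    usage_metric: Optional[str] = None
    nat_evidence: Optional[str] = None
    endpoint_evidence: Optional[str] = None
    log_evidence: Optional[str] = None
    transfer_evidence: Optional[str] = None
    change_event: Optional[str] = None
    owner: Optional[str] = None
    next_action: Optional[str] = None

    def unknowns(self) -> List[str]:
        """Names of fields that still have no evidence recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def ready_for_decision(self) -> bool:
        """True only when every field has been filled in."""
        return not self.unknowns()


# Hypothetical partially completed triage row.
row = EvidenceRow(
    largest_delta="NAT Gateway up versus last normal month",
    region_and_account="one region, production account",
    owner="platform team",
)
print("Still unknown:", ", ".join(row.unknowns()))
print("Ready for decision:", row.ready_for_decision())
```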

AI prompt

Prompt To Build An AWS Bill Shock Evidence Trail

Use this with Cost Explorer notes, billing exports, route-table notes, or service usage details. The output should classify evidence before recommending architecture changes.

You are helping me build an AWS bill shock evidence trail. Do not recommend migration or provider switching until the recurring driver is identified. Do not assume current AWS prices beyond the source data I provide.

Here is the evidence I have:
[Paste Cost Explorer deltas, service/region/account details, usage metrics, NAT/route/log/storage/transfer notes, and recent changes here]

Please:
1. Identify the largest recurring delta and the service, region, account, and owner tied to it.
2. Classify the likely driver as compute, network, storage, observability, managed service, support, marketplace, or one-time event.
3. List the strongest evidence and the missing evidence.
4. Recommend the next source check before deleting, rerouting, resizing, changing retention, or proposing migration.
5. Separate quick fixes from architecture changes and migration questions.
6. Avoid provider-ranking, benchmark, or unsupported pricing claims.

Short Answer

AWS bill shock is an evidence problem before it is a migration problem.
Start with the largest recurring delta, then identify the service, region, account, usage driver, change event, and owner.
Only compare placement options after the driver is clear enough to decide whether it is fixable in place.

Evidence To Collect First

Cost Explorer view by service, region, account, and day.
Last normal baseline month and current month-to-date trend.
Usage metric tied to the line item: processed data, storage growth, log ingestion, request count, transfer, or idle hours.
Recent deployment, routing, retention, backup, logging, traffic, or scale-setting changes.
Named owner for the workload or configuration.
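
The first item in this list, the Cost Explorer view, can also be pulled through the Cost Explorer API. The sketch below only builds the request parameters for the `GetCostAndUsage` operation and makes no AWS call. The helper name `cost_by_service_request` and the dates are hypothetical, and the choice of `UnblendedCost` as the metric is an assumption you may want to change.

```python
from datetime import date
from typing import Any, Dict


def cost_by_service_request(start: date, end: date) -> Dict[str, Any]:
    """Build GetCostAndUsage parameters: daily cost grouped by service and region.

    The End date is exclusive in Cost Explorer, so pass the day after the
    last day you want included.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        # Cost Explorer accepts at most two GroupBy keys per request.
        "GroupBy": [
            {"Type": "DIMENSION", "Key": "SERVICE"},
            {"Type": "DIMENSION", "Key": "REGION"},
        ],
    }


# Hypothetical window covering the current month to date.
params = cost_by_service_request(date(2026, 5, 1), date(2026, 5, 15))
print(params)
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("ce").get_cost_and_usage(**params)`. Because only two grouping keys are allowed per request, a linked-account breakdown needs a second request or a filter.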

Common Evidence Paths

For NAT Gateway: check gateway hours, processed data, private subnet routes, cross-AZ paths, package pulls, image downloads, backups, logs, and external API traffic.
For CloudWatch Logs: check ingestion, retention, high-cardinality logs, debug logging, and services writing standard logs.
For transfer: check public egress, cross-region movement, cross-AZ paths, managed-service data paths, and backup or replication jobs.
For storage: check lifecycle policies, snapshots, retrieval patterns, incomplete multipart uploads, and replication.
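
For the NAT Gateway path, it helps to separate the per-hour charge from the per-GB processing charge, because they point to different fixes. The sketch below is estimate-only arithmetic: the function name `nat_cost_split` is hypothetical, and both rates are placeholder values, not AWS prices. Substitute the rates from the provider pricing page or your own bill.

```python
from typing import Dict


def nat_cost_split(
    gateway_hours: float,
    processed_gb: float,
    hourly_rate: float,
    per_gb_rate: float,
) -> Dict[str, float]:
    """Split an estimated NAT Gateway cost into hourly and processing parts."""
    hourly = gateway_hours * hourly_rate
    processing = processed_gb * per_gb_rate
    total = hourly + processing
    share = processing / total if total else 0.0
    return {
        "hourly": round(hourly, 2),
        "processing": round(processing, 2),
        "processing_share": round(share, 3),
    }


# Hypothetical usage: 3 gateways running for 720 hours each, 40,000 GB processed.
# Both rates below are PLACEHOLDERS, not current AWS pricing.
split = nat_cost_split(
    gateway_hours=3 * 720,
    processed_gb=40_000,
    hourly_rate=0.05,
    per_gb_rate=0.05,
)
print(split)
```

A high `processing_share` suggests looking at what traffic is flowing through the gateway (package pulls, image downloads, logs, S3 access without an endpoint). A low share suggests the gateway count or idle gateways matter more.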

What Not To Conclude Yet

Do not conclude AWS is the wrong provider from one unexplained spike.
Do not delete infrastructure before dependent workloads and rollback risk are known.
Do not call migration cheaper until the recurring driver, migration work, and replacement operations are priced.
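
The last point can be made concrete with a break-even estimate. This is a minimal sketch under simple assumptions: costs are flat per month, and the function name `breakeven_months` and every figure are hypothetical. It only shows why migration work and replacement operations belong in the comparison.

```python
import math
from typing import Optional


def breakeven_months(
    current_monthly: float,
    target_monthly: float,
    added_ops_monthly: float,
    one_time_migration: float,
) -> Optional[int]:
    """Months until one-time migration cost is recovered, or None if never."""
    net_saving = current_monthly - (target_monthly + added_ops_monthly)
    if net_saving <= 0:
        return None  # The move never pays back at these numbers.
    return math.ceil(one_time_migration / net_saving)


# Hypothetical figures in USD.
months = breakeven_months(
    current_monthly=12_000,
    target_monthly=8_000,
    added_ops_monthly=1_500,
    one_time_migration=30_000,
)
no_payback = breakeven_months(12_000, 11_000, 1_500, 30_000)
print(months, no_payback)
```

If the recurring driver turns out to be fixable in place, `current_monthly` drops and the break-even point moves further out, which is why the evidence trail comes first.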

FAQ

What evidence should I collect after AWS bill shock?

Collect the largest month-over-month delta by service, region, account, and owner; then collect the usage metric that explains the line item, such as processed data, log ingestion, storage growth, transfer, or idle hours.

Why start with Cost Explorer?

Cost Explorer helps identify service, account, region, and time-based changes before the team debates architecture or migration.

When does NAT Gateway need deeper evidence?

NAT Gateway needs deeper evidence when private subnet traffic, cross-AZ routing, package pulls, container image downloads, backups, logs, or external API calls may be passing through NAT unexpectedly.