AWS Bill Shock Evidence Checklist
Short answer: AWS bill shock should be investigated as an evidence trail before it becomes a migration or provider decision.
- This is a decision checklist, not a final price quote.
- Verify final numbers against provider pricing pages and your own bill or quote.
Use this when
- An AWS bill jumped and the team needs evidence before proposing a fix.
- NAT Gateway, transfer, CloudWatch Logs, S3, idle compute, or managed-service usage may be the driver.
- People are discussing migration before isolating the recurring line item.
Not for
- Replacing Cost Explorer, Cost and Usage Reports (CUR), or a full FinOps process.
- Final cloud migration approval.
- Declaring AWS too expensive before the evidence trail is complete.
Evidence trail
Trace the bill before changing the workload.
AWS bill shock triage starts with the largest recurring delta and the usage path behind it.
- Cost delta: service, region, account, tag, and last normal baseline.
- Usage driver: GB processed, logs ingested, transfer path, storage growth, idle hours.
- Change event: deployment, route, retention, scale, backup, or traffic shift.
- Next action: optimize, cap, alert, reroute, re-architect, or price migration later.
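As a rough illustration of "start with the largest recurring delta", the sketch below ranks services by the difference between the current month and the last normal month. The function name `rank_cost_deltas` and all dollar figures are hypothetical; in practice the two dictionaries would be filled from your own Cost Explorer or billing export data.

```python
from typing import Dict, List, Tuple


def rank_cost_deltas(
    baseline: Dict[str, float], current: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (service, delta) pairs, largest increase first."""
    services = set(baseline) | set(current)
    deltas = [
        (name, round(current.get(name, 0.0) - baseline.get(name, 0.0), 2))
        for name in services
    ]
    # Sort by delta descending; break ties by name so the order is stable.
    return sorted(deltas, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical monthly totals in USD, keyed by service (not real billing data).
baseline = {"EC2": 1200.0, "NAT Gateway": 90.0, "CloudWatch Logs": 60.0}
current = {"EC2": 1250.0, "NAT Gateway": 640.0, "CloudWatch Logs": 75.0, "S3": 20.0}

ranked = rank_cost_deltas(baseline, current)
for service, delta in ranked:
    print(f"{service}: {delta:+.2f}")
```

The first entry is only the first investigation target. Whether the delta is recurring still has to be confirmed against the usage driver and the change event.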
Worksheet Fields
Use this as the working version before copying the decision into a doc, ticket, or vendor email.
| Field | Capture | Why it matters |
|---|---|---|
| Cost delta | Current month, last normal month, service, region, account, and tag. | Ranks what to investigate first. |
| Usage driver | Processed GB, log ingestion, request count, storage growth, idle hours, transfer path. | Connects spend to a measurable workload behavior. |
| Architecture path | Route tables, NAT gateways, VPC endpoints, cross-AZ paths, public/private subnet paths. | Finds whether data is taking an expensive path. |
| Change event | Deployment, traffic change, log setting, retention change, backup, batch job, dependency pull. | Links the bill movement to something that changed. |
| Owner and next check | Team owner, service owner, next AWS console/report/doc to inspect. | Turns surprise into a concrete triage action. |
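One possible way to keep the worksheet honest is to store it as a small record that reports which fields are still empty. This is a minimal sketch, not a prescribed format: the class name `EvidenceRecord`, its attribute names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidenceRecord:
    """One worksheet row; None means the field has not been checked yet."""

    cost_delta: Optional[str] = None
    usage_driver: Optional[str] = None
    architecture_path: Optional[str] = None
    change_event: Optional[str] = None
    owner_and_next_check: Optional[str] = None

    def unknowns(self) -> List[str]:
        # List every field that is still empty, in worksheet order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical partially completed worksheet.
record = EvidenceRecord(
    cost_delta="NAT Gateway up versus last normal month",
    usage_driver="Processed GB roughly tripled",
)
print(record.unknowns())
```

Listing the unknowns alongside the decision makes it harder to propose a fix while the architecture path or change event is still unverified.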
Evidence-ready
Copy The AWS Evidence Table
Paste this into a triage doc before proposing architecture changes. Keep every unknown visible until it is checked in AWS source data.
| Evidence field | What to collect | Where to check | Why it matters |
|---|---|---|---|
| Largest delta | Current month versus last normal month by service | AWS Cost Explorer service view | Ranks the first investigation target |
| Region and account | Region, linked account, environment, and owner | Cost Explorer filters, account tags, billing reports | Finds whether the spike is localized |
| Usage metric | Processed GB, log GB ingested, requests, storage GB, transfer GB, idle hours | Service usage reports, CloudWatch, VPC, S3, EC2, billing details | Connects spend to behavior |
| NAT evidence | NAT gateway hours, processed data, private subnet routes, cross-AZ paths | VPC NAT Gateway docs, route tables, Cost Explorer | Finds routing and data-processing surprises |
| Endpoint evidence | S3/DynamoDB gateway endpoints or interface endpoints present or missing | VPC endpoint configuration | Shows whether private traffic can avoid NAT paths |
| Log evidence | Log ingestion, retention, debug logging, high-volume writers | CloudWatch Logs billing details and log groups | Finds observability-driven cost spikes |
| Transfer evidence | Internet egress, cross-region transfer, cross-AZ movement, replication | VPC, S3, data transfer line items, architecture notes | Separates networking from compute pain |
| Change event | Deployment, route change, retention change, backup, traffic growth, dependency pull | Deploy history, infra changes, CI/CD, incident notes | Connects the bill movement to a cause |
| Owner | Service owner, platform owner, finance contact, incident owner | Tags, repo ownership, on-call, account map | Makes the next check actionable |
| Next action | Delete, resize, cap, alert, reroute, add endpoint, change retention, or deeper analysis | Triage notes | Turns bill surprise into a reviewable decision |
AI prompt
Prompt To Build An AWS Bill Shock Evidence Trail
Use this with Cost Explorer notes, billing exports, route-table notes, or service usage details. The output should classify evidence before recommending architecture changes.
You are helping me build an AWS bill shock evidence trail. Do not recommend migration or provider switching until the recurring driver is identified. Do not assume current AWS prices beyond the source data I provide.

Here is the evidence I have:
[Paste Cost Explorer deltas, service/region/account details, usage metrics, NAT/route/log/storage/transfer notes, and recent changes here]

Please:
1. Identify the largest recurring delta and the service, region, account, and owner tied to it.
2. Classify the likely driver as compute, network, storage, observability, managed service, support, marketplace, or one-time event.
3. List the strongest evidence and the missing evidence.
4. Recommend the next source check before deleting, rerouting, resizing, changing retention, or proposing migration.
5. Separate quick fixes from architecture changes and migration questions.
6. Avoid provider-ranking, benchmark, or unsupported pricing claims.
Short Answer
- AWS bill shock is an evidence problem before it is a migration problem.
- Start with the largest recurring delta, then identify the service, region, account, usage driver, change event, and owner.
- Only compare placement options after the driver is clear enough to decide whether it is fixable in place.
Evidence To Collect First
- Cost Explorer view by service, region, account, and day.
- Last normal baseline month and current month-to-date trend.
- Usage metric tied to the line item: processed data, storage growth, log ingestion, request count, transfer, or idle hours.
- Recent deployment, routing, retention, backup, logging, traffic, or scale-setting changes.
- Named owner for the workload or configuration.
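The baseline and month-to-date items above can be compared on a daily run-rate basis, which avoids judging a partial month against a full one. The sketch below assumes you already have per-day cost figures (for example from a daily Cost Explorer view); the helper names, the 1.5x threshold, and the numbers are all hypothetical.

```python
from typing import List, Optional


def daily_run_rate(daily_costs: List[float]) -> float:
    """Average cost per day over the supplied days."""
    if not daily_costs:
        raise ValueError("need at least one day of cost data")
    return sum(daily_costs) / len(daily_costs)


def first_day_above(daily_costs: List[float], threshold: float) -> Optional[int]:
    """Zero-based index of the first day whose cost exceeds threshold, or None."""
    for index, cost in enumerate(daily_costs):
        if cost > threshold:
            return index
    return None


# Hypothetical daily costs in USD for one service.
baseline_days = [10.0] * 30                 # last normal month
current_mtd = [10.0] * 4 + [25.0] * 6       # current month to date

baseline_rate = daily_run_rate(baseline_days)
current_rate = daily_run_rate(current_mtd)
ratio = current_rate / baseline_rate        # assumes a non-zero baseline
jump_day = first_day_above(current_mtd, threshold=1.5 * baseline_rate)

print(f"run-rate ratio: {ratio:.2f}, first elevated day index: {jump_day}")
```

The index of the first elevated day gives a date to line up against deploy history and configuration changes.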
Common Evidence Paths
- For NAT Gateway: check gateway hours, processed data, private subnet routes, cross-AZ paths, package pulls, image downloads, backups, logs, and external API traffic.
- For CloudWatch Logs: check ingestion, retention, high-cardinality logs, debug logging, and services writing standard logs.
- For transfer: check public egress, cross-region movement, cross-AZ paths, managed-service data paths, and backup or replication jobs.
- For storage: check lifecycle policies, snapshots, retrieval patterns, incomplete multipart uploads, and replication.
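For the NAT Gateway path, it can help to split the line item into its hourly component and its data-processing component, since the two point to different fixes. The sketch below takes both rates as parameters. The values used in the example are placeholders, not AWS prices, and should be replaced with the rates on your own bill or the provider pricing page.

```python
from typing import Dict


def nat_cost_split(
    gateway_hours: float,
    processed_gb: float,
    hourly_rate: float,
    per_gb_rate: float,
) -> Dict[str, float]:
    """Split NAT Gateway spend into hourly and data-processing parts."""
    hourly = round(gateway_hours * hourly_rate, 2)
    processing = round(processed_gb * per_gb_rate, 2)
    total = round(hourly + processing, 2)
    share = round(processing / total, 3) if total else 0.0
    return {
        "hourly": hourly,
        "processing": processing,
        "total": total,
        "processing_share": share,
    }


# Placeholder inputs: two gateways for a 720-hour month, 5000 GB processed.
# The 0.10 rates are illustrative only and are NOT current AWS prices.
split = nat_cost_split(
    gateway_hours=2 * 720,
    processed_gb=5000,
    hourly_rate=0.10,
    per_gb_rate=0.10,
)
print(split)
```

A high processing share suggests looking at what traffic flows through the gateway. A high hourly share suggests looking at how many gateways exist and whether all are needed.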
What Not To Conclude Yet
- Do not conclude AWS is the wrong provider from one unexplained spike.
- Do not delete infrastructure before dependent workloads and rollback risk are known.
- Do not call migration cheaper until the recurring driver, migration work, and replacement operations are priced.
FAQ
What evidence should I collect after AWS bill shock?
Collect the largest month-over-month delta by service, region, account, and owner; then collect the usage metric that explains the line item, such as processed data, log ingestion, storage growth, transfer, or idle hours.
Why start with Cost Explorer?
Cost Explorer helps identify service, account, region, and time-based changes before the team debates architecture or migration.
When does NAT Gateway need deeper evidence?
NAT Gateway needs deeper evidence when private subnet traffic, cross-AZ routing, package pulls, container image downloads, backups, logs, or external API calls may be passing through NAT unexpectedly.
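A first pass at this check could scan route tables for ones that send traffic to a NAT gateway but have no gateway endpoint route. The sketch below works on plain dictionaries whose keys (`RouteTableId`, `Routes`, `NatGatewayId`, `GatewayId`, `DestinationPrefixListId`) are assumed to mirror the EC2 DescribeRouteTables output; the sample IDs are hypothetical, and the check does not tell S3 endpoints apart from DynamoDB endpoints.

```python
from typing import Any, Dict, List


def nat_tables_without_gateway_endpoint(
    route_tables: List[Dict[str, Any]]
) -> List[str]:
    """Return IDs of route tables that use NAT but have no gateway endpoint route."""
    flagged = []
    for table in route_tables:
        routes = table.get("Routes", [])
        uses_nat = any(route.get("NatGatewayId") for route in routes)
        # Assumption: gateway endpoint routes target a prefix list via a vpce- ID.
        has_gateway_endpoint = any(
            route.get("DestinationPrefixListId")
            and str(route.get("GatewayId", "")).startswith("vpce-")
            for route in routes
        )
        if uses_nat and not has_gateway_endpoint:
            flagged.append(table["RouteTableId"])
    return flagged


# Hypothetical route tables, shaped like the assumed API output.
sample_tables = [
    {
        "RouteTableId": "rtb-private-a",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0aaa"},
        ],
    },
    {
        "RouteTableId": "rtb-private-b",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0bbb"},
            {"DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0ccc"},
        ],
    },
    {
        "RouteTableId": "rtb-public",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0ddd"},
        ],
    },
]

flagged = nat_tables_without_gateway_endpoint(sample_tables)
print(flagged)
```

A flagged table is a prompt for a closer look, not proof of waste: the subnets behind it may not talk to S3 or DynamoDB at all.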
Sources
- https://docs.aws.amazon.com/cost-management/latest/userguide/ce-what-is.html
- https://docs.aws.amazon.com/cost-management/latest/userguide/ce-exploring-data.html
- https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-pricing.html
- https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html
- https://docs.aws.amazon.com/vpc/latest/privatelink/gateway-endpoints.html
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/LogsBillingDetails.html
- https://aws.amazon.com/cloudwatch/pricing/