AWS NAT Gateway Bill Shock: What to Check First
Short answer: NAT Gateway bill shock usually means private subnet traffic is taking an expensive path. Start by finding which workload, route table, availability zone, or transfer pattern created the processed-data spike.
- Fix the traffic path before treating the whole AWS account as the problem.
- Verify current provider pricing directly before buying or migrating.
Quick answer
Answer: A NAT Gateway bill spike is usually a routing and data-processing diagnosis before it is a cloud migration decision.
Decision rule: Check processed data, hourly gateways, private subnet routes, cross-AZ paths, and recent traffic changes first.
Common trap: Treating NAT spend as unavoidable AWS overhead instead of tracing the traffic path.
Best next page: AWS bill shock taxonomy (Diagnosis workflow)
What to check before changing providers
Use this section to turn a vague bill or quote problem into fields a buyer, engineer, or founder can compare.
What to check first
- Processed GB through NAT Gateway.
- Number of NAT Gateways and hours active.
- Private subnet route tables and default routes.
- Traffic crossing availability zones or regions.
- Deployments that changed data path or retry behavior.
When this is not a migration problem
- The spike traces back to a single route, endpoint, or retention mistake.
- Traffic can be moved to a cheaper path inside AWS.
- The workload still depends heavily on AWS data or managed services.
Bad diagnosis vs good diagnosis
| Diagnosis | What it says |
|---|---|
| Bad diagnosis | The AWS bill is high, so AWS must be the wrong provider. |
| Good diagnosis | NAT processing jumped because private subnet traffic started taking an expensive path. |
Hypothetical example scenario
A batch worker in a private subnet starts pulling large artifacts through NAT after a deployment change. The fix may be an endpoint, route change, cache, or architecture adjustment, not migration.
This is a hypothetical example, not a provider benchmark. Check your own bill, logs, and provider terms.
Fields to capture
Capture these before comparing providers or making a migration call.
| Field | Capture |
|---|---|
| service_delta | NAT Gateway month-over-month increase |
| processed_gb | GB processed through NAT |
| route_change | Route table or endpoint change |
| traffic_source | Workload, subnet, account, or region causing the movement |
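The fields above can be kept in a small record so that gaps are visible before anyone compares providers. This is a minimal sketch, not a prescribed schema: the class name `NatEvidence` and the `missing()` helper are hypothetical, and only the four field names come from the table.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class NatEvidence:
    """Hypothetical record mirroring the four fields in the table above."""
    service_delta: Optional[float] = None   # NAT Gateway month-over-month increase
    processed_gb: Optional[float] = None    # GB processed through NAT
    route_change: Optional[str] = None      # route table or endpoint change
    traffic_source: Optional[str] = None    # workload, subnet, account, or region

    def missing(self) -> List[str]:
        """Return the names of fields that have not been captured yet."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "")]


# Example with made-up values: the bill numbers are known, the cause is not.
evidence = NatEvidence(service_delta=412.0, processed_gb=9000.0)
print(evidence.missing())
```

An empty `missing()` list is a reasonable gate before a migration discussion starts; a non-empty list shows which part of the diagnosis is still open.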
What to ask before changing providers
- Which traffic path changed?
- Can an endpoint, cache, or route change remove the NAT path?
- Is the NAT delta recurring or one-time?
- Who owns the service that created the traffic?
Right fit
- NAT Gateway charges jumped month over month.
- Private workloads are pulling packages, logs, images, or external APIs through NAT.
- The team is considering migration before explaining the line item.
Quick checks
- Compare this month to the last normal month by service and region.
- Identify the route tables and subnets using the NAT Gateway.
- Check for cross-AZ paths, package mirrors, container pulls, backups, and log forwarding.
- Look for VPC endpoints or architecture changes that can remove repeated NAT traffic.
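The second check, finding which subnets send their default route to a NAT Gateway, can be sketched as a pure function over route table data. The input shape is assumed to follow the response of the EC2 `describe_route_tables` call (`RouteTables`, `Routes`, `Associations`); confirm it against current AWS documentation. The function name and all IDs in the example are hypothetical.

```python
from typing import Dict, List


def subnets_defaulting_to_nat(route_tables_response: dict) -> Dict[str, List[str]]:
    """Map each NAT Gateway ID to the subnets whose route table sends 0.0.0.0/0 to it.

    Only explicit subnet associations are counted. Subnets that fall back to
    the VPC's main route table have no SubnetId entry and need a separate check.
    """
    result: Dict[str, List[str]] = {}
    for table in route_tables_response.get("RouteTables", []):
        nat_ids = [
            route["NatGatewayId"]
            for route in table.get("Routes", [])
            if route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and "NatGatewayId" in route
        ]
        subnet_ids = [
            assoc["SubnetId"]
            for assoc in table.get("Associations", [])
            if "SubnetId" in assoc
        ]
        for nat_id in nat_ids:
            result.setdefault(nat_id, []).extend(subnet_ids)
    return {nat_id: sorted(ids) for nat_id, ids in result.items()}


# Hypothetical data: one private table routed through NAT, one public table.
example_response = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-private-a",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example-1"},
            ],
            "Associations": [{"SubnetId": "subnet-b"}, {"SubnetId": "subnet-a"}],
        },
        {
            "RouteTableId": "rtb-public",
            "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"}],
            "Associations": [{"SubnetId": "subnet-public"}],
        },
    ]
}
print(subnets_defaulting_to_nat(example_response))
```

Keeping the function free of API calls means it can run against a saved response, which is safer than querying a production account during triage.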
Rough math
- NAT surprise = current NAT Gateway total - previous normal NAT Gateway baseline.
- Repeatable NAT cost = hourly gateway cost + recurring processed-data cost.
- Fix payback (in months) = engineering time cost / monthly repeatable savings.
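The three formulas above translate directly into code. This is a minimal sketch: the function names are hypothetical, and the rates and totals in the example are placeholders rather than AWS prices, so substitute figures from your own bill and the current pricing page.

```python
def nat_surprise(current_total: float, baseline_total: float) -> float:
    """Current NAT Gateway total minus the previous normal baseline."""
    return current_total - baseline_total


def repeatable_nat_cost(hourly_rate: float, gateway_hours: float,
                        per_gb_rate: float, recurring_gb: float) -> float:
    """Hourly gateway cost plus recurring processed-data cost for one month."""
    return hourly_rate * gateway_hours + per_gb_rate * recurring_gb


def fix_payback_months(engineering_cost: float, monthly_savings: float) -> float:
    """Months until a fix pays for itself; infinite if it saves nothing."""
    if monthly_savings <= 0:
        return float("inf")
    return engineering_cost / monthly_savings


# Placeholder inputs only; none of these numbers are real provider prices.
surprise = nat_surprise(current_total=900.0, baseline_total=250.0)
monthly = repeatable_nat_cost(hourly_rate=0.05, gateway_hours=720,
                              per_gb_rate=0.05, recurring_gb=2000)
payback = fix_payback_months(engineering_cost=3000.0, monthly_savings=100.0)
print(f"surprise={surprise:.2f} repeatable={monthly:.2f} payback_months={payback:.1f}")
```

A short payback points toward fixing the traffic path inside AWS; a long one suggests the spike may be one-time or the fix may not be worth the engineering time.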
Red flags
- Private subnets route all outbound traffic through NAT by default.
- Large recurring data movement goes through NAT instead of a private endpoint or different path.
- No one owns route tables, endpoints, or data movement review.
What to do next
- Use the AWS bill shock checklist to group the bill driver.
- Document the traffic path before changing architecture.
- Run the quiz if the NAT fix raises a larger placement question.
RunPlacement quiz
Pressure-test this workload
Fix the traffic path before treating the whole AWS account as the problem.
Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.
Related resources
Use a worksheet before making the call
These supporting pages turn the decision into fields a buyer, engineer, or founder can actually compare.
A first-pass checklist and visual triage flow for finding the AWS line items that usually make a bill jump.
- AWS Bill Shock Evidence Checklist (AWS bill shock; research checklist, 4 sections, source-linked): A source-backed checklist for collecting AWS Cost Explorer, NAT Gateway, transfer, CloudWatch, storage, and routing evidence before changing architecture.
- Workload Placement Worksheet (Workload placement; checklist, 7 sections, source-linked): A practical worksheet and decision map for deciding where a workload should run before provider choice hardens.
Related decisions
Keep narrowing the placement question
Follow the adjacent pages when the first answer exposes a deeper cost driver or operating constraint.
For placement decisions, an AWS pricing calculator is useful but incomplete. You also need workload shape, hidden bill drivers, migration cost, operational tolerance, and whether the problem is AWS itself or one expensive line item.
- Cloud Egress and Exit Cost: What to Price Before Moving (Cloud migration; migration planning): Cloud egress is only one part of exit cost. A serious migration estimate also prices data export, recurring transfer, storage retrieval, rewrites, testing, downtime, rollback, and new operations.
- Cloud Cost Tools for Startups: What to Use Before Hiring FinOps (AWS bill shock; commercial investigation): Startups usually need three layers: native billing visibility, lightweight alerting or cleanup, and a decision worksheet for workload placement when the bill changes the infrastructure strategy.
Framework
Use the underlying decision model
These framework pages define the terms and formulas behind this specific decision.
Classify bill shock by driver class first: compute, network, storage, observability, managed services, support, marketplace, or commitment mismatch.
- Workload Placement Framework (Workload placement): Choose workload placement by matching the workload's cost driver, data movement, performance needs, operational tolerance, and commitment horizon to the right infrastructure category.
FAQ
Why did my AWS NAT Gateway bill spike?
An AWS NAT Gateway bill usually spikes when more private subnet traffic is routed through NAT than expected. Common causes include container pulls, package downloads, backups, logs, cross-AZ paths, internet egress, or a workload moving more data. Check current AWS pricing pages before estimating the exact charge.
Should I delete the NAT Gateway immediately?
Do not delete the NAT Gateway immediately unless you already know the dependent workloads and failure impact. First identify which subnets, routes, services, and traffic flows use it. A safer fix may be a route change, VPC endpoint, architecture adjustment, alert, or retention change.
Can NAT Gateway charges justify leaving AWS?
NAT Gateway charges can support a migration discussion, but only after the traffic pattern is understood. Many surprises are fixable inside AWS with routing, endpoints, or architecture changes. Migration makes sense only when the recurring savings beat migration work, data movement, and new operations.