AWS Bill Shock Triage Checklist

Short answer: Use this before assuming AWS itself is the wrong placement.

Estimate only
  • This is a decision checklist, not a final price quote.
  • Verify final numbers against provider pricing pages and your own bill or quote.

RunPlacement quiz

Pressure-test this workload

Find the top bill drivers first, then decide whether to optimize, re-architect, or migrate.

Uses workload type, budget, GPU need, data movement, priority, and ops tolerance.
Use the quiz

Bill shock triage

Do not start with migration. Start with the delta.

Most surprise bills stop being mysterious once the biggest month-over-month change is isolated.

01 Find the jump

Compare this month to the last normal month by service and region.

02 Group the drivers

Separate compute, networking, storage, observability, and managed services.

03 Explain the cause

Look for a traffic change, an architecture change, a retention change, or idle resources.

04 Choose the fix

Delete, resize, re-architect, commit, or migrate, but only after the driver is known.
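
As a minimal sketch of step 01, the comparison can be done on two exported cost summaries keyed by service and region. The function name `cost_deltas` and every dollar figure below are hypothetical and only illustrate the shape of the calculation; real numbers would come from your own bill or cost export.

```python
def cost_deltas(current, baseline):
    """Rank (service, region) pairs by month-over-month cost change.

    Both arguments map (service, region) -> monthly cost in dollars.
    Keys missing from one month are treated as zero cost in that month.
    """
    keys = set(current) | set(baseline)
    deltas = {
        key: round(current.get(key, 0.0) - baseline.get(key, 0.0), 2)
        for key in keys
    }
    # Largest increase first, so the main driver of the jump is at the top.
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical figures for illustration only.
baseline = {
    ("EC2", "us-east-1"): 4200.00,
    ("NAT Gateway", "us-east-1"): 310.00,
    ("CloudWatch", "us-east-1"): 180.00,
    ("S3", "us-east-1"): 650.00,
}
current = {
    ("EC2", "us-east-1"): 4350.00,
    ("NAT Gateway", "us-east-1"): 2890.00,
    ("CloudWatch", "us-east-1"): 940.00,
    ("S3", "us-east-1"): 640.00,
    ("EC2", "eu-west-1"): 120.00,
}

ranked = cost_deltas(current, baseline)
for (service, region), delta in ranked[:3]:
    print(f"{service:12} {region:10} {delta:+10.2f}")
```

Choosing the "last normal month" as the baseline matters more than the code: comparing against another unusual month hides the driver instead of exposing it.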

Short Answer

  • Most AWS bill shock needs a line-item triage before a migration decision.
  • Start with the biggest delta from the previous normal month.
  • A high bill alone does not prove that AWS is the wrong placement; the cause may be a single architecture decision, logging change, or idle resource class.

Line Items To Check First

  • NAT Gateway processing and hourly charges.
  • Cross-AZ, inter-region, and internet data transfer.
  • CloudWatch logs, metrics, retention, and custom metrics.
  • S3 storage class, requests, lifecycle gaps, retrieval, replication, and transfer.
  • Idle EC2, overprovisioned instances, unattached EBS volumes, and forgotten load balancers.
  • Managed databases, snapshots, backups, and provisioned throughput.
  • Support plan changes, marketplace products, and one-off service usage.
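
One way to roll line items like these up into driver groups is a small keyword classifier, sketched below. The keyword lists, the group names, and the sample deltas are assumptions made for illustration; they are not an official AWS taxonomy and would need adjusting to the line-item names in your own bill.

```python
from collections import defaultdict

# Illustrative keyword rules (assumptions, not an AWS-defined mapping).
# Rules are checked in order, so the first matching group wins.
DRIVER_RULES = [
    ("networking", ("natgateway", "datatransfer", "loadbalancer")),
    ("observability", ("cloudwatch", "logs", "metric")),
    ("storage", ("s3", "ebs", "snapshot")),
    ("managed services", ("rds", "dynamodb", "elasticache")),
    ("compute", ("ec2", "lambda")),
]


def driver_group(line_item):
    """Map a line-item name to a driver group, or 'other' if nothing matches."""
    name = line_item.lower().replace(" ", "").replace("-", "")
    for group, keywords in DRIVER_RULES:
        if any(keyword in name for keyword in keywords):
            return group
    return "other"


def totals_by_group(deltas):
    """Sum month-over-month deltas (in dollars) per driver group."""
    totals = defaultdict(float)
    for item, delta in deltas.items():
        totals[driver_group(item)] += delta
    return dict(totals)


# Hypothetical month-over-month deltas in dollars.
deltas = {
    "NAT Gateway processing": 2100.0,
    "Cross-AZ data transfer": 480.0,
    "CloudWatch Logs ingestion": 760.0,
    "Unattached EBS volumes": 90.0,
    "Idle EC2 instances": 150.0,
    "Support plan change": 29.0,
}

groups = totals_by_group(deltas)
for group, total in sorted(groups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:18} {total:+9.2f}")
```

Anything that lands in "other" is worth reading by hand, since one-off charges and marketplace products rarely match a keyword.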

Triage Table

  • If networking jumped: inspect NAT, cross-AZ paths, egress, region movement, and load balancer traffic.
  • If observability jumped: inspect log volume, retention, custom metrics, and debug logging.
  • If storage jumped: inspect request volume, lifecycle rules, replication, retrieval, and snapshots.
  • If compute jumped: inspect idle capacity, autoscaling, instance families, commitments, and GPU usage.
  • If managed services jumped: inspect provisioned capacity, backups, replicas, and default settings.
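
The table above can be kept next to the cost data as a plain lookup, so the group with the largest increase immediately yields its inspection list. This is a sketch only: the dictionary restates the rows above, while `next_checks` and the sample deltas are hypothetical.

```python
# Inspection lists restated from the triage table.
TRIAGE_CHECKS = {
    "networking": ["NAT", "cross-AZ paths", "egress", "region movement",
                   "load balancer traffic"],
    "observability": ["log volume", "retention", "custom metrics",
                      "debug logging"],
    "storage": ["request volume", "lifecycle rules", "replication",
                "retrieval", "snapshots"],
    "compute": ["idle capacity", "autoscaling", "instance families",
                "commitments", "GPU usage"],
    "managed services": ["provisioned capacity", "backups", "replicas",
                         "default settings"],
}


def next_checks(group_deltas):
    """Return the driver group with the largest increase and what to inspect."""
    top_group = max(group_deltas, key=group_deltas.get)
    return top_group, TRIAGE_CHECKS.get(top_group, [])


# Hypothetical per-group deltas in dollars.
group_deltas = {"networking": 2580.0, "observability": 760.0, "compute": 150.0}

top_group, checks = next_checks(group_deltas)
print(f"Start with {top_group}: " + ", ".join(checks))
```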

Rough Math

  • Monthly surprise = current monthly line item - previous normal baseline.
  • Repeatable surprise = line-item delta expected to recur next month.
  • Fix payback (in months) = one-time engineering time cost / monthly savings.
  • If one repeatable line item explains most of the jump, fix that before changing providers.
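
A worked example of the rough math, as a sketch with made-up numbers. The function names, the engineering rate, and all dollar amounts are assumptions; treat the output as a decision aid, not a quote.

```python
def monthly_surprise(current, baseline):
    """Current monthly line item minus the previous normal baseline."""
    return current - baseline


def share_of_jump(item_delta, total_delta):
    """Fraction of the total month-over-month jump explained by one line item."""
    if total_delta <= 0:
        raise ValueError("total_delta must be positive")
    return item_delta / total_delta


def payback_months(engineering_cost, monthly_savings):
    """Months until a one-time fix pays for itself."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return engineering_cost / monthly_savings


# Hypothetical inputs: a NAT line item that jumped, out of a larger total jump.
nat_delta = monthly_surprise(current=2890.0, baseline=310.0)
share = share_of_jump(nat_delta, total_delta=3600.0)

# Assume the fix takes 16 engineering hours at $120/hour and removes
# $2,400 of the recurring monthly charge.
payback = payback_months(engineering_cost=16 * 120.0, monthly_savings=2400.0)

print(f"Surprise: ${nat_delta:,.2f}")
print(f"Share of total jump: {share:.0%}")
print(f"Payback: {payback:.1f} months")
```

In this made-up case a single repeatable line item explains most of the jump and the fix pays back in under a month, which is the situation where fixing comes before any provider change.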

Questions To Ask Internally

  • What changed in traffic, logging, data volume, architecture, or retention?
  • Did a private subnet path start routing through NAT unexpectedly?
  • Did a debug flag or log level stay on?
  • Did data start crossing zones or regions?
  • Did a test workload become always-on?
  • Can this be capped, alerted, or deleted today?

Red Flags

  • Private subnet traffic going through NAT by default.
  • Debug logs retained like production audit logs.
  • Data movement priced after the architecture was chosen, not before.
  • No owner for old resources.
  • Dashboards that show total spend but not the delta driver.

When To Use The Quiz

  • Use the RunPlacement quiz after identifying whether the bill is mostly compute, networking, storage, observability, managed services, or GPU.
  • The quiz helps decide whether to optimize AWS, move the workload, or choose a simpler category.
