The Growing Cloud Bill

Cloud spending has a tendency to spiral when teams provision resources without visibility into cost impact. What starts as a few development instances quickly grows into hundreds of resources across multiple accounts, each with its own spending patterns and waste profiles. Without a FinOps culture that embeds cost awareness into engineering decisions, organisations often discover their cloud bill has doubled before anyone sounds the alarm.

The FinOps approach shifts cost ownership to engineering teams, treating cloud spend as an efficiency metric alongside performance and reliability. This article documents the strategies and tools used to reduce AWS spending by 35% across a multi-account environment while maintaining or improving application performance and availability.

Right-Sizing Strategy

AWS Compute Optimizer analyses CPU, memory, and network utilisation across EC2 instances and provides downsizing recommendations based on historical usage patterns. However, automated recommendations are only a starting point. Custom scripts correlate CloudWatch metrics with application-level performance data to identify instances where utilisation peaks never exceed 30% of allocated capacity. RDS databases receive similar analysis, comparing provisioned IOPS and storage against actual throughput patterns. EKS node groups are evaluated by comparing requested versus actual resource consumption at the pod level, revealing nodes that run at 20% utilisation because workloads over-request CPU and memory in their resource specifications.
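The correlation step described above can be sketched as a simple filter: flag any instance whose observed utilisation peaks never cross the 30% line. This is a minimal illustration assuming utilisation samples have already been pulled from CloudWatch and normalised to fractions of allocated capacity; the instance IDs and data shape are hypothetical, not a real AWS API response.

```python
# Hypothetical sketch: flag instances whose peak utilisation never
# exceeds 30% of allocated capacity. Sample data is illustrative;
# in practice the samples would come from CloudWatch metrics.

PEAK_THRESHOLD = 0.30  # peaks below 30% of capacity -> downsizing candidate

def downsizing_candidates(metrics: dict[str, list[float]]) -> list[str]:
    """Return instance IDs whose maximum observed utilisation stays
    under the threshold. metrics maps instance ID -> samples (0.0-1.0)."""
    return [
        instance_id
        for instance_id, samples in metrics.items()
        if samples and max(samples) < PEAK_THRESHOLD
    ]

if __name__ == "__main__":
    samples = {
        "i-0aaa": [0.05, 0.12, 0.28, 0.19],  # never peaks above 30% -> candidate
        "i-0bbb": [0.10, 0.45, 0.22, 0.31],  # peaks at 45% -> keep as-is
    }
    print(downsizing_candidates(samples))  # -> ['i-0aaa']
```

The same filter generalises to RDS (provisioned IOPS versus observed throughput) and to EKS pod requests versus actual consumption, with only the metric source changing.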

Reserved Instances and Savings Plans

Commitment-based discounts offer significant savings but require careful analysis to avoid locking into unused capacity. The methodology begins by identifying steady-state workloads that maintain consistent utilisation over 90 days. These workloads form the baseline for Reserved Instance or Savings Plan commitments. The analysis balances flexibility with savings: Compute Savings Plans offer lower discounts than EC2 Instance Savings Plans but allow instance family and region changes. For workloads with predictable patterns, the deeper discounts justify the reduced flexibility. The commitment strategy is reviewed quarterly, adjusting coverage targets based on actual usage trends and upcoming architectural changes.
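Sizing the commitment itself can be reduced to finding the usage floor over the trailing window: commit only to spend that is present in nearly every observed hour, so the plan never exceeds steady-state demand. The sketch below assumes hourly spend figures are already exported (e.g. from Cost Explorer); the 5% tolerance percentile is an illustrative assumption, not a value from this article.

```python
# Illustrative commitment-sizing step: pick the hourly spend level that
# actual usage exceeds in ~95% of observed hours, and commit to that.
# The floor_percentile tolerance is an assumption for illustration.

def commitment_baseline(hourly_spend: list[float], floor_percentile: float = 0.05) -> float:
    """Return a conservative hourly commitment: the spend level that
    observed usage meets or exceeds in (1 - floor_percentile) of hours."""
    if not hourly_spend:
        return 0.0
    ordered = sorted(hourly_spend)
    index = int(len(ordered) * floor_percentile)
    return ordered[index]

# 95 hours at $10/h and 5 dips to $4/h -> commit at $10/h,
# tolerating the 5% of hours below the commitment
print(commitment_baseline([10.0] * 95 + [4.0] * 5))  # -> 10.0
```

Re-running this against fresh usage data each quarter is what the review cadence above amounts to in practice.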

Spot Instances and Karpenter

Karpenter on EKS dynamically provisions the most cost-effective instances for each workload based on the pod's resource requirements and scheduling constraints. Unlike the traditional Cluster Autoscaler, Karpenter evaluates instance types, availability zones, and purchase options in real time, selecting spot instances when appropriate and falling back to on-demand only when spot capacity is unavailable. Interruption handling is built into the workload design: stateless services use pod disruption budgets and graceful shutdown periods, while stateful workloads run on on-demand instances with dedicated Karpenter provisioners. This approach achieved 60-70% savings on non-critical workloads including batch processing, CI runners, and development environments.
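The 60-70% figure can be sanity-checked with a simple blended-cost model: most instance-hours land on spot, with a small on-demand fallback fraction covering interruptions. The discount and coverage rates below are assumptions chosen to illustrate the arithmetic, not quoted AWS pricing.

```python
# Rough blended-cost model for spot with on-demand fallback.
# All rates are illustrative assumptions, not real AWS prices.

def blended_hourly_cost(on_demand_rate: float,
                        spot_discount: float,
                        spot_fraction: float) -> float:
    """Cost per instance-hour when spot_fraction of hours run on spot
    and the remainder fall back to on-demand."""
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return spot_fraction * spot_rate + (1.0 - spot_fraction) * on_demand_rate

# 90% spot coverage at a 70% discount:
# 0.9 * 0.3 + 0.1 * 1.0 = 0.37 of on-demand, i.e. a 63% saving
print(blended_hourly_cost(1.0, 0.7, 0.9))
```

Even modest fallback fractions erode savings quickly, which is why the graceful-interruption handling above matters: the better a workload tolerates reclaims, the higher the spot fraction it can sustain.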

Cost Governance Checklist

  • Implemented AWS Cost Explorer dashboards with per-team cost attribution.
  • Right-sized 40+ EC2 instances and RDS databases, saving $15K monthly.
  • Deployed Karpenter on EKS achieving 65% cost reduction on compute.
  • Established Reserved Instance coverage targets at 80% for steady-state workloads.
  • Automated unused resource cleanup with Lambda functions and CloudWatch Events (now Amazon EventBridge).
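The cleanup item in the checklist reduces to a filtering rule the Lambda applies on each scheduled run. The sketch below shows that rule as a pure function; the field names mimic the EC2 `describe_volumes` response shape, and the 14-day grace period is an assumption for illustration. A real handler would fetch volumes via the EC2 API and delete or snapshot the flagged ones.

```python
# Sketch of the cleanup rule: volumes with no attachments that are
# older than a grace period are candidates for deletion. Field names
# mimic the EC2 describe_volumes response; the grace period is an
# assumed value, not from the article.

from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=14)

def stale_unattached_volumes(volumes: list[dict], now: datetime) -> list[str]:
    """Return IDs of volumes with no attachments whose creation time
    is older than the grace period."""
    return [
        v["VolumeId"]
        for v in volumes
        if not v.get("Attachments") and now - v["CreateTime"] > GRACE_PERIOD
    ]
```

Flagging and tagging candidates for a day before actually deleting them gives owners a chance to object, which keeps the automation safe to run unattended.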

Cost optimisation is not a one-time project but an ongoing practice. Monthly cost reviews, automated anomaly detection, and engineering team scorecards ensure that savings are sustained and new waste is caught early. The combination of technical optimisation and cultural change around cost ownership is what makes the 35% reduction durable over time.
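The anomaly detection mentioned above can be as simple as comparing each day's spend to a trailing-window average and flagging large jumps. AWS Cost Anomaly Detection provides a managed version of this; the window size and threshold factor below are illustrative assumptions for a minimal home-grown check.

```python
# Minimal cost anomaly check: flag any day whose spend exceeds the
# trailing-window mean by more than `factor`. Window and factor are
# assumed values for illustration; a managed service would tune these.

from statistics import mean

def cost_anomalies(daily_spend: list[float], window: int = 7, factor: float = 1.5) -> list[int]:
    """Return indices of days whose spend is > factor * the mean of
    the previous `window` days."""
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])
        if daily_spend[i] > factor * baseline:
            flagged.append(i)
    return flagged

# A week at $100/day followed by a $160 day trips the 1.5x threshold
print(cost_anomalies([100.0] * 7 + [160.0]))  # -> [7]
```

Feeding flagged days into the monthly review (and the team scorecards) is what turns a one-off detection script into the ongoing practice the article describes.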