For developers and programmers, Amazon Web Services (AWS) offers many benefits. It gives you access to the computing and DevOps tools you need at the press of a button — which helps you get products out the door fast.
However, it can be challenging to control your costs and identify waste. In this comprehensive guide, we will examine some practical steps you can take for AWS cost optimization.
In the first chapter, we’ll look at how to ensure that the right configurations are in place and the right data is available when the team is ready to start focusing on costs. Next, we’ll explore ways to avoid common architectural waste. Finally, we’ll focus on creating the ideal scenario going forward: controlling AWS costs proactively, before excess costs accumulate.
The first step to success with AWS — and controlling AWS costs — is to get the right configurations in place. In the sections below, we'll dig into where you should focus for AWS cost optimization, and some important setup steps like tagging.
There are a number of steps you can take on the front end to make it easier to track and manage your AWS costs. The following sections outline the steps and AWS tools you can use for AWS cost optimization.
To start off, if you’re not using AWS Organizations and consolidated billing, consider implementing these tools. AWS Organizations enables teams to automate account creation, create groups of accounts to reflect business needs, and apply governance policies for these groups. The consolidated billing feature within AWS Organizations lets teams consolidate payments for multiple AWS accounts. This will help you stay organized and consistent.
Second, consider creating different AWS accounts for production and development. AWS recommends going even further by creating separate accounts for each environment, segmented by feature or product. While that practice can make cloud spend easier to understand, it can be challenging to manage at scale. As a general rule, separate production workloads from development at a minimum; teams may decide to segment further based on organizational requirements.
Next, focus on establishing a company tagging policy. Tagging will enable teams to identify, organize, filter, and search for resources within AWS. Tagging works best when there’s a company policy that sets expectations for engineering teams. There are many ways to make this process less manual by incorporating standard tags into your Terraform or CloudFormation templates.
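One way to make the tagging policy less manual is to check templates before they are deployed. The sketch below is a minimal, hypothetical check for CloudFormation templates that have already been parsed into a dict; the required tag keys and the list of resource types to inspect are placeholders, not values from any official policy, and only the list-of-`Key`/`Value` tag form is handled.

```python
# Hypothetical policy values: the required keys and the resource types to check
# are placeholders; replace them with whatever your tagging policy specifies.
REQUIRED_TAGS = {"owner", "environment", "feature"}
TAGGABLE_TYPES = {"AWS::EC2::Instance", "AWS::EC2::Volume", "AWS::S3::Bucket"}


def missing_tags(template, required=REQUIRED_TAGS, taggable=TAGGABLE_TYPES):
    """Map each taggable resource's logical ID to the required tag keys it lacks.

    `template` is a CloudFormation template already parsed into a dict (for
    example with json.load). Only the list-of-{"Key": ..., "Value": ...} tag
    form used by the resource types above is understood.
    """
    problems = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in taggable:
            continue  # resource types outside the policy are ignored
        tags = resource.get("Properties", {}).get("Tags", [])
        present = {tag["Key"] for tag in tags}
        absent = required - present
        if absent:
            problems[logical_id] = sorted(absent)
    return problems
```

In a CI/CD pipeline, a non-empty result could fail the build before an untagged resource is ever created. JSON templates parse with the standard library; YAML templates need a parser that understands CloudFormation's short-form intrinsics such as `!Ref`.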
Once the team has started tagging its infrastructure, activate the important tags as cost allocation tags so they can be used in AWS Cost Explorer or by cost management vendors. For example, activating the AWS-generated cost allocation tags adds a useful tag, “aws:createdBy”, that shows which identity and access management (IAM) identity, such as a user or role, created each resource.
For most organizations, tagging is a critical component for managing their cloud spend. If only it were easy to enforce! The first step to creating a tagging policy is to start setting expectations for engineering teams. The three most common tags to standardize are:
Most development teams use some form of continuous integration/continuous delivery (CI/CD) and/or infrastructure-as-code tooling, which are great ways to tag newly created resources. Unfortunately, tagging existing resources usually involves a manual process (start with the most expensive ones). Throughout this process, don’t forget to tag supporting resources, such as Elastic Block Store (EBS) volumes or snapshots associated with Elastic Compute Cloud (EC2) instances. For teams looking for help maximizing their existing tags, tools like CloudZero can boost existing tag coverage.
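For supporting resources, one approach is to copy selected tags from each EC2 instance onto the EBS volumes attached to it. The sketch below only computes a plan; it assumes input dicts shaped like the `Instances` entries of EC2 `DescribeInstances` and the `Volumes` entries of `DescribeVolumes`, and the function name is hypothetical.

```python
def volume_tag_plan(instances, volumes, keys):
    """Return {volume_id: {tag_key: value}} for tags a volume should inherit.

    instances: dicts with "InstanceId" and "Tags" (list of Key/Value dicts).
    volumes: dicts with "VolumeId", "Attachments" (each with "InstanceId")
    and "Tags". keys: the tag keys to propagate. Existing volume tags are
    never overwritten; with multiple attachments, the first instance wins.
    """
    instance_tags = {
        inst["InstanceId"]: {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        for inst in instances
    }
    plan = {}
    for vol in volumes:
        current = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        additions = {}
        for attachment in vol.get("Attachments", []):
            source = instance_tags.get(attachment["InstanceId"], {})
            for key in keys:
                if key in source and key not in current and key not in additions:
                    additions[key] = source[key]
        if additions:
            plan[vol["VolumeId"]] = additions
    return plan
```

After reviewing the plan, each entry could be applied with the boto3 EC2 client's `create_tags(Resources=[volume_id], Tags=[{"Key": k, "Value": v}, ...])`.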
Finally, create an hourly AWS Cost & Usage Report (CUR), even if your organization isn’t planning to focus on AWS cost optimization for another six months. Most cost management vendors need to ingest CUR data, and AWS doesn’t back-populate it, so having it ready is helpful and doesn’t cost much. When creating an hourly CUR, check the “Include resource IDs” checkbox and leave the rest as defaults.
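If you prefer to script the CUR setup, the legacy CUR API's `PutReportDefinition` call takes a report definition like the one built below. This is a sketch under assumptions: the report name, bucket, and prefix are hypothetical, the bucket must already have a policy that lets AWS billing write to it, and newer setups may instead use AWS Data Exports (CUR 2.0), which has a different API.

```python
def hourly_cur_definition(report_name, bucket, prefix, bucket_region):
    """Build a ReportDefinition dict for the legacy CUR PutReportDefinition API."""
    return {
        "ReportName": report_name,
        "TimeUnit": "HOURLY",
        "Format": "textORcsv",
        "Compression": "GZIP",
        # "RESOURCES" is the API equivalent of the "Include resource IDs" checkbox.
        "AdditionalSchemaElements": ["RESOURCES"],
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "S3Region": bucket_region,
    }


# Submitting it (requires boto3 and billing permissions; the CUR API is
# served from us-east-1):
#   import boto3
#   boto3.client("cur", region_name="us-east-1").put_report_definition(
#       ReportDefinition=hourly_cur_definition(
#           "hourly-cur", "example-billing-bucket", "cur", "us-east-1"))
```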
For the level of granularity that many cost management vendors (like CloudZero) need, you may want to adjust your settings as follows:
There are many useful features within AWS Cost Explorer for controlling costs. For example:
Use these to plan and manage reserved capacity, which can save up to 75% on the hourly rate compared to on-demand pricing.
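Whether a reservation actually saves money depends on how much of it you use. The arithmetic below is a sketch with hypothetical prices: a reservation is billed for every hour of its term, while on-demand is billed only for hours used, so low utilization erodes the discount.

```python
HOURS_PER_YEAR = 8760  # 365 days; ignores leap years for simplicity


def reservation_savings(on_demand_hourly, reserved_hourly, term_years,
                        upfront=0.0, utilization=1.0):
    """Fractional saving of a reservation compared with running on demand.

    The reservation cost (upfront fee plus hourly rate for the full term) is
    compared with on-demand cost for only the hours actually used, so the
    result turns negative when utilization is low. utilization must be > 0.
    """
    term_hours = HOURS_PER_YEAR * term_years
    reserved_total = upfront + reserved_hourly * term_hours
    on_demand_total = on_demand_hourly * term_hours * utilization
    return 1 - reserved_total / on_demand_total
```

With a hypothetical $0.10 on-demand rate and a $0.06 reserved rate over one year, `reservation_savings(0.10, 0.06, 1)` gives a 40% saving, but at 50% utilization the same reservation costs 20% more than on-demand would have.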
If the company’s environment is relatively static and the team has good account segmentation, this resource can be helpful. However, it can quickly get complicated with different teams building different features.
AWS Compute Optimizer, a relatively new service, is a great resource for identifying EC2 waste such as overprovisioned instances.
Other AWS services help with understanding and optimizing infrastructure. AWS Config helps inventory your resources, and AWS Trusted Advisor provides proactive recommendations for optimizing your AWS environment. Unlike the free resources listed above, these services can incur costs that add up depending on your environment.
Just about every growing company has incurred technical debt at some point: the result of taking shortcuts to achieve faster gains. Inevitably, some development choices made sense at the time or were a quick fix to an urgent problem, but aren’t scaling with the business. These issues aren’t necessarily quick to fix and have to be planned, just like any other tech debt reduction activity.
So why does tech debt build up?
Imagine if an engineering team knew their service couldn’t cost more than $1,000 per month at a certain production load. It would certainly affect the way they designed their system. Engineers thrive on data, but they often don’t have access to cost requirements or cost feedback. That has to change.
Technical debt sometimes occurs in the realm of cloud governance and the organization of resources. Specifically, we often see AWS cost optimization opportunities in a few common areas: Snapshots, Previous Generation Compute, Network Address Translation (NAT) Gateway, Amazon Simple Storage Service (S3), AWS Management Services, and DNS Queries.
Snapshots are often used as a backup storage system in AWS. Nearly all companies spend money on snapshots, which isn’t a problem in and of itself.
However, if the team is spending 5–10% or more of the monthly bill on snapshots, or has snapshots older than 90 days, this is an area of concern.
In many cases, snapshot costs build up because there aren’t non-functional requirements around backup and recovery. If conservative defaults are left in place or no explicit lifecycle policies are created, costs can easily accumulate. For applications built before 2018, snapshots had to be managed manually, which also leads to problems in some organizations.
If possible, use Amazon Data Lifecycle Manager to automate snapshots. Then change the retention period on existing snapshots according to the business’s requirements for the application or service; typical retention periods range from 7 to 14 days. If the team has snapshots older than 90 days, consider moving them to a lower-cost storage tier such as Amazon EBS Snapshots Archive.
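Finding the snapshots past the 90-day mark is a simple filter. This sketch assumes dicts shaped like the `Snapshots` entries returned by EC2 `DescribeSnapshots`, where boto3 parses `StartTime` into a timezone-aware `datetime`; the function name and default cutoff are illustrative.

```python
from datetime import datetime, timedelta, timezone


def stale_snapshots(snapshots, max_age_days=90, now=None):
    """Return the IDs of snapshots created more than max_age_days ago.

    snapshots: dicts with "SnapshotId" and a timezone-aware "StartTime".
    now: optional reference time (defaults to the current UTC time).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]
```

The input could come from the boto3 `describe_snapshots` paginator with `OwnerIds=["self"]`. Before moving anything with `modify_snapshot_tier(SnapshotId=..., StorageTier="archive")`, check the archive tier's minimum retention period and restore charges against how often those snapshots are needed.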
Compute running on legacy AWS infrastructure is referred to as “previous generation compute.” It’s not always an easy area of cost tech debt to remedy.
If an application or an account spends more than 10% of its compute budget on previous generation hardware, this may be an area to remediate.
There are two main causes of previous generation compute, both of which speak to why it’s so complicated to fix:
If the business can migrate to current generation hardware, it can usually see cost reductions of 5–20% along with performance improvements. However, proceed with caution: updating an instance can require more effort than anticipated, and upgrading to newer versions can have unintended consequences.
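To test an account against the 10% symptom, you can group compute spend by instance family. In this sketch the spend-by-instance-type mapping is assumed to come from your billing data (for example, summed CUR line items), and the family set is a hypothetical sample of older families; AWS's previous-generation instance page is the authority on the current list.

```python
# Hypothetical sample of previous-generation families; verify against AWS docs.
PREVIOUS_GEN_FAMILIES = {"t1", "m1", "m2", "m3", "c1", "c3", "r3", "i2"}


def previous_gen_share(spend_by_instance_type, previous=PREVIOUS_GEN_FAMILIES):
    """Fraction of compute spend on previous-generation instance families.

    spend_by_instance_type maps an instance type such as "m3.large" to its
    spend for the period; the family is the part before the first dot.
    """
    total = sum(spend_by_instance_type.values())
    if total == 0:
        return 0.0
    old = sum(cost for itype, cost in spend_by_instance_type.items()
              if itype.split(".")[0] in previous)
    return old / total
```

A result above 0.10 would flag the account or application as a candidate for migration planning.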
Amazon’s NAT Gateway makes it easy for resources in a private subnet of a VPC to connect securely to the internet. Companies pay for each hour the gateway runs and for each gigabyte of data it processes. Many people use it because it’s a simple managed service.
If NAT Gateway costs exceed 5% of a feature or an account’s spend, this is an area of savings opportunity.
Data transfer within AWS is complicated, and it’s hard to understand how much data is going to traverse the gateway at scale. This can lead to architectures that have excessive NAT Gateway charges.
There’s a straightforward way to reduce NAT Gateway costs, outlined in five steps in a separate blog post. The process involves first using VPC Flow Logs to analyze the data passing through the gateway, then using VPC endpoints wherever possible, depending on the traffic and architecture.
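The flow log analysis step can start with something as simple as summing bytes on the NAT gateway's network interface. This sketch assumes records in the default (version 2) VPC Flow Logs format and a caller-supplied list of CIDRs for a service a VPC endpoint could serve, such as the S3 ranges for your region from AWS's published `ip-ranges.json`; the ENI ID and function names are hypothetical.

```python
import ipaddress
from collections import Counter


def parse_default_record(line):
    """Parse one default-format (version 2) flow log record, or return None."""
    fields = line.split()
    if len(fields) != 14 or fields[0] == "version" or fields[9] == "-":
        return None  # header line, custom format, or NODATA/SKIPDATA record
    return {"eni": fields[2], "src": fields[3], "dst": fields[4], "bytes": int(fields[9])}


def nat_traffic_summary(lines, nat_eni, endpoint_prefixes):
    """Return (total bytes, bytes exchanged with endpoint_prefixes, top 5 destinations).

    Only records on the NAT gateway's interface (nat_eni) are counted. Both
    directions of a flow are logged, so a flow matches if either end falls
    inside one of the endpoint prefixes.
    """
    networks = [ipaddress.ip_network(p) for p in endpoint_prefixes]

    def in_networks(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    total = matched = 0
    destinations = Counter()
    for line in lines:
        rec = parse_default_record(line)
        if rec is None or rec["eni"] != nat_eni:
            continue
        total += rec["bytes"]
        destinations[rec["dst"]] += rec["bytes"]
        if in_networks(rec["src"]) or in_networks(rec["dst"]):
            matched += rec["bytes"]
    return total, matched, destinations.most_common(5)
```

A large matched share suggests traffic that could move to a VPC endpoint instead of being processed by the gateway.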
When most people think about S3, they think primarily about storage. There are actually three different cost drivers for S3 buckets: storage, API activity, and data transfer.
Typically, if the company is spending more than 10% of its AWS budget on S3, there may be areas to optimize costs.
S3 is one of Amazon’s oldest and most widely used services, with broad applications. That broad applicability can lead to costly S3 buckets. There are also a number of storage classes that not everyone understands, and S3 Intelligent-Tiering is fairly new.
If storage drives S3 costs, turn on S3 Storage Class Analysis for the bucket. This feature comes at a minimal cost, and after a few days AWS will provide recommendations on how to optimize storage tiers.
If API activity or data transfer drives S3 costs, it’s a bit harder to diagnose. Analyze the activity hitting the bucket, and work with engineering to explore optimizations.
If data transfer drives the cost, review the Cache-Control headers on the objects in the bucket. Oftentimes, engineering must investigate what’s causing all the data transfer.
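As an illustration of reviewing Cache-Control headers, the sketch below assigns a header per object key at upload time. The policy itself is hypothetical: it assumes static assets have content-fingerprinted file names (so they can be cached for a year) and that HTML must be revalidated so new deployments appear. Headers only cut S3 data transfer if browsers or a CDN in front of the bucket honor them.

```python
# Hypothetical policy; "immutable" is only safe when file names change with content.
LONG_LIVED_SUFFIXES = (".js", ".css", ".png", ".jpg", ".woff2")


def cache_control_for(key):
    """Choose a Cache-Control header value for an S3 object key."""
    if key.endswith(".html"):
        return "no-cache"  # always revalidate, so new deployments are picked up
    if key.endswith(LONG_LIVED_SUFFIXES):
        return "public, max-age=31536000, immutable"  # one year
    return "public, max-age=300"  # short default for everything else
```

With boto3, the value can be passed as `CacheControl=cache_control_for(key)` to `put_object`, or in `ExtraArgs={"CacheControl": ...}` for `upload_file`.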
Beware of lifecycle policies that move rapidly changing transactional data to the S3 Glacier storage classes. The transition costs alone can dwarf the savings from Glacier’s less expensive storage tier.
Consider the use cases of your system closely when working with S3. S3 is typically among the lowest-cost options for storing data, but if your data is highly transactional and subject to high-volume access, make sure caching is implemented correctly on your frontend, or consider serving the data from a managed database such as Amazon DocumentDB or DynamoDB instead.
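The arithmetic behind this warning is short. The sketch below compares keeping data in S3 Standard with transitioning it to Glacier immediately; all prices are caller-supplied, and the values in the usage note are illustrative rather than current list prices. It accounts for per-request transition charges and for the minimum storage duration (90 days for S3 Glacier Flexible Retrieval), but not for per-object metadata overhead.

```python
def glacier_transition_net(objects, gb, months_kept, transition_per_1000,
                           standard_gb_month, glacier_gb_month, min_months=3):
    """Dollars saved (positive) or lost (negative) by transitioning data to Glacier.

    Assumes the data is transitioned immediately and kept for months_kept.
    Transitions are billed per 1,000 objects, and data removed before the
    minimum storage duration is still billed for min_months.
    """
    transition_cost = objects / 1000 * transition_per_1000
    glacier_storage = gb * glacier_gb_month * max(months_kept, min_months)
    standard_storage = gb * standard_gb_month * months_kept
    return standard_storage - glacier_storage - transition_cost
```

With illustrative prices of $0.05 per 1,000 transitions, $0.023/GB-month for Standard, and $0.0036/GB-month for Glacier, 10 million small objects (100 GB) kept for one month lose roughly $500, while 1,000 large objects (1,000 GB) kept for a year save over $200.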
AWS Management Services help teams understand and govern other services and resources. These typically include AWS CloudTrail, Amazon CloudWatch, Amazon Macie, and AWS Config.
If your accounts spend more than 10–20% of their budget on AWS Management Services, there may be optimization opportunities.
The following are three causes of overspend on AWS Management Services, paired with the steps to address them:
Amazon’s Route 53 service, among other things, connects user requests to infrastructure running on AWS. Engineering teams don’t always realize they’re paying for DNS queries.
There’s no hard-and-fast symptom here, as costs vary widely. As a general rule, companies have room to optimize if they’re spending more than $250 per month on DNS queries.
DNS costs are usually driven by having lots of publicly available infrastructure. Occasionally, companies spend hundreds or thousands of dollars per month on DNS lookups, which equates to billions of requests.
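The link between dollars and request volume is easy to check. This sketch uses tiered per-million rates for standard Route 53 queries; the defaults reflect published standard-query pricing as we understand it and should be verified against the current pricing page, and other query types (such as latency-based routing) are billed differently.

```python
def route53_query_cost(queries, first_tier_rate=0.40, over_tier_rate=0.20,
                       tier_limit=1_000_000_000):
    """Monthly cost in dollars for standard Route 53 queries.

    Rates are per million queries: first_tier_rate applies up to tier_limit
    queries per month and over_tier_rate to everything beyond it.
    """
    first = min(queries, tier_limit)
    rest = max(queries - tier_limit, 0)
    return first / 1e6 * first_tier_rate + rest / 1e6 * over_tier_rate
```

At these rates, the $250 threshold corresponds to about 625 million queries per month, and 3 billion queries cost about $800.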
Work with engineering to understand why so many requests are being sent. DNS queries can usually be optimized (not removed), once identified.
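The symptom thresholds quoted in this section can be collected into a single first-pass check. This is a simplification under stated assumptions: it uses the lower bound of each quoted range and measures every share against one total, whereas the guidance above sometimes refers to a feature's, account's, or application's spend. The area names are hypothetical keys for your own spend breakdown.

```python
# Thresholds quoted in this section, as a fraction of total spend (lower bounds).
SHARE_THRESHOLDS = {
    "snapshots": 0.05,             # quoted as 5-10% of the monthly bill
    "previous_gen_compute": 0.10,
    "nat_gateway": 0.05,
    "s3": 0.10,
    "management_services": 0.10,   # quoted as 10-20%
}
DNS_MONTHLY_THRESHOLD = 250.0      # dollars per month, an absolute threshold


def flag_areas(spend_by_area, total_spend):
    """Return the areas whose spend exceeds the thresholds above.

    spend_by_area maps area names (the keys above plus "dns_queries") to
    dollars for the month; total_spend is the month's total and must be > 0.
    """
    flagged = [area for area, limit in SHARE_THRESHOLDS.items()
               if spend_by_area.get(area, 0.0) / total_spend > limit]
    if spend_by_area.get("dns_queries", 0.0) > DNS_MONTHLY_THRESHOLD:
        flagged.append("dns_queries")
    return flagged
```

A flagged area is a prompt to investigate with engineering, not proof of waste.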
Ideally, you should control AWS costs proactively rather than address tech debt costs later; proactive control is the better long-term solution. The following strategies can help you develop a cost-conscious engineering team that has the right tools and a thorough understanding of how to use AWS optimally.
Engineers make buying decisions daily, through the architectures they design and the instances they start. The only way to control cost strategically is to empower engineering teams and hold them accountable for cost, just as they’re accountable for performance and security. The AWS Well-Architected Framework has a cost optimization pillar you can reference.
AWS cost optimization is a strategy that relies on people, processes, and technology working together to manage costs. The people must be aware of the costs associated with their activities and the solutions they devise, and the process must make recognizing and controlling costs a key objective. Technology must support both the people and process by providing the information needed to make timely decisions based on an understanding of the associated costs.
Companies should be proactive about managing their cloud investment, not just their cloud costs. Change is a constant with AWS, and organizations struggle to understand which cost changes are meaningful to focus on. Increasing costs aren’t necessarily a problem if the revenue for a product or feature is increasing too. It all starts with understanding how much your features and products cost.
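One way to tell meaningful cost changes from healthy growth is to track cost as a fraction of revenue for each feature or product. The sketch assumes you already have monthly cost and revenue figures per feature; the function name and data shape are hypothetical.

```python
def cost_to_revenue(monthly):
    """Cost as a fraction of revenue for each month of one feature or product.

    monthly: list of (month, cost, revenue) tuples. Months with zero revenue
    are skipped because the ratio is undefined.
    """
    return {month: cost / revenue for month, cost, revenue in monthly if revenue}
```

For example, a feature whose cost rises from $1,000 to $1,500 while its revenue doubles from $10,000 to $20,000 goes from 10% to 7.5% of revenue: its costs increased, but its unit economics improved.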
Aligning costs to features and products has traditionally been very difficult for a number of reasons, including inadequate tagging and the presence of numerous accounts. There’s also the reality of shared infrastructure and containerization, which make apportioning cost to specific features or products non-trivial. These challenges need to be addressed before the company can understand the costs of features or products.
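For shared infrastructure, a common starting point is proportional allocation: split the shared bill by each feature's share of some measured usage. The sketch below assumes such a usage measure exists (for example, CPU-hours or requests on a shared cluster); choosing the measure is the hard, organization-specific part, and the function name is hypothetical.

```python
def apportion_shared_cost(shared_cost, usage_by_feature):
    """Split a shared cost across features in proportion to measured usage.

    usage_by_feature maps a feature name to a non-negative usage measure.
    The allocations always sum to shared_cost (up to floating-point error).
    """
    total = sum(usage_by_feature.values())
    if total == 0:
        raise ValueError("no usage recorded; cannot apportion shared cost")
    return {feature: shared_cost * usage / total
            for feature, usage in usage_by_feature.items()}
```

Splitting a $1,200 shared cluster between a search feature using 30 CPU-hours and a checkout feature using 10 assigns $900 and $300 respectively.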
The top three causes driving your AWS costs can be addressed with the following steps:
At a tactical level, it really comes down to having the necessary process and tooling in place to be able to hold teams accountable for the cost of their features and products. Here are some examples of questions to ask various stakeholders:
Creating a cost-conscious engineering culture and understanding the costs of your features and products is the foundation for actively managing your cloud investment. This starts with providing the right context to the right teams. For product and engineering teams, this means showing them the costs of the features they are responsible for, so they can take ownership.
It also means they need the right context to understand the underlying sources of each cost. This can be accomplished in a number of ways, but solutions like CloudZero are purpose-built to help organizations adopt a proactive approach to AWS cost optimization by giving teams data on the cost of their features and products.
In this guide, we’ve examined AWS cost optimization from setup through proactive cost control. Following these guidelines will help you get a handle on what is driving tech debt and how you can avoid it. But these steps can only take you so far.
CloudZero can help you zero in on the specific issues driving your AWS costs. It provides real-time, in-depth analysis of your cost data sorted by function, user, time, and other metrics — making areas of waste easy to pinpoint.
CloudZero provides the context you need for making decisions about how to reduce your AWS bill. By organizing your costs and correlating them to the underlying engineering activity, you can quickly detect anomalies before costs accumulate.
Even better, CloudZero makes costs transparent during the design phase, so your engineers can make informed decisions and build cost-effective computing into your application — eliminating the need to address tech debt further down the road.
CloudZero includes these unique features to give you maximum control over your AWS costs:
CloudZero is the only cloud cost intelligence solution that automatically correlates both billing and resource data from across your AWS account to group costs and surface insights. Instead of spending time determining what is driving cost anomalies, you can get to work on addressing them immediately. Sign up for a free demo to see how CloudZero can help your company optimize your AWS costs.