
How to Avoid Common Architectural Waste

[Chapter 2] How to Reduce Your AWS Bill


As with any tech debt, some architectural choices made sense at the time, or were a quick fix to an urgent problem, but aren’t scaling with the business. They aren’t necessarily quick to fix and have to be planned out, just like any other tech debt reduction activity.

So why does cost tech debt build up?

  • There are always time constraints on engineering and usually competing priorities.
  • Engineers don’t have visibility into the cost of features they’re building, or the tools to give them relevant and timely cost feedback.
  • Last and most strategic, cost usually isn’t treated as a non-functional requirement during the design phase, the way performance or security is.

Imagine if an engineering team knew their service couldn’t cost more than $1,000 per month based on a certain production load. It would certainly impact the way they designed their system. Engineers thrive on data, but they often don’t have access to cost requirements or cost feedback – which has to change.

Here are six common areas where cost tech debt accumulates in the average company:

#1: Snapshots

DESCRIPTION: Snapshots are often used as a backup storage system in AWS. Nearly all companies spend money on snapshots, which isn’t a problem in and of itself.

 

SYMPTOMS: If the team is spending 5-10 percent of the monthly bill on snapshots or has snapshots older than 90 days, this is an area of concern.

TYPICAL CAUSES: In many cases, snapshot costs build up because there aren’t non-functional requirements around backup and recovery. If conservative defaults are left in place or no explicit lifecycle policies are created, costs can easily accumulate. For applications built before 2018, snapshots had to be managed manually, which still causes problems in some organizations.

WHAT CAN BE DONE: If possible, use Amazon Data Lifecycle Manager to automate snapshot creation and deletion. Then change the retention period on existing snapshots to match the business’ requirements for the application or service. Typical retention periods range from 7 to 14 days. If the team needs to keep snapshots older than 90 days, consider moving them to a lower-cost storage tier such as Amazon EBS Snapshots Archive.
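
As a rough sketch of the retention check described above, the function below flags snapshots older than a cutoff. The `Snapshot` record, its field names, and the sample IDs are hypothetical stand-ins for whatever your snapshot inventory export provides. The 90-day threshold comes from the symptom above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical inventory record; field names are illustrative."""
    snapshot_id: str
    start_time: datetime
    size_gib: int


def snapshots_older_than(snapshots, max_age_days, now):
    """Return snapshots whose age exceeds max_age_days, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    stale = [s for s in snapshots if s.start_time < cutoff]
    return sorted(stale, key=lambda s: s.start_time)


# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Made-up inventory for illustration only.
inventory = [
    Snapshot("snap-aaa", now - timedelta(days=3), 100),
    Snapshot("snap-bbb", now - timedelta(days=120), 500),
    Snapshot("snap-ccc", now - timedelta(days=400), 250),
]

stale = snapshots_older_than(inventory, max_age_days=90, now=now)
print([s.snapshot_id for s in stale])
print(sum(s.size_gib for s in stale), "GiB held in snapshots older than 90 days")
```

The same function can be rerun with `max_age_days=14` to see what a tighter retention period would flag before changing any policy.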

#2: Previous Generation Compute

DESCRIPTION: Previous generation compute is compute running on legacy AWS infrastructure. It’s not always an easy area of cost tech debt to remedy.

SYMPTOMS: If an application or an account is spending more than 10 percent of compute on previous generation hardware, this may be an area to remediate.

TYPICAL CAUSES: There are two main causes of previous generation compute, which speak to why it’s so complicated to fix:

  • Once an application or service is in maintenance mode, moving to new hardware requires a lot of testing.
  • Many times companies have previous-generation compute tied up in reserved instances, and they don’t want to lose the coverage.

WHAT CAN BE DONE: If the business can migrate to current generation hardware, it can usually see cost reductions of 5-20 percent, as well as performance improvements. However, proceed with caution. Moving to a newer instance type can take more effort than anticipated and can have unintended consequences, which is why it needs thorough testing.
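
To make the 10 percent symptom concrete, here is a minimal sketch that computes the share of compute spend on previous generation instance families. The family set below is an illustrative, incomplete assumption, and the spend figures are invented. Check AWS’s own previous generation list before relying on it.

```python
# Illustrative, incomplete set of previous-generation EC2 families (assumption).
PREVIOUS_GEN_FAMILIES = {"t1", "m1", "m2", "m3", "c1", "c3", "r3", "i2"}


def previous_gen_share(cost_by_instance_type):
    """Fraction of compute spend on previous-generation instance families."""
    total = sum(cost_by_instance_type.values())
    if total == 0:
        return 0.0
    old = sum(
        cost
        for instance_type, cost in cost_by_instance_type.items()
        # "m3.large" -> family "m3"
        if instance_type.split(".")[0] in PREVIOUS_GEN_FAMILIES
    )
    return old / total


# Hypothetical monthly compute spend in USD, keyed by instance type.
monthly_compute_cost = {
    "m3.large": 1200.0,
    "c3.xlarge": 800.0,
    "m6i.large": 6000.0,
    "c7g.xlarge": 2000.0,
}

share = previous_gen_share(monthly_compute_cost)
print(f"{share:.0%} of compute spend is on previous-generation hardware")
if share > 0.10:
    print("Above the 10 percent threshold; worth planning a migration")
```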

#3: NAT Gateway

DESCRIPTION: Amazon’s NAT Gateway makes it easy to securely connect to the internet from a private subnet in a VPC. Companies pay for usage hours and for gigabytes passing through the gateway. Many people use it because it’s a simple managed service.

SYMPTOMS: If NAT Gateway costs exceed 5 percent of a feature or an account’s spend, this is an area of savings opportunity.

TYPICAL CAUSES: Data transfer within AWS is really complicated, and it’s hard to understand how much data is going to traverse the gateway at scale. This can lead to architectures that have excessive NAT Gateway charges.

WHAT CAN BE DONE: There’s an easy way to reduce NAT Gateway costs in five steps, which can be viewed within this blog post. The process starts with using VPC Flow Logs to analyze the data passing through the gateway, and then moves traffic onto VPC endpoints wherever the traffic pattern and architecture allow.
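
The flow log analysis step might look something like the sketch below, which totals bytes per destination address for one network interface. It assumes records in the default VPC Flow Logs format (14 space-separated fields, with bytes in the tenth). The interface IDs, addresses, and byte counts are made up for illustration.

```python
from collections import Counter


def bytes_by_destination(flow_log_lines, interface_id=None):
    """Sum bytes per destination address from default-format flow log records."""
    totals = Counter()
    for line in flow_log_lines:
        fields = line.split()
        # Default format has 14 fields; NODATA/SKIPDATA records use "-" for bytes.
        if len(fields) != 14 or not fields[9].isdigit():
            continue
        # Optionally keep only records for one interface (e.g. the NAT gateway's).
        if interface_id is not None and fields[2] != interface_id:
            continue
        totals[fields[4]] += int(fields[9])
    return totals


# Invented sample records; real logs would be read from S3 or CloudWatch Logs.
sample_records = [
    "2 123456789012 eni-nat0001 10.0.1.15 192.0.2.10 49152 443 6 20 4000000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-nat0001 10.0.1.16 192.0.2.10 49153 443 6 10 1000000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-nat0001 10.0.1.15 198.51.100.7 49154 443 6 5 250000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-other01 10.0.2.20 203.0.113.9 49155 443 6 5 900000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-nat0001 - - - - - - - 1700000000 1700000060 - NODATA",
]

totals = bytes_by_destination(sample_records, interface_id="eni-nat0001")
for dstaddr, byte_count in totals.most_common(3):
    print(dstaddr, byte_count)
```

If the heaviest destinations turn out to be AWS services such as S3 or DynamoDB, that traffic is a natural candidate for a VPC endpoint.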

#4: Amazon Simple Storage Service (S3)

DESCRIPTION: When most people think about S3, they think primarily about storage. There are actually three different cost drivers for S3 buckets: storage, API activity, and data transfer.

SYMPTOMS: Typically, if the company is spending more than 10 percent of its AWS budget on S3, there may be areas to optimize costs.

TYPICAL CAUSES: S3 is one of Amazon’s oldest and most widely used services. That broad applicability means buckets get used in ways that turn out to be costly. There are also a number of storage classes that not everyone understands, and S3 Intelligent-Tiering only came out in 2018.

WHAT CAN BE DONE:

  • If storage drives S3 costs, turn on S3 Storage Class Analysis for the bucket. This feature comes at a minimal cost, and once it has observed enough access history, it shows which data is a candidate for a cheaper storage tier.
  • If API activity drives S3 costs, it’s a bit harder to diagnose. Analyze the activity hitting the bucket, and work with engineering to explore optimizations.
  • If data transfer drives the cost, explore the cache-control headers on the files within the buckets. Oftentimes engineering must explore what’s causing all the data transfer.
  • Beware lifecycle policies that migrate rapidly changing transactional data to Amazon S3 Glacier. The request charges for transitioning the data to Glacier can dwarf the savings you gain from Glacier’s less expensive storage tier.
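
The Glacier warning in the last bullet comes down to simple arithmetic: transitions are billed per request, while savings accrue per gigabyte. A minimal sketch follows. All prices are placeholder assumptions rather than current AWS rates, and the calculation ignores minimum storage durations, per-object overhead, and retrieval fees.

```python
def glacier_break_even_months(
    object_count,
    total_gb,
    transition_price_per_1000,
    standard_price_per_gb_month,
    glacier_price_per_gb_month,
):
    """Months of storage savings needed to repay the one-time transition cost."""
    transition_cost = object_count / 1000 * transition_price_per_1000
    monthly_savings = total_gb * (
        standard_price_per_gb_month - glacier_price_per_gb_month
    )
    if monthly_savings <= 0:
        return float("inf")  # the transition never pays for itself
    return transition_cost / monthly_savings


# Placeholder prices in USD (assumptions; look up the rates for your region).
PRICES = dict(
    transition_price_per_1000=0.03,
    standard_price_per_gb_month=0.023,
    glacier_price_per_gb_month=0.0036,
)

# Same 1,000 GB of data, stored as many tiny objects vs. a few large ones.
many_small = glacier_break_even_months(100_000_000, 1000, **PRICES)
few_large = glacier_break_even_months(1_000, 1000, **PRICES)

print(f"100M small objects: {many_small:.1f} months to break even")
print(f"1,000 large objects: {few_large:.4f} months to break even")
```

If objects are overwritten or deleted before the break-even point, which is likely for transactional data, the transition cost is never recovered.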

Consider the use cases of your system closely when you are working with S3. S3 is usually the lowest-cost option for storing data, but if your data is very transactional and subject to high-volume access, make sure caching is implemented correctly on your frontend, or consider serving the data from a managed database like DocumentDB or DynamoDB instead.

#5: AWS Management Services

DESCRIPTION: AWS Management Services are used to help understand other services or resources. These typically include CloudTrail, CloudWatch, Amazon Macie, and Config.

SYMPTOMS: If your accounts spend more than 10-20 percent on AWS Management Services, there may be optimization opportunities.

TYPICAL CAUSES AND WHAT CAN BE DONE: Here are three causes, each paired with a to-do to address it.

  • For CloudTrail specifically, spending more than a few hundred dollars per month can usually signal waste. Organizations get one free trail per account, and after that are charged per trail. In most cases you only need one organizational CloudTrail, so explore removing extraneous ones.
  • CloudWatch varies more per organization. Cloud-native applications will spend more on CloudWatch, which drives higher percentages. High CloudWatch charges are usually the result of a few expensive logs and/or something making excessive metric-retrieval API calls such as ‘GetMetricData’. Third-party services can drive these costs up. The solution involves looking at CloudTrail to identify the problematic actor, and then working with engineering to understand what is happening.
  • Config’s cost scales based on the number of resources in an account and rules configured. It can be higher than people anticipate, especially for more dynamic environments. Config is easy to analyze based on the amount the company is paying per month relative to the value Config provides.
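
For the CloudWatch case, identifying the problematic actor can be as simple as counting calls per identity. The sketch below assumes you have already exported CloudTrail records as dictionaries. `eventName` and `userIdentity.arn` are standard CloudTrail record fields, but the sample records and role names are invented, and whether a given API call appears in your trail depends on how the trail is configured.

```python
from collections import Counter


def top_callers(records, event_names, limit=3):
    """Count matching events per caller ARN, most frequent first."""
    counts = Counter(
        record.get("userIdentity", {}).get("arn", "unknown")
        for record in records
        if record.get("eventName") in event_names
    )
    return counts.most_common(limit)


# Invented records for illustration; real ones come from your trail's log files.
VENDOR = "arn:aws:iam::123456789012:role/monitoring-vendor"
DASHBOARD = "arn:aws:iam::123456789012:role/dashboard-service"
sample_records = [
    {"eventName": "GetMetricData", "userIdentity": {"arn": VENDOR}},
    {"eventName": "GetMetricData", "userIdentity": {"arn": VENDOR}},
    {"eventName": "GetMetricData", "userIdentity": {"arn": DASHBOARD}},
    {"eventName": "GetMetricData", "userIdentity": {"arn": VENDOR}},
    {"eventName": "PutMetricData", "userIdentity": {"arn": DASHBOARD}},
]

callers = top_callers(sample_records, event_names={"GetMetricData"})
for arn, count in callers:
    print(count, arn)
```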

#6: DNS Queries

DESCRIPTION: Amazon’s Route 53 service, among other things, connects user requests to infrastructure running on AWS. Engineering teams don’t always realize they’re paying for DNS queries.

SYMPTOMS: There’s no hard and fast symptom here, as costs can vary greatly. As a general rule, companies can optimize if they’re spending more than $250 per month on DNS queries.

TYPICAL CAUSES: The main driver of DNS costs is usually an organization with lots of publicly available infrastructure. Occasionally, companies spend hundreds or thousands of dollars per month performing DNS lookups (which equates to billions of requests).

WHAT CAN BE DONE: Work with engineering to understand why so many requests are being sent. This can usually be optimized (not removed), once identified.
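
To get a feel for the query volumes behind the spend figures above, here is a small sketch of tiered per-query pricing. The default prices and tier boundary are assumptions based on typical Route 53 standard-query rates. Check the current pricing page before drawing conclusions.

```python
def monthly_query_cost(
    queries,
    first_tier_price=0.40,   # USD per million queries (assumed)
    next_tier_price=0.20,    # USD per million queries beyond the tier limit (assumed)
    tier_limit=1_000_000_000,
):
    """Estimate monthly DNS query cost in USD under two-tier pricing."""
    first = min(queries, tier_limit)
    rest = max(queries - tier_limit, 0)
    return (first * first_tier_price + rest * next_tier_price) / 1_000_000


for queries in (625_000_000, 1_000_000_000, 4_000_000_000):
    print(f"{queries:>13,} queries/month ~ ${monthly_query_cost(queries):,.2f}")
```

Under these assumed rates, the $250 per month rule of thumb corresponds to several hundred million queries, and spend in the thousands implies billions.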

CloudZero Brings Cloud Cost Intelligence to Your Team

CloudZero delivers relevant cloud cost data about products and features to the engineers responsible for building them. Using machine learning, CloudZero automates manual cost management work, detects cost anomalies, and boosts AWS tagging coverage. With CloudZero, innovative companies can proactively reduce cloud costs, control their margins, and eliminate billing surprises. Cloud spending starts with engineering. Controlling it starts with CloudZero.
