Like any tech debt, some choices may have made sense at the time or been a quick fix to an urgent problem, but aren’t scaling with the business. These things aren’t necessarily quick to fix and have to be planned out, just like any other tech debt reduction activity.
So why does cost tech debt build up?
Imagine if an engineering team knew their service couldn’t cost more than $1,000 per month based on a certain production load. It would certainly impact the way they designed their system. Engineers thrive on data, but they often don’t have access to cost requirements or cost feedback – which has to change.
Here are six common areas where tech debt accumulates in the average company:
SYMPTOMS: If the team is spending 5-10 percent of the monthly bill on snapshots or has snapshots older than 90 days, this is an area of concern.
TYPICAL CAUSES: In many cases, snapshot costs build up because there aren’t non-functional requirements around backup and recovery. If conservative defaults are left in place or no explicit lifecycle policies are created, costs can easily accumulate. For applications built before 2018, snapshots had to be managed manually, which also leads to problems in some organizations.
WHAT CAN BE DONE: If possible, use Amazon Data Lifecycle Manager to automate snapshots. Then, change the retention period on existing snapshots, depending on the business's requirements for the application or service. Typical retention periods may range from 7 to 14 days. If the team has snapshots older than 90 days that must be kept, consider moving them to a lower-cost storage tier such as Amazon EBS Snapshots Archive.
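The two snapshot symptoms above can be checked with a few lines of code. The following is a minimal sketch, assuming the snapshot inventory has already been exported into plain dictionaries; the field names `id` and `start_time`, the sample data, and the helper names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the symptoms above; tune them to your own requirements.
MAX_SNAPSHOT_AGE_DAYS = 90
MAX_SNAPSHOT_SHARE = 0.05  # lower bound of the 5-10 percent range


def old_snapshots(snapshots, now, max_age_days=MAX_SNAPSHOT_AGE_DAYS):
    """Return the IDs of snapshots started more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["id"] for s in snapshots if s["start_time"] < cutoff]


def snapshot_share(snapshot_cost, total_bill):
    """Fraction of the monthly bill spent on snapshots (0.0 for an empty bill)."""
    return snapshot_cost / total_bill if total_bill else 0.0


# Hypothetical inventory and bill, for illustration only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    {"id": "snap-a", "start_time": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"id": "snap-b", "start_time": datetime(2024, 1, 10, tzinfo=timezone.utc)},
]

print("Older than 90 days:", old_snapshots(inventory, now))
print("Over threshold:", snapshot_share(600.0, 10_000.0) >= MAX_SNAPSHOT_SHARE)
```

Keeping the check separate from however the inventory is fetched makes it easy to run against an export from any account.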
DESCRIPTION: Previous generation compute is compute running on legacy AWS infrastructure. It’s not always an easy area of cost tech debt to remedy.
SYMPTOMS: If an application or an account is spending more than 10 percent of compute on previous generation hardware, this may be an area to remediate.
TYPICAL CAUSES: There are two main causes of previous generation compute, which speak to why it’s so complicated to fix:
WHAT CAN BE DONE: If the business can migrate to current generation hardware, it can usually see cost reductions of 5-20 percent, as well as performance improvements. However, proceed with caution. Moving to a newer instance type can require more effort than anticipated and can have unintended consequences for the software running on it.
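To see whether the 10 percent symptom applies, it helps to compute the share of compute spend that sits on older instance families. Here is a rough sketch, assuming compute spend is available per instance type. The set of families below is an illustrative subset chosen for this example, not an authoritative list, and the function names are hypothetical.

```python
# Illustrative subset of previous generation families. This set is an
# assumption for the example; check AWS's current list before relying on it.
PREVIOUS_GEN_FAMILIES = {"t1", "m1", "m2", "m3", "c1", "c3"}


def family(instance_type):
    """Extract the family prefix, e.g. 'm3.large' -> 'm3'."""
    return instance_type.split(".", 1)[0]


def previous_gen_spend(spend_by_type, families=PREVIOUS_GEN_FAMILIES):
    """Sum the spend on instance types whose family is in `families`."""
    return sum(cost for t, cost in spend_by_type.items() if family(t) in families)


def previous_gen_share(spend_by_type, families=PREVIOUS_GEN_FAMILIES):
    """Fraction of total compute spend on previous generation families."""
    total = sum(spend_by_type.values())
    return previous_gen_spend(spend_by_type, families) / total if total else 0.0


def savings_range(old_spend, low=0.05, high=0.20):
    """Apply the 5-20 percent reduction range quoted above to a spend figure."""
    return old_spend * low, old_spend * high


# Hypothetical monthly compute spend in dollars.
spend = {"m3.large": 300.0, "c3.xlarge": 200.0, "m5.large": 1500.0}

share = previous_gen_share(spend)
low, high = savings_range(previous_gen_spend(spend))
print(f"Previous generation share: {share:.0%}")
print(f"Possible monthly savings: ${low:.2f} to ${high:.2f}")
```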
DESCRIPTION: Amazon’s NAT Gateway makes it easy to securely connect to the internet from a private subnet in a VPC. Companies pay for usage hours and for gigabytes passing through the gateway. Many people use it because it’s a simple managed service.
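Because the bill has two components, a quick estimate shows which one dominates. The sketch below assumes a simple linear model with one hourly rate and one per-gigabyte rate. The default rates are placeholders for illustration and vary by region, so substitute the figures from the current pricing page.

```python
def nat_gateway_monthly_cost(hours, gb_processed,
                             hourly_rate=0.045, per_gb_rate=0.045):
    """Estimate NAT Gateway cost as usage hours plus gigabytes processed.

    The default rates are placeholder assumptions, not quoted prices.
    Returns (hourly_component, data_component, total).
    """
    hourly = hours * hourly_rate
    data = gb_processed * per_gb_rate
    return hourly, data, hourly + data


# Hypothetical month: one gateway running all month, 10 TB passing through.
hourly, data, total = nat_gateway_monthly_cost(hours=730, gb_processed=10_000)
print(f"hours: ${hourly:.2f}  data: ${data:.2f}  total: ${total:.2f}")
```

In this hypothetical month the data processing charge is more than ten times the hourly charge, which is why the volume of traffic through the gateway matters more than the number of gateways.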
SYMPTOMS: If NAT Gateway costs exceed 5 percent of a feature or an account’s spend, this is an area of savings opportunity.
TYPICAL CAUSES: Data transfer within AWS is really complicated, and it’s hard to understand how much data is going to traverse the gateway at scale. This can lead to architectures that have excessive NAT Gateway charges.
WHAT CAN BE DONE: There’s an easy way to reduce NAT Gateway costs in five steps, which can be viewed within this blog post. The process involves first using VPC Flow Logs to analyze data transfer through the gateway, and then using VPC endpoints wherever the traffic and architecture allow.
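The flow log analysis step can be approximated with a short script. This is a minimal sketch, assuming the logs use the default flow log record format (14 space-separated fields) and have already been filtered to the gateway's network interface. The sample records and addresses are made up for illustration.

```python
from collections import Counter

# Field order of the default flow log record format.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def bytes_by_destination(lines):
    """Total bytes per destination address across flow log records."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        record = dict(zip(FIELDS, parts))
        if not record["bytes"].isdigit():
            continue  # NODATA and SKIPDATA records carry "-" here
        totals[record["dstaddr"]] += int(record["bytes"])
    return totals


# Hypothetical records for illustration only.
sample = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 52.94.0.10 44321 443 6 20 4000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.6 52.94.0.10 44322 443 6 10 6000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 198.51.100.7 44323 443 6 5 500 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d - - - - - - - 1700000000 1700000060 - NODATA",
]

for address, total in bytes_by_destination(sample).most_common(3):
    print(address, total)
```

Destinations that turn out to be AWS service addresses and account for most of the bytes are the natural candidates for a VPC endpoint.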
DESCRIPTION: When most people think about S3, they think primarily about storage. There are actually three different cost drivers for S3 buckets: storage, API activity, and data transfer.
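A simple breakdown makes the three drivers concrete. The sketch below assumes a flat rate per driver. The rates, class names, and usage figures are hypothetical placeholders, and real pricing varies by region, storage class, and request type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Rates:
    """Placeholder rates for illustration; these are assumptions, not quotes."""
    per_gb_month: float = 0.023
    per_1k_requests: float = 0.0004
    per_gb_out: float = 0.09


@dataclass(frozen=True)
class S3Usage:
    stored_gb: float
    requests: int
    transfer_out_gb: float


def s3_cost_breakdown(usage, rates=S3Rates()):
    """Split a bucket's monthly cost into storage, API, and transfer."""
    return {
        "storage": usage.stored_gb * rates.per_gb_month,
        "api": usage.requests / 1000 * rates.per_1k_requests,
        "transfer": usage.transfer_out_gb * rates.per_gb_out,
    }


# Hypothetical bucket: modest storage, very chatty access pattern.
breakdown = s3_cost_breakdown(
    S3Usage(stored_gb=1_000, requests=500_000_000, transfer_out_gb=2_000)
)
for driver, cost in breakdown.items():
    print(f"{driver}: ${cost:.2f}")
```

For this hypothetical bucket, storage is the smallest of the three line items, which is the situation the description warns about.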
SYMPTOMS: Typically, if the company is spending more than 10 percent of its AWS budget on S3, there may be areas to optimize costs.
TYPICAL CAUSES: S3 is one of Amazon’s oldest, most widely used services with broad applications. Because it gets used for so many different purposes, buckets can easily become costly. There are also a number of storage classes that not everyone understands, and S3 Intelligent-Tiering was only introduced in 2018.
WHAT CAN BE DONE: Consider the use cases of your system closely when you are working with S3. S3 is one of the lowest-cost options for storing data, but if your data is very transactional and subject to high-volume access, you should ensure caching is implemented correctly on your frontend, or consider using a managed database like DocumentDB or DynamoDB to serve the data instead.
DESCRIPTION: AWS Management Services are used to help understand other services or resources. These typically include CloudTrail, CloudWatch, Amazon Macie, and Config.
SYMPTOMS: If your accounts spend more than 10-20 percent of their budget on AWS Management Services, there may be optimization opportunities.
TYPICAL CAUSES AND WHAT CAN BE DONE: Here are three causes, each paired with a to-do to address it.
DESCRIPTION: Amazon’s Route 53 service, among other things, connects user requests to infrastructure running on AWS. Engineering teams don’t always realize they’re paying for DNS queries.
SYMPTOMS: There’s no hard-and-fast symptom here, as costs can range greatly. As a general rule, companies can optimize if they’re spending more than $250 per month on DNS queries.
TYPICAL CAUSES: The main driver of DNS costs is usually an organization with lots of publicly available infrastructure. Occasionally, companies spend hundreds or thousands of dollars per month performing DNS lookups (which equates to billions of requests).
WHAT CAN BE DONE: Work with engineering to understand why so many requests are being sent. This can usually be optimized (not removed), once identified.
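To put the dollar figures above in terms of request volume, a back-of-the-envelope conversion is enough. This sketch assumes a single flat rate of $0.40 per million queries. That rate is an assumption for illustration, since actual pricing is tiered and depends on the query type.

```python
def queries_for_spend(dollars, rate_per_million=0.40):
    """Approximate number of DNS queries that a monthly spend pays for.

    The flat per-million rate is a simplifying assumption, not a quoted price.
    """
    return dollars / rate_per_million * 1_000_000


for spend in (250, 1_000, 2_000):
    print(f"${spend:,} per month is roughly {queries_for_spend(spend):,.0f} queries")
```

Under this assumption, the $250 rule of thumb corresponds to several hundred million queries per month, and spend in the thousands of dollars corresponds to billions.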
CloudZero delivers relevant cloud cost data about products and features to the engineers responsible for building them. Using machine learning, CloudZero automates manual cost management work, detects cost anomalies, and boosts AWS tagging coverage. With CloudZero, innovative companies can proactively reduce cloud costs, control their margins, and eliminate billing surprises. Cloud spending starts with engineering. Controlling it starts with CloudZero.