Site reliability engineers (SREs) are the glue between “Dev” and “Ops,” ensuring that software engineering expertise is applied to operations challenges. SREs naturally focus on making systems more reliable, efficient, and scalable. If you’re an SRE yourself, you’re already deeply familiar with these ideas.
However, there’s an area closely connected to these concerns that is rarely top of mind for SREs (and other DevOps professionals): cloud cost. Historically, SREs have not had direct visibility into cloud cost data, let alone been able to view that data in a relevant and timely fashion. You may hear from finance after a particularly high cloud bill arrives, but by then the spending is often a month or more in the past, and the window to stop the bleeding has closed.
Today, we want to talk about how SREs can make cloud cost part of their day-to-day work and realize many benefits (beyond just controlling cost) as a result.
How Cost Supports the Goals of Site Reliability Engineers
As you know, the three main responsibilities of a site reliability engineer are to ensure that websites and applications (and therefore their underlying infrastructure) are reliable, efficient, and scalable. What many don’t realize is that monitoring cost can be a big help in achieving these goals.
With the right data at their fingertips, SREs can begin to build cost-consciously. This is a natural shift, because a well-architected system is often a cost-efficient system. Also, let’s face it: no website or application will be sustainable or efficient if it continually incurs runaway costs.
You often collaborate closely with product developers to design solutions with high availability, performance, security, and maintainability, and with release engineers to keep the software delivery pipeline as efficient as possible. Working with the rest of the DevOps organization and the finance team to monitor and plan around cloud costs is a natural next step.
Expanding Monitoring Programs to Include Cloud Cost
As an SRE, you are likely no stranger to monitoring tools. You probably spend a significant amount of time monitoring metrics related to performance, availability, and efficiency. So, given the right tools, monitoring cloud costs in a real-time and proactive way is a natural next step.
The main activities SREs should aim to fold into their processes to become more cost-conscious include:
Monitoring cost across cloud accounts in real-time
Segmenting cost by team, product, AWS service, or another dimension
Identifying architectural dependencies
Optimizing infrastructure for cost
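To make the segmentation step concrete, here is a minimal Python sketch that totals spend by any dimension, whether a built-in field like account or service or a tag such as team or product. The `CostLineItem` record, its fields, and the sample values are hypothetical and do not mirror the real AWS Cost and Usage Report schema or any CloudZero API; in practice the rows would come from your billing export.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Fields stored directly on a line item; any other dimension is read from tags.
ATTRIBUTE_DIMENSIONS = {"account", "service"}


@dataclass
class CostLineItem:
    """One row of billing data (a hypothetical shape, not the AWS bill schema)."""
    account: str
    service: str  # e.g. "AmazonEC2"
    amount_usd: float
    tags: dict[str, str] = field(default_factory=dict)


def segment_cost(items, dimension, untagged_label="(untagged)"):
    """Sum cost per value of `dimension`.

    `dimension` is "account", "service", or a tag key such as "team" or
    "product". Items missing the tag are grouped under `untagged_label`,
    so unallocated spend stays visible instead of silently disappearing.
    """
    totals = defaultdict(float)
    for item in items:
        if dimension in ATTRIBUTE_DIMENSIONS:
            key = getattr(item, dimension)
        else:
            key = item.tags.get(dimension, untagged_label)
        totals[key] += item.amount_usd
    return dict(totals)


sample = [
    CostLineItem("prod", "AmazonEC2", 120.0, {"team": "checkout", "product": "store"}),
    CostLineItem("prod", "AmazonRDS", 80.0, {"team": "checkout", "product": "store"}),
    CostLineItem("prod", "AmazonEC2", 45.5, {"team": "search"}),
    CostLineItem("dev", "AWSLambda", 4.5),  # no tags at all
]

# {'checkout': 200.0, 'search': 45.5, '(untagged)': 4.5}
print(segment_cost(sample, "team"))
# {'AmazonEC2': 165.5, 'AmazonRDS': 80.0, 'AWSLambda': 4.5}
print(segment_cost(sample, "service"))
```

Keeping an explicit "(untagged)" bucket is a deliberate choice: spend that cannot be attributed to an owner is often the first thing worth chasing down.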
To do this, you need to expand the monitoring programs you already have in place to include cloud cost.
You will need two forms of instrumentation to achieve these goals:
Application & Infrastructure: Real-time visibility into applications and infrastructure, including complete transactions.
Cost: Visibility into how much various aspects of applications and infrastructure actually cost the business.
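One simple way to combine the two forms of instrumentation is to divide cost by a usage metric to get a unit cost, such as dollars per 1,000 requests. This minimal sketch assumes you have already exported per-service cost and per-service request counts for the same time window; the function name, service names, and numbers are hypothetical.

```python
def cost_per_thousand_requests(cost_by_service, requests_by_service):
    """Correlate cost with traffic for the same time window.

    `cost_by_service` would come from billing data and `requests_by_service`
    from an APM or metrics tool; both are hypothetical dicts keyed by
    service name. Returns (unit_costs, no_traffic): unit cost in USD per
    1,000 requests, plus services that cost money but served no requests,
    which are often idle or orphaned resources worth investigating.
    """
    unit_costs = {}
    no_traffic = []
    for service, cost in cost_by_service.items():
        request_count = requests_by_service.get(service, 0)
        if request_count > 0:
            unit_costs[service] = cost * 1000 / request_count
        else:
            no_traffic.append(service)
    return unit_costs, no_traffic


costs = {"checkout-api": 300.0, "search-api": 50.0, "legacy-worker": 75.0}
request_counts = {"checkout-api": 1_500_000, "search-api": 2_000_000}

unit_costs, no_traffic = cost_per_thousand_requests(costs, request_counts)
print(unit_costs)  # USD per 1,000 requests, per service
print(no_traffic)  # services with spend but no recorded traffic
```

A rising unit cost flags a service that is getting more expensive to run than its traffic explains, which is often more actionable than a rising total.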
Naturally, this requires correlating information across multiple sources, from the AWS bill to performance monitoring tools. Once this instrumentation is in place (using a solution like CloudZero), you can quickly detect and investigate unintended costs and trends. You can also track the real costs of resources and reserved capacity across accounts, so that costs can be distributed to the proper cost centers.
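Detecting unintended cost increases does not have to start with anything sophisticated. As an illustration, this minimal sketch compares each day's spend with a trailing seven-day baseline and flags days that are both statistically unusual (a simple z-score) and large in absolute dollars. The function name, thresholds, and sample data are hypothetical defaults chosen for the example, not CloudZero's detection method.

```python
from statistics import mean, pstdev


def flag_cost_anomalies(daily_costs, window=7, z_threshold=3.0, min_increase_usd=50.0):
    """Flag days whose spend jumps well above the trailing baseline.

    Each day is compared with the mean and population standard deviation of
    the previous `window` days. Returns (day_index, cost, baseline_mean)
    tuples. All thresholds are illustrative, not recommendations.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        baseline = daily_costs[i - window:i]
        mu = float(mean(baseline))
        sigma = pstdev(baseline)
        today = daily_costs[i]
        increase = today - mu
        # Ignore small absolute changes, even if they are statistically unusual.
        if increase < min_increase_usd:
            continue
        # A perfectly flat baseline has sigma == 0; any large jump counts.
        if sigma == 0 or increase / sigma >= z_threshold:
            anomalies.append((i, today, round(mu, 2)))
    return anomalies


daily_spend = [100, 102, 98, 101, 99, 100, 100, 100, 250, 105]
print(flag_cost_anomalies(daily_spend))  # [(8, 250, 100.0)]
```

Requiring both a high z-score and a minimum dollar increase keeps alerts from firing on noise in very small or very stable line items.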
With this information at your fingertips, you can work with the rest of the DevOps organization to optimize cloud costs. This means seeing, in real time, how specific decisions affect the bottom line, so you can weigh the benefits and drawbacks of a given decision ahead of time and make the best cost-tuning choices for the business. For example, SREs with real-time access to cloud cost data can determine much more precisely how much capacity a given project needs and avoid waste.
As any SRE will recognize, the cost optimization process works best when decision-making is automated as much as possible, because cloud environments are often highly complex and interrelated. The fewer human interventions needed, the better.
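As a rough illustration of sizing capacity with cost in view, this sketch turns a measured peak load into an instance count and an approximate monthly bill, then compares two headroom choices. The function name, the per-instance throughput, the $0.10 hourly rate, and the 730-hour month are assumed placeholder values, not real AWS pricing.

```python
import math


def plan_capacity(peak_rps, rps_per_instance, headroom=0.2,
                  hourly_price_usd=0.10, hours_per_month=730):
    """Estimate instance count and monthly cost for a target peak load.

    `headroom` is spare capacity above the measured peak (0.2 means 20%).
    `hourly_price_usd` is a placeholder, not a real AWS price: substitute the
    rate for your instance type, region, and purchase option.
    """
    required_rps = peak_rps * (1 + headroom)
    # Round up: a fractional instance still has to be provisioned whole.
    instances = math.ceil(required_rps / rps_per_instance)
    monthly_cost_usd = instances * hourly_price_usd * hours_per_month
    return instances, monthly_cost_usd


# Compare a 20% buffer with a more conservative 50% buffer for the same load.
for headroom in (0.2, 0.5):
    instances, monthly = plan_capacity(peak_rps=4_000, rps_per_instance=500,
                                       headroom=headroom)
    print(f"headroom={headroom:.0%}: {instances} instances, ~${monthly:,.2f}/month")
```

Under these assumptions, the extra 30% of buffer costs two more instances and roughly $146 a month, which turns an abstract "safety margin" into a concrete trade-off you can discuss with the team.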
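One low-risk place to start automating is flagging resources that have been idle for a sustained period. This minimal sketch assumes you already collect daily average CPU utilization and an estimated monthly cost per resource; the `ResourceUsage` record, the 5% threshold, and the 14-day window are hypothetical choices, and CPU alone is a crude idleness signal that you would likely combine with network or request metrics.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Hypothetical per-resource summary assembled from monitoring data."""
    resource_id: str
    daily_cpu_avg_pct: list[float]  # one value per day, most recent last
    monthly_cost_usd: float


def find_idle_resources(resources, cpu_threshold_pct=5.0, min_days=14):
    """Return (resource_id, monthly_cost_usd) pairs for consistently idle resources.

    A resource qualifies only if each of its last `min_days` daily CPU
    averages is below the threshold, so something that is merely quiet over
    a weekend is not flagged. Results are sorted by cost, most expensive
    first, so the biggest potential savings surface at the top.
    """
    flagged = []
    for r in resources:
        recent = r.daily_cpu_avg_pct[-min_days:]
        if len(recent) == min_days and all(v < cpu_threshold_pct for v in recent):
            flagged.append((r.resource_id, r.monthly_cost_usd))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


fleet = [
    ResourceUsage("batch-worker-01", [1.0] * 14, 220.0),
    ResourceUsage("web-01", [40.0] * 14, 300.0),
    ResourceUsage("staging-02", [2.0] * 20, 90.0),
    ResourceUsage("new-service-03", [1.0] * 3, 50.0),  # too little history to judge
]

for resource_id, cost in find_idle_resources(fleet):
    # In an automated pipeline, this is where you might open a ticket,
    # notify the owning team, or schedule the resource to be stopped.
    print(f"{resource_id}: idle, ~${cost:.2f}/month")
```

The function only identifies candidates; whether the follow-up action is fully automatic or routed to a human for approval is a policy decision for your team.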
Building Cost Consciousness into Site Reliability Engineers’ Workflows
SREs should be empowered with the necessary data to make better cloud cost decisions. You of course want to help the business succeed by building efficient and effective systems. Adding cost into your workflows and decision-making processes is not only a natural next step, but also another opportunity to add value to the business in plain dollars and cents.
With cloud cost data at your fingertips, you will build ever more efficient systems over time, which is part of the goal of the discipline in the first place. Cloud cost optimization isn’t just good for the bottom line: it helps SREs like you achieve your goal of building applications and websites that are inherently reliable, efficient, and scalable.
CloudZero is the first platform to put cloud cost data in the hands of SREs and DevOps pros in a format that is useful and timely for their efforts. To learn more about CloudZero’s cloud cost optimization capabilities, visit the CloudZero website.