
Byte Down Too: Build Cost Effective Infrastructure Like Netflix

July 14, 2020

Think of organizations with lots of data and it’s impossible not to think of Netflix. In a new Netflix Technology Blog post titled “Byte Down: Making Netflix’s Data Infrastructure Cost-Effective”, their Platform Data Science & Engineering team describes their data infrastructure, “which is composed of dozens of data platforms, hundreds of data producers and consumers, and petabytes of data.” At this scale, cost-effectiveness can be the difference between success and failure. So how do they balance the need to remain cost conscious with the organization’s famous emphasis on engineering “freedom and responsibility”? Culturally, “setting budgets and other heavy guardrails to limit spending” wasn’t a fit. Their solution is to “provide cost transparency and place the efficiency context as close to the decision-makers as possible,” and the blog post explains in great detail how it’s done.

So why doesn’t every company do this? Well, if you’re like most high-growth technology companies, you don’t have a two-pizza team dedicated to building and operating a cost-effectiveness platform for engineering like Netflix does. However, you do have CloudZero. CloudZero is a cost tool that is helping high-growth digital companies achieve the same results as Netflix, without the expertise and overhead, so we thought we’d walk through how.

For the team at Netflix, this work has enabled them to rely on cost-efficient architecture to build a strong business with healthy margins. They write that “consolidating usage and cost context to create feedback loops via dashboards provides great leverage in tackling efficiency,” and the payoff has been “over a 10% decrease in our data warehouse storage footprint.” When you operate at the scale of Netflix, that’s a pretty significant impact! But you don’t need a dedicated team building custom dashboards and analytics to see these kinds of results. Working with CloudZero, Drift saved over $1.8M in annual cloud cost. And cost reduction is only the initial benefit. CloudZero customers gain a strategic advantage by driving innovation within the context of cost, with a tool rather than a team.

The Netflix team described three key components of their cost visibility solution:

  • Put AWS billing data in meaningful context
  • Relay the cost context to teams via a Custom Dashboard
  • Increase data cost awareness by pushing relevant information where it’s needed

If you replace words like “billing” and “cost” with “systems” or “bugs”, these three bullets start to sound a lot like any monitoring or observability strategy. Like the team at Netflix, we at CloudZero believe that cost deserves to be a first-class metric, treated like any other non-functional engineering requirement or performance metric. We couldn’t agree more with these three points.

 

Billing Data in Business Context

Netflix’s cost information comes from the AWS Cost & Usage Report, just like for the rest of us. And whether you consume it via CSV or in an S3 bucket, it’s not fun. 

This data is organized by service (EC2, S3, etc.) - which makes sense for an AWS bill, but not much for your business, especially when it comes time to translate this information to your CFO. Tags can be used to indicate relationships among billing items, but the Netflix team found that "this granularity is not sufficient to provide visibility into infrastructure costs by data resource and/or team.” They developed their own approaches to allocating costs for both EC2-based platforms and S3-based platforms. In their case, specific types of bottleneck metrics were most important. 

With the companies we work with, the most important step is understanding how various collections of billable resources map to what matters to their business: products and features, or teams and departments, for example. This helps them shift their understanding of cost from “How much do we spend on EC2?” to “How much does it cost to build and run different aspects of my products?” CloudZero uses an ML-driven process to help build these context mappings even when you don’t have great tag coverage in your accounts, and it can even apply tagging metadata to untaggable resources. Here’s a real-life example: CloudZero works with FruitLab, a social media and streaming platform for gamers. Like Netflix, they are focused on driving down the cost of delivering videos to their users to maximize their revenue. Unlike Netflix, they don’t have a team of engineers they can pull off their roadmap. Nonetheless, CloudZero helped them reduce their cost of delivering each stream by 18%.
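To make the idea of mapping billable resources to business context concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical row format (`service`, `resource_id`, `tags`, `cost`) rather than the real Cost & Usage Report columns, and it uses hand-written prefix rules as a stand-in for untagged resources. It is not Netflix’s allocation method or CloudZero’s ML-driven process, just an illustration of the shift from cost by service to cost by team.

```python
from collections import defaultdict

# Hypothetical, simplified billing rows. A real AWS Cost & Usage Report
# has many more columns and different column names.
LINE_ITEMS = [
    {"service": "EC2", "resource_id": "i-001", "tags": {"team": "search"}, "cost": 120.0},
    {"service": "S3", "resource_id": "video-bucket", "tags": {}, "cost": 80.0},
    {"service": "EC2", "resource_id": "i-002", "tags": {"team": "streaming"}, "cost": 200.0},
    {"service": "Lambda", "resource_id": "thumbnailer", "tags": {}, "cost": 15.5},
]

# Hypothetical fallback rules for untagged resources:
# resource-id prefix -> business context.
FALLBACK_RULES = {"video-": "streaming"}


def business_context(item):
    """Return the owning team for a line item: tags first, then fallback rules."""
    team = item["tags"].get("team")
    if team:
        return team
    for prefix, mapped_team in FALLBACK_RULES.items():
        if item["resource_id"].startswith(prefix):
            return mapped_team
    return "unallocated"


def cost_by_context(items):
    """Sum cost per business context instead of per AWS service."""
    totals = defaultdict(float)
    for item in items:
        totals[business_context(item)] += item["cost"]
    return dict(totals)


for context, total in sorted(cost_by_context(LINE_ITEMS).items()):
    print(f"{context}: ${total:.2f}")
```

Keeping an explicit “unallocated” bucket is a deliberate choice in this sketch: it shows how much spend still lacks business context instead of silently dropping it.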

Once the Netflix team has their cost in the right context, they relay it to the right teams. They built a Druid-backed custom dashboard for this purpose. “The primary target audiences for our cost data are the engineering and data science teams as they have the best context to take action based on such information.” We find the same. DevOps in the cloud lets engineering teams move fast on their own. Every single day, they select and use a huge range of services and technologies, all on demand, and through decentralized processes that are central to their Agile way of working. It’s great for the pace of innovation, but it leads to a level of complexity and a rate of change that a centralized finance team can’t possibly communicate out to diverse groups of teams. The context needs to be delivered in near real time to the people making the technology decisions, not to a separate finance team or a designated cost person on the engineering team.

 

Relevant Views and Context for All Stakeholders

Netflix also understands that cost context differs depending on the team or role. They provide one cost context for teams and another, at a higher level, for engineering leaders. “Depending on the use case, the cost can be grouped based on the data resource hierarchy or org hierarchy.” They show an example of their leaders’ dashboard, which presents cost information alongside key business metrics (view hours change, avg product headcount change, gross revenue change). This is a fantastic example of how you really can’t separate cost from other KPIs in a SaaS business. SaaS engineering leaders own COGS and therefore have the most significant impact on gross margins in their organizations. A great SaaS engineering leader needs to see cost in the context of what matters to the business: cost per tenant, cost per team, cost per unit, and so on.

(Example of Leader’s Dashboard - cost alongside business KPIs)
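Unit metrics like cost per tenant come down to simple arithmetic once cost and a business metric sit side by side. The sketch below uses made-up monthly figures and a hypothetical `unit_cost` helper purely for illustration; none of the numbers come from Netflix or from CloudZero customers.

```python
def unit_cost(total_cost, units):
    """Return cost per unit, or None when there are no units to divide by."""
    if units <= 0:
        return None
    return total_cost / units


# Hypothetical monthly figures for illustration only.
monthly = {"cost": 42000.0, "tenants": 350, "streams": 1_200_000}

cost_per_tenant = unit_cost(monthly["cost"], monthly["tenants"])
cost_per_stream = unit_cost(monthly["cost"], monthly["streams"])

print(f"cost per tenant: ${cost_per_tenant:.2f}")
print(f"cost per stream: ${cost_per_stream:.4f}")
```

Tracking the ratio rather than the raw bill is what lets a leader tell healthy growth (cost up, cost per tenant flat or down) from an efficiency problem.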

Finally, the Netflix team acknowledged that “checking data costs should not be part of any engineering team’s daily job.” We know our customers’ top priority is always going to be getting new features out the door fast. Cost management shouldn’t come at the expense of innovation and velocity, so it’s important to get just the right information to engineers at just the right time to increase cost awareness. The Netflix team built email push notifications “to increase data cost awareness among teams with significant data usage.” These days, most engineers live in Slack, so CloudZero sends automated notifications to separate, configurable channels for any product or combination of features. We can also send emails, if that’s what your engineers use.
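As a rough illustration of pushing cost context only to teams with significant usage, the following Python sketch builds short digest messages. The `cost_digest` function, the `min_cost` cutoff, and the message wording are all assumptions made for this example. Actual delivery (email, Slack, or anything else) is deliberately left out, and this is not how Netflix or CloudZero implement notifications.

```python
def cost_digest(team_costs, min_cost=100.0):
    """Build one message per team whose current spend is at least min_cost.

    team_costs maps a team name to (previous_period_cost, current_period_cost).
    """
    messages = []
    for team, (previous, current) in sorted(team_costs.items()):
        if current < min_cost:
            continue  # skip teams without significant spend
        if previous > 0:
            change_pct = (current - previous) / previous * 100
            trend = f"{change_pct:+.1f}% vs. last period"
        else:
            trend = "no prior period to compare"
        messages.append(f"{team}: ${current:.2f} this period ({trend})")
    return messages


# Hypothetical per-team costs: (previous period, current period).
example = {"search": (100.0, 125.0), "ml": (50.0, 40.0), "new": (0.0, 300.0)}
for line in cost_digest(example):
    print(line)
```

Filtering before sending is the point of the sketch: teams below the cutoff hear nothing, which keeps cost awareness from turning into noise.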

One piece that the Netflix team didn’t directly address in their discussion of communication and alerting was handling unexpected costs. CloudZero builds time-series models at various granularities to make predictions for the future, which are compared with existing predictions to identify anomalies. Teams can be alerted to these right away, rather than waiting for someone to discover them on the AWS bill at the end of the month. This may well be an element of the Netflix solution, but CloudZero customers are getting immediate benefit from it today.
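To give a feel for what flagging an unexpected cost can look like, here is a deliberately simple Python sketch. It swaps in a rolling mean and standard deviation check (a z-score test) in place of real time-series models, so it should be read as an illustration of the idea, not as CloudZero’s approach. The `window` and `threshold` values and the sample data are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost deviates sharply from the preceding window.

    Returns a list of (day_index, actual_cost, expected_cost) tuples.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        expected = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: treat any change as unexpected.
            is_anomaly = daily_costs[i] != expected
        else:
            is_anomaly = abs(daily_costs[i] - expected) / spread > threshold
        if is_anomaly:
            anomalies.append((i, daily_costs[i], expected))
    return anomalies


# Hypothetical daily costs in dollars, with one obvious spike.
costs = [100, 102, 98, 101, 99, 103, 100, 101, 180, 102]
for day, actual, expected in find_anomalies(costs):
    print(f"day {day}: ${actual:.2f} (expected about ${expected:.2f})")
```

A check like this runs daily, which is what makes it possible to alert a team within a day of a spike instead of at the end of the billing month.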

 

The Bottom Line: Let Developers Develop (without Ignoring Cost)

As anyone who uses or sells developer tools will tell you, if it gets in the way of a developer writing software, it won’t get used. The brilliant thing that Netflix has achieved is a careful balance between cost consciousness and engineering velocity. They make relevant, timely data available to engineers and engineering leaders where they already work, with the context they need.

The Netflix blog is a great read - but you don’t need to have a large, dedicated team to experience the benefits they describe. With the right tool, you can put context about your cloud cost in the hands of the decision makers who are driving your innovation. Just give us a shout. 

