
Today we are announcing that CloudZero has raised a $5 million Series A round, led by Matrix Partners and Underscore VC. This is a very exciting time for us and a testament to the hard work put in over the past year by our CEO & Founder Erik Peterson and Co-Founder Matt Manger. Growing from a team of one, to two, and now to nine people, CloudZero has followed a disciplined, entrepreneurial approach to identify and validate a large, growing market category experiencing significant, unaddressed pain. We call this category Serverless Reliability Management. Erik, Matt, and I worked with Tim Barrows of Matrix Partners, Michael Skok of Underscore VC, advisors from Underscore VC’s Core Community, dozens of customers, and an open source community to prove the value that CloudZero delivers in this category. All of this hard work culminated in securing the Series A, so we can now focus 100% of our energy on our SaaS platform and on supporting our open source community.

Serverless Reliability Management

Think of your application as a passenger airplane flying through the atmosphere. You and the passengers want the flight to be smooth, fast, low-cost, and uneventful: a good experience. In reality, that’s rarely the case. Undesirable behavior, both inside and outside the airplane, erodes reliability. Inside the airplane, mechanical issues, an unclean interior, poor service, and unruly passengers all contribute to poor reliability. Outside the airplane, in the atmosphere, security threats, congestion, turbulence, and violent weather contribute as well. These problems compound (mechanical + congestion + weather = poor reliability) and are extremely hard to manage. The result can be a nightmare for everyone involved: passengers, the airline, and its employees.

Now imagine a technology that does more than monitor a nightmare already in progress and notify you about it; it changes the game for everyone involved. At the first indication of a mechanical issue, it predicts the failure and provides insight for repair before anything breaks (resiliency). At the first indication of congestion, it predicts the delay and provides insight to reroute people and airplanes to their gate or destination (availability). At the first indication of bad weather, it predicts the risk and provides insight to transport the passengers safely and maintain the integrity of the airplane (security). If your airplane is resilient, available, and secure, then it is trusted.

Your application and cloud environment are very much like the airplane and the atmosphere. Airplanes started out simple and evolved over time into very complex systems, where reliability in flight became a matter of life and death for hundreds of passengers. Similarly, cloud systems were initially simple to manage and have since evolved into complex systems, perhaps rivaling an airplane; however, they have not achieved anywhere near the level of trusted reliability found in the airline industry, and that needs to change. As businesses evolve their infrastructure and deploy serverless technology along with hundreds (if not thousands) of microservices and APIs, the complexity of their systems compounds, and the need for trusted reliability approaches that of an airplane. Complexity breeds more complexity, and the cycle never ends. Serverless Reliability Management is the solution that predicts and proactively finds trouble amid the complexity before it affects your system. Unlike any other technology, the CloudZero platform predicts system fragility so that engineers can proactively optimize system reliability before trust is compromised.

Serverless Reliability Management is the continuous optimization of serverless cloud systems for high reliability. A reliable cloud system is trusted: continuously resilient, available, and secure. CloudZero’s Serverless Reliability Management platform is a powerful SaaS technology that enables engineers to find and visualize undesirable behavior in cloud environments and optimize the system for high reliability. Serverless Reliability Management is needed because achieving trusted reliability in the cloud at scale is hard, very hard; and when it goes wrong, it goes very wrong. Just look at the most serious outage of the year, Amazon’s S3 outage in February, which took down a big chunk of the internet. How much business was affected by that incident? About $150 million, as reported by NPR. What if that incident could have been avoided or reduced in scope?

Emergent behavior, the unpredictable interaction between many discrete parts of a distributed system, occurs frequently in serverless environments and is a major contributor to loss of trusted reliability. As a system grows, the interactions between its individual parts multiply, and the potential for chaos resulting from emergent behavior grows exponentially. Common sources of emergent behavior include the constant changes to the underlying serverless infrastructure (e.g., AWS; the cloud is never in a steady state), the addition or modification of individual parts of the system, and cloud weather. As is often learned after the fact, it’s very easy to introduce emergent behavior and very hard and expensive to remove it. The ability to discover and visualize emergent and undesirable cloud behavior is core to Serverless Reliability Management.

Serverless Reliability Management also augments the engineering skills required to manage a distributed system in the cloud and amplifies businesses’ ability to compete at scale and deliver trusted reliability. As the development, operations, and security disciplines evolve, integrate, and converge for continuous delivery of trusted information, businesses are challenged to organize their engineering teams around the right skill sets. Team composition falls along a spectrum determined by each organization. At one end, teams are made up of engineers from each discipline (dev, ops, security). At the other end, every engineer on the team is expected to be competent in all disciplines. In the middle, you will find teams in various security and DevOps configurations, as well as site reliability engineers. Regardless of how your team is composed, Serverless Reliability Management supports and amplifies its current abilities, enabling it to manage cloud systems at scale. Your team will be smarter, and your business will be better able to compete and win with trusted, reliable serverless systems at scale.


Open Source: CloudZero Reactor for AWS

As we build our SaaS platform, we decided to deliver value today by open sourcing the CloudZero Reactor for AWS. The Reactor (SaaS or on-prem) ingests and normalizes metadata from various services in your AWS accounts and provides a simple interface for querying that information. For businesses participating in our invite-only beta, the Reactor is also the pipeline that feeds data into our SaaS platform, which models your system and provides insights and context into its past, present, and future states of reliability. These insights enable you to correct emergent and undesirable behavior, leading to greater satisfaction and lower costs.

Open sourcing the Reactor benefits all of us. You get a glimpse into our code, the opportunity to contribute, and the ability to add new connections to services and data sources. Whether you contribute code, interact with our community, or speak to us directly, your voice will be heard. In return, we learn more about the problems that cost you satisfaction and money, and the more we learn, the more value we can deliver back.

