Here at CloudZero we're building the world's first Serverless Reliability Management platform, enabling our customers to forecast reliability and address weaknesses in their cloud deployments before they become incidents. We're a polyglot organization, and like many of you reading this post, we need to build features quickly and get them in front of users. Our goal is to be less than five minutes away from shipping to prod at all times, and to do that we need solid tools. Enter Semaphore, a cloud-hosted CI/CD platform that's been a great fit for us. But first, a little more about the problem - why is Semaphore such a great fit? In this post, we'll describe what we're trying to accomplish with our technology and call out the factors that matter when choosing a CI/CD platform.
About this New World
Since we focus on analyzing cloud-native architectures on behalf of our customers, we've designed our platform to itself be cloud-native. This allows us to interface with our customers' cloud accounts effectively. In addition, we chose to go all-in with serverless technology from the very inception of our company. Specifically, we use AWS Lambda and complementary services like DynamoDB, S3, and SNS. But what is serverless exactly, and why does it matter? First, the word serverless is a bit misleading; it doesn't mean "there are no servers", it means "you don't need to administer servers". Rather than pay for servers in the cloud like you would on AWS EC2, or for dyno containers on Heroku, you pay for hosted functions - a small amount for every millisecond your functions spend running. You also pay for things like database usage (e.g. DynamoDB) and storage (e.g. S3), but like functions, this is pay-as-you-go. This is very advantageous for a startup like us - pay only a little bit now, and scale up to meet demand without a lot of drama.
This new world is sometimes scary, sometimes exciting, and constantly in motion. One of the common problems facing developers is testing and deployment; many tools we've used in the past don't quite work here. The patterns still apply - we need Continuous Delivery as much as we did 10 years ago - it's just that the toolchains aren't there. Or, more accurately, they weren't there in any meaningful sense until the open-source community stepped in. One critical piece of the toolchain is Serverless Framework, a toolbox for managing serverless applications across multiple cloud providers. This tool provides the backbone for a great deal of our serverless work, along with some other useful technologies that we'll talk about in a bit. Another tool that has been invaluable is Docker - though interestingly, not for the same reasons as many other projects. For us, Docker helps replicate conditions on AWS that are not easy to reproduce locally. AWS, and more generally cloud-native platforms, are analogous to a new type of operating system, one quite different from the *nix that many of us have used for decades.
We manage several application repositories on GitHub, some open-source, others proprietary. Since these projects utilize a wide array of technologies, we need a CI/CD platform flexible enough to handle it all. We have two primary application stacks and a handful of secondary ones. We'll briefly explain what's involved with each.
Python 3.6 & AWS Lambda (serverless)
As mentioned previously, one of our primary stacks is AWS Lambda; it supports multiple languages, and Python 3.6 is our choice.
Like many other Python projects, we use a combination of pip, tox, pytest, and flake8 for installing dependencies, running automated tests, and linting. We have also started evaluating CodeClimate, a useful tool for grading code quality and reporting test coverage. Part of testing AWS serverless applications involves faking or mocking AWS services like DynamoDB, RDS Postgres, S3, and SNS. For that we use a combination of LocalStack and moto. LocalStack runs as a Docker container and provides implementations of AWS services on localhost, while moto implements them as a Python library. Semaphore's seamless support for Docker is a huge win for us, making it possible to use LocalStack effectively.
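To sketch what that looks like in practice, a test job spins up LocalStack and points the AWS CLI at localhost instead of real AWS. The port and bucket name below are illustrative - the published port varies by LocalStack version and service:

```shell
# Run AWS service fakes locally: LocalStack in Docker, AWS CLI pointed at it.
# The port number is hypothetical - check your LocalStack version's docs.
localstack_endpoint() {
  echo "http://localhost:${1:-4566}"
}

# In a CI job this looks roughly like:
# docker run -d -p 4566:4566 localstack/localstack
# aws --endpoint-url "$(localstack_endpoint)" s3 mb s3://ci-test-bucket
# aws --endpoint-url "$(localstack_endpoint)" s3 ls

localstack_endpoint 4572   # → http://localhost:4572
```

The application code never knows the difference - it just sees an S3-shaped API on localhost.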
In order to deploy anything, we require secure storage of our AWS keys and other secrets, something Semaphore handles quite well and without requiring any of them to be stored in our repositories. This is especially relevant for our open-source projects; we do not want any secrets to be visible, even in encrypted form. For the deployment proper, we use a combination of LambCI's Python 3.6 Docker container and Serverless Framework. The former ensures that our Python deployment is compatible with AWS Lambda, including tricky native dependencies like pycrypto. Serverless Framework manages the actual deployment of our code to AWS using a combination of community plugins, CloudFormation, and S3.
You might have noticed that there is no explicit "build" step here. Unlike many traditional software platforms, there is no recognizable artifact which is versioned and stored somewhere - at least not one we have to manage directly. Serverless Framework manages this for you. So for our deployments, we just need to make sure Serverless Framework gets what it needs: 1) the Framework is installed, 2) the application code is present, 3) AWS profiles and keys are accessible, and 4) runtime dependencies are addressed.
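In Semaphore, that checklist boils down to a short bash deploy step. Here's a minimal sketch with hypothetical file names - not our exact script:

```shell
# Check the four prerequisites before a Serverless Framework deploy.
# File names and stage name are illustrative.
deploy_ready() {
  command -v serverless >/dev/null 2>&1 || { echo "missing: serverless"; return 1; }       # 1) Framework installed
  [ -f serverless.yml ]                 || { echo "missing: serverless.yml"; return 1; }   # 2) application code present
  [ -n "${AWS_ACCESS_KEY_ID:-}" ]       || { echo "missing: AWS credentials"; return 1; }  # 3) keys accessible
  [ -f requirements.txt ]               || { echo "missing: requirements.txt"; return 1; } # 4) runtime deps listed
  echo "ready"
}

# The real step would then run something like:
# npm install -g serverless && pip install -r requirements.txt
# deploy_ready && serverless deploy --stage staging
```

Everything else - packaging, versioning, uploading to S3, updating CloudFormation - is the Framework's job.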
ReactJS & AWS S3 (serverless)
For applications that are end-user facing, we use ReactJS and its ecosystem to build single-page apps, along with AWS S3, CloudFront, and Route53 to host them on the web in true serverless fashion.
As with the Serverless Python stack, there is no explicit "build" step; build and deploy are done as a single operation. There is really no need for a separate step to create a versioned artifact. Semaphore's design accommodates this model, and it's yet another reason why we like it.
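For illustration, that single build-and-deploy operation reduces to a few commands. The bucket name and CloudFront distribution ID below are made up:

```shell
# Build a React SPA and publish it to S3 + CloudFront in one step.
# Bucket name and distribution ID are hypothetical.
publish_spa() {
  bucket="$1"
  distribution_id="$2"
  npm run build                                          # emit static assets into ./build
  aws s3 sync build/ "s3://${bucket}" --delete           # upload, removing stale files
  aws cloudfront create-invalidation \
    --distribution-id "$distribution_id" --paths "/*"    # bust the CDN cache
}

# usage: publish_spa app.example.com E2EXAMPLEID
```

The freshly built assets are the deployment; there's nothing to archive in between.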
Just to illustrate some of the other interesting tech we need to manage with CI/CD:
- We're building a data pipeline around AWS Kinesis and its JVM-based Producer Library, so we can use Kotlin with tools like gradle. Semaphore handles this easily, even when this technology coexists with other stacks in the same repository.
- Some of our projects are traditional Python, which use setup.py and are deployed to PyPI instead of production. We also sometimes need to support Python 2.7 and 3.6 concurrently; tox and Semaphore's ability to run tasks in parallel make this easy.
- We use Auth0 as our OpenID Connect provider. Using bash, curl, and jq, we can easily automate the administration of our Auth0 Tenants using Semaphore. Essentially this requires nothing more than running scripts - all the tools we need are already at our fingertips.
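To make that last point concrete, here's the shape of one of those scripts - fetching an Auth0 Management API token with the client-credentials grant. The environment variable names are our own convention, supplied by Semaphore's encrypted configuration:

```shell
# Fetch an Auth0 Management API token via the client-credentials grant.
# AUTH0_DOMAIN, AUTH0_CLIENT_ID, and AUTH0_CLIENT_SECRET are supplied by
# Semaphore's encrypted environment variables (names are our convention).
auth0_token_url() {
  echo "https://${1}/oauth/token"
}

get_mgmt_token() {
  curl -s -X POST "$(auth0_token_url "$AUTH0_DOMAIN")" \
    -H 'Content-Type: application/json' \
    -d '{"client_id":"'"$AUTH0_CLIENT_ID"'","client_secret":"'"$AUTH0_CLIENT_SECRET"'","audience":"https://'"$AUTH0_DOMAIN"'/api/v2/","grant_type":"client_credentials"}' \
    | jq -r '.access_token'
}

auth0_token_url "example.auth0.com"   # → https://example.auth0.com/oauth/token
```

With the token in hand, administering Tenants is just more curl and jq against the Management API.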
One of the key selling points of Semaphore is that it truly is a Continuous Delivery platform, not just a Continuous Integration platform. I could talk about this at length, but I'll keep it short: Semaphore keeps validation of the codebase (CI) appropriately separate from delivery (CD). This shows up in its clear distinction between Branches and Servers, which represent the CI and CD responsibilities, respectively. Like our technology stacks, the set of applications we manage is diverse, and that diversity extends to development workflow and deployment requirements for our AWS accounts. Semaphore helps greatly here by providing flexibility along all of these axes: the ability to manage public repositories and contributor forks, support for private repos, different deployment logic and configuration for each branch, the ability to keep deployment details private, and the ability to deploy manually from certain branches.
Just look at what's been possible for us - each application in the table below requires something unique:
| Project | Description | Stack | CI workflow | CD workflow |
|---|---|---|---|---|
| CloudZero AWS Reactor | Open-source serverless application designed to ingest and interpret the "exhaust" of CloudTrail events emitted by one or more AWS accounts. | Open-source Serverless Python 3.6 | Pull requests are opened against the develop branch by developers or by contributors with repository forks. All commits trigger automated testing and linting of Python 3.6 code. Test results and code health are public. | Merges to develop are automatically deployed to our AWS Staging environment for system testing and final validation. Merges to master are automatically deployed to production. |
| CloudZero CLI | Command-line utility for installing self-hosted Reactors and connecting your AWS Accounts to a Reactor. | Open-source, traditional Python 2.7/3.6 library | Pull requests are opened against the develop branch by developers or by contributors with repository forks. All commits trigger automated testing and linting of Python code, with 2.7 and 3.6 running in parallel. Test results and code health are public. | Merges to master are automatically packaged as source distributions and deployed to downloads.cloudzero.com as installable tgz archives. |
| Pyfaaster | Utilities and helpers for AWS Lambda written in Python 3.6 - promotes successful design patterns. | Open-source, traditional Python 3.6 library | Pull requests are opened against the develop branch by developers or by contributors with repository forks. All commits trigger automated testing and linting of Python 3.6 code. Test results and code health are public. | Merges to master are automatically packaged, versioned, and deployed to PyPI. |
| CloudZero Core Pipeline | Data transmission from Reactors to large backend datastores, and hooks to make this information available to the Core-Web application. | Proprietary, Serverless Python 3.6 | Pull requests are opened against the develop branch by developers. All commits trigger automated testing and linting of Python 3.6 code. Test results and code health are private. | This project contains two main parts: 1) large-scale per-environment infrastructure like VPCs and Redshift clusters, which is manually deployed to production from master, and 2) focused per-developer serverless resources like lambdas and dynamo tables, which are automatically deployed. |
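Because each Semaphore branch can run its own bash, the per-branch deployment logic above is easy to express. A toy version of the mapping (the branch names are from our workflow; the stage names for feature branches are hypothetical):

```shell
# Map a git branch to a deployment stage, following our workflow:
# master → production, develop → staging, anything else → a sandbox.
# The "dev-" stage prefix is hypothetical.
stage_for_branch() {
  case "$1" in
    master)  echo "production" ;;
    develop) echo "staging" ;;
    *)       echo "dev-$1" ;;   # per-developer sandbox stages
  esac
}

stage_for_branch develop   # → staging
```

Combined with Semaphore's per-branch deployment configuration, each repository gets exactly the workflow its table row describes.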
Summary - TL;DR
Whew, that was a lot of information in the last few sections. It certainly makes the case that serverless applications are a new and complex beast for Continuous Delivery, especially those that must be both private and public and support many languages. Distilling this down to a short list of criteria was a challenge we at CloudZero needed to address in order to make an informed and appropriate CI/CD platform choice. If you're running a serverless project, it's likely you'll run into the same questions.
- Reactor and CLI are both open-source projects and projects we host internally. We needed both CI for the general public and private CD for us. This is one major reason we cancelled our Travis subscription - it's very good at the former, but not so good at the latter.
- We needed solid support for open-source workflows - secure forks/PRs, badges, and anonymous access to running jobs and CI results.
- We're a polyglot organization, and that includes multiple languages within repos. We needed support for lots of languages, or at least the ability for us to fill in the gaps when needed. Semaphore gives us a ton of options here, and has regular platform updates to stay current with different tools like Serverless Framework.
- We needed the ability to quickly troubleshoot issues with builds and deploys, and to run as much in parallel as possible. We want to be < 5 minutes to prod at all times. Travis is hard to troubleshoot when something breaks; their boxes are opaque. Semaphore offers SSH access, a killer feature that's saved us many days of frustration.
- Because serverless development does not follow the same rules, e.g. no build artifacts, we needed a platform that is not overly opinionated about how to carry out its tasks. Semaphore's use of simple Ubuntu boxes, SSH, and bash was a perfect fit: Jenkins-style flexibility and power without the cruft.
- Security is a critical requirement for managing secrets. Ideally our chosen platform would not leak even the existence of secrets. That was one of our problems with using in-repo yaml files a la Travis - secrets are there in the files. Encrypted, yes (if you're smart). But their existence is no longer a secret.
- Native Docker support is very important to supporting serverless development. Many tools do not support Docker to the degree we needed, and those that do are often optimized for producing images, not consuming them.
- We needed the ability to scale - we're a startup, so we'd like to pay a little bit now, but with the possibility to pay more and scale out later.
- Script timings help us debug bottlenecks and practice kaizen - continuous process improvement. Semaphore's regular updates help out here: new tools or tool versions are routinely installed by default, allowing us to remove steps from our tasks.
- We wanted something that is very flexible and powerful, but easy for someone to dive right in. Semaphore's out-of-the-box experience was very pleasant and we were able to validate all of our required criteria in a single afternoon. Some products resulted in immediate frustration, and many others were rejected immediately for lack of features.
Overall, we at CloudZero are really happy with Semaphore - it has tremendous flexibility, and our experience has been virtually frictionless when setting up new projects, even projects as diverse as ours. There are, however, a few things we'd politely ask for that would help us and other organizations doing serverless projects:
- A large percentage of our build/deploy time is spent downloading Docker images. Unlike most users of Docker, we don't produce images - we use them for testing and deployment. Anything that makes this faster via aggressive caching would be awesome.
- Though we love writing bash scripts and managing them in Semaphore, managing a lot of them is a bit tedious. Shared scripts managed in the UI would be nice, and hopefully we would not lose any timing visibility. Management of scripts in a private git repo would also be nice.
- A global installation of tools we use, especially Serverless Framework, would be awesome. They change rapidly, so this might be risky - but it would save us time on our builds.
As a parting thought, CloudZero is always looking for organizations to try the open-source Reactor or partner with us on developing our Serverless Reliability Management platform - please reach out to us if this interests you. Thanks for reading!