As we were planning “Cloud Atlas,” our primary goal was to talk with someone who was “in the room” during the genesis of the cloud. Allan Vermeulen, who worked for Amazon from 1999 to 2021 and who proposed Amazon Simple Storage Service (S3), became that source.

Allan is a born engineer. He has always taken pleasure in building useful things, whether “out of bits and software or boards and lumber.” And in the early to mid-2000s, very little was more useful to the world of software engineering than S3, a core element of Amazon Web Services (AWS).

We talked to Allan for more than two hours about a range of topics: software development in the pre-cloud era, his path to Amazon, his interactions with Jeff Bezos and Andy Jassy, and for a few stolen moments, the Beatles. Below, find a condensed transcript of our full discussion with Allan.

Listen to Cloud Atlas, Episode One here. Subscribe to the series wherever you get your podcasts.

Allan Vermeulen: …now what I spend my time doing is building things made out of wood. I’m in my little shop in a cabin down by the water right now.

Dustin Lowman: If only there were a cloud solution to wood engineering challenges.

AV: You know, it’s an interesting thing. I love building stuff out of wood because there are no cloud challenges, or any of that nonsense. You take your boards and you assemble them into what you want, and you join them up, and you end up with a piece of furniture. It’s awesome.

DL: So jumping back to another pre-cloud era in your life, could you just say your name and the titles that you held while you were at Amazon?

AV: Yeah, I don’t really care that much about titles, but my name is Allan Vermeulen, and I worked for Amazon from 1999 through 2021. I kind of had two roles. For the first few years I was part of engineering management and helped start the AWS business, including running the engineering team there at the outset. Then from 2005 until I retired I worked as an individual contributing engineer, writing code and building products.

DL: When you were growing up and envisioning your professional life, what interested you? What excited you?

AV: Well, I describe myself in the early days as a simple Canadian farm boy who went to engineering school. I’ve always liked building things. In my Ph.D. work, I worked in a field called Solid Mechanics on finite element models, which are models used to analyze solid structures. It turns out that those models are all math at the end of the day.

What you actually spend your days doing when you’re a graduate student is writing code, and I came to the realization really quickly that writing code is really, really hard, much harder than it should be. I was writing code in Fortran at the time, and later in C, and that just took all my time.

And so the story of how I got into the cloud and into software components starts then with, “How do I make this easier?”

DL: You mentioned woodworking, you mentioned building things. What is it about building things that appeals to you? Is it the act of creation, of making something useful or valuable that wasn’t there before you applied your creativity to it?

AV: It is exactly that. I’m much more of an engineer than a scientist. I like to actually create things, whether those things are created out of bits and software, or boards and lumber. That’s what I like to do, and I like to make it easier for other people to build things, which explains most of my career.

DL: I want to pick up on the software engineering piece. It seems like, today, software engineering is maybe the most elemental skill to have as a professional, anywhere. Comparing today’s picture to back then, when you were getting into it, what are the differences? What was the role of software engineering in the professional world more broadly?

AV: So, like I said, when I was a graduate student, my goal was to build some code that did a thing. Whatever it did, it’s not really important. And the challenge at the time was, you were given a computer, and the computer could run a language, and that was it. You weren’t given any components or any tools to work with.

So if you wanted any of those components, you had to build them yourself. As a result, people just spent an enormous amount of time writing the same basic building blocks over and over again.

Part of the way through graduate school I started working with a language called C. C was developed by AT&T. And they did a really interesting thing with the language that’ll seem puzzling to people who’ve learned software in the last five or ten years: They released the language with no libraries.

So if you wanted to, say, write software that printed strings, you had to write a string library. In my case, I wanted to do linear algebra. So if I wanted to do linear algebra, I had to write, you know, fundamental abstractions that represented matrices and vectors and things like that. As a graduate student, that took all my time.

That, to me, is the fundamental difference: Nowadays, the job of a software engineer is largely to find pieces and assemble them together into the product that you want, whereas at the time all the pieces were created bespoke for a particular project.

It’s as if you’re building a house, and you can’t buy PVC pipe and sinks and doorknobs, and you have to create all those things as part of building your house. Whereas nowadays, you know, Home Depot exists, and you can go there and pick up all these components, and your job is largely one of assembly.

DL: Interesting. As somebody who comes at all this from a writing background, I’m always trying to put the cloud in terms that I, the layperson, can understand. What you’re talking about is similar to the way writing is. My thesis advisor in college said with writing, you have to both generate the material and shape the material. Find the right words and then get them into the right shape.

It sounds like what you’re saying is the difference today for engineering is that the generating material part is taken care of, and it’s more about the shaping than it is about creating the bricks.

AV: Yeah, I mean, software engineering is pretty different from writing, in that in writing, you’re not really encouraged to grab paragraphs that other people have written and just insert them into your work. That’s considered bad. It’s maybe more akin to building music out of loops — pieces that you’ve sampled from other artists.

That’s how people create software nowadays.

DL: I wanted to hear about the startup you joined right out of school. Was that your first entrance into the professional world in a software engineering context?

AV: Yeah, I guess so. After graduate school I joined a company called Rogue Wave Software. Its founder was a guy named Tom Keffer, who was an oceanographer at the University of Washington. He and I collaborated on the linear algebra libraries I told you about. He’d written libraries for doing general string processing, and so on, and put those out to a free FTP site, and I was just amazed at the uptake: how many people, mostly in the academic community, were grabbing onto these things and wanting to use them.

Tom decided to leave academia and start Rogue Wave Software, and I joined as soon as I left graduate school. The idea was that we sold pieces of software that other people could use and incorporate into their applications.

And I should say that at the time, you know, this is pre-cloud. The internet kind of existed. But the way we sold our components was, when you bought our libraries, we sent you a floppy disk that had the code on it, so we were not prescient enough to see that we should distribute our software via the internet. We sent floppy disks through the mail.

But it was an idea that really caught on. I mean, when I joined there were about ten people. That was in 1992, and over the next seven years, we grew the company to about a $50M-a-year run rate, and had something near 1,000 employees. We went public in 1996, and we went public the old-fashioned way, with eight consecutive quarters of profitable growth. There was a time when that’s what the markets demanded.

So, anyway, it was obviously a good idea then, and by the time 2000 rolled around, people realized that the library and the set of components available with the language was as important as the language itself. It really changed how people thought about writing software.

DL: It sounds to me like an almost proto-cloud concept. Is that accurate?

AV: Well, I mean, there’s a lot of differences. From a business point of view, for example, one of the amazing things that the cloud does is it takes away your need to invest fixed costs to release a product. Even in the Rogue Wave days, you have to buy our libraries before you make a nickel selling your product. You have to buy servers to run our libraries on, and so on, so we would sell you one particular piece, but the whole puzzle has a lot more pieces to it.

So from that business point of view, you’re still investing fixed costs. One of the amazing things the cloud does is take those fixed costs away.

Another amazing thing the cloud does is it helps alleviate a lot of your infrastructure concerns. So if you bought our software components, great. You could write your code. But who was going to run it for you? The answer was, you were going to run it for yourself, and that turned out to be really hard.

So what we did was not really a proto-cloud. What we did was solve one particular aspect of the set of problems that we eventually solved with the cloud. We’re focusing a bit more on that than on some of the other pieces, because that happens to be the one particular aspect that really influenced my career. If you had this conversation with some of the other people who were involved in AWS, they might talk a lot more about the fixed vs. variable cost thing, or about the business models that we came in and completely revolutionized to the customer’s advantage.

DL: Yeah, I can see that it’s conceptually similar, in that you’re giving somebody something reusable as opposed to making them start from scratch. But it’s not addressing every piece of the challenge.

AV: Yes.

DL: So getting back to your professional career — after Rogue Wave Software, what was the trajectory toward Amazon?

AV: At Rogue Wave, we were a bunch of academics and engineers, and we took the pieces we were building very seriously. Everything was very well engineered. We had test suites for everything, and we sold to very serious customers. Our biggest customers were telcos and banks, who really, really wanted stuff that worked all the time. So we were careful with how we did things, and what we sold worked well.

Amazon in late 1999 was in get-big-fast mode. It was an absolute land grab, and so they hired hordes of fairly young people who just wrote a ton of code that did all kinds of stuff, and it was not carefully engineered. I mean, some parts of it were, but a lot of it was just thrown together as quickly as possible. The goal was speed, not necessarily perfection.

So Amazon was looking around for companies to acquire. Rogue Wave was one that came to their attention. So they sent a group of six people out to chat with the leadership team.

We sat down across the table and spent an afternoon together, and I was very impressed — in particular with Rick Dalzell. He told me a few things that Amazon was going to accomplish in the next few months. And what I observed happening was that those things were actually accomplished.

It was really the only job searching I’ve ever done. I sent Rick an email saying, “Hey, I’d be interested in joining you guys,” and ended up at Amazon.

I was brought in with the goal of helping to change software development at Amazon, build some tools, do whatever was necessary to resolve this problem: We had this large messy code base, and we were unable to deliver products as quickly as we wanted to.

DL: My understanding of the genesis of AWS is that it arose out of challenges that Amazon was facing in-house. That you had growth objectives that were hard to reach because of the software development challenges. I wonder if you could talk about witnessing those challenges up close, and perhaps some examples of trying to deal with them.

AV: Sure. So I will say that personally, it was a huge shock to my system to come to Amazon, because at Rogue Wave, I was young, fresh out of college, academic, and all of my direct reports were at least a decade older than me. Most of the people around me were ten, twenty years older than me.

When I moved over to Amazon, in one day all my direct reports were younger than me, and all these people around were this high energy, incredibly different group of people. So culturally it was quite a shock. And the difference in the development styles between those two teams was also really something.

Ignoring some of the business issues that were happening in 2000 with respect to raising capital and so on, the big problem for Amazon at that time was, how do we develop features quickly enough to grow and move into this vision that we had? And the vision at the time was selling things beyond books, and that meant selling things like apparel which required tons of features that we didn’t have.

We had a vision of being able to support other vendors on our platform — like Target or Toys ‘R’ Us — all kinds of companies, and that was the biggest issue. It just took forever to get any software features done.

When an analysis was done of why it took so long, it was because people were building their own infrastructure over and over again to fit into the big mess of software that we had. Those ideas that eventually led to AWS were certainly germinated during that time. They were sort of built out of the pain of us trying to deliver anything.

DL: It also sounds like being in a culture where it’s a lot of young people, hungry and ambitious for growth, that culturally would add a certain pressure.

AV: Well, maybe. I mean one of the issues is that if you’re cranking away, you see your little part of the puzzle, and your goal is to get your small thing done. Often in order to improve the velocity of the whole enterprise, what you have to do is step back and say no, actually, we need to take a break. We had a project, which started before I got there, called “Get our house in order.” We needed to step back and say, look, we just need to reengineer some of these pieces so that they’re separable, and we can run teams independently.

There’s an idea called Conway’s Law. The idea of Conway’s Law was that if you have two teams working on a compiler, they’re going to build a two-phase compiler. It can be generalized to say that the technical infrastructure you design mirrors the structure of your organization.

When you’re Amazon, you started out as a little tiny organization. So, the software that you end up with is one thing. Then, the organization grew really quickly, but the software sort of stayed one big ball, and what happened was that the state of the designed software drove how the organization behaved.

The software was a big mess where all the different components talk to all the other components. That means that all the people in your organization have to talk to each other in order to change anything, because some guy wants to change a line of code, and it’s not clear what that line of code does. He’s got to go and talk to all of the people who might also be relying on that line of code. The result is that you don’t move forward. Or, people do change the line of code, and then we push the software and the site goes down because we hadn’t counted on somebody thinking that this code did a particular thing.

This is kind of interesting about Amazon, and about how the cloud works as well. Take an organization that’s beset with communication problems, and they’ve designed this artifact that’s beset with tangled dependencies. When you ask someone, “How do you fix this?” the natural reaction people have is, “We need to make communication more efficient.” And what we actually decided was, “No, what we need to do is get rid of communication completely. What we need to do is have completely separable teams and have those teams work through very well-defined interfaces.”

It’s a challenge. You’re used to calling whoever in the mail room and saying, “Hey, is my mail there?” And we’re going to say, “No, you’re not allowed to do that anymore, because that person’s not getting anything done. You have to go through this little system for checking if the mail is there.”

That was the concept: We have to take our software and change it into something well-defined where all the pieces are behind what we call “time-hardened APIs.” And again, people were talking about hardened APIs before I arrived. It’s not like, you know, this was some flash of insight that I had coming from Rogue Wave. But we did at Rogue Wave talk about this stuff a lot. We were early in the service-oriented architecture world with the components we sold.

So I was brought in as someone who knew a bit about that world. How do we take those ideas and actually implement them? I spent the first two, three years at Amazon trying to build tools and infrastructure to help us take our tangled code base and tease it apart into pieces with hardened, separable APIs and individual teams that could work on their piece without having to worry about all the other pieces that they used.

So for the semi-technical audience, the classic problem is that teams share a database, which is just an absolute disaster. Because inevitably, some team wants to change the schema to represent some new idea, because they’re building a feature (gift cards or something) and they want to tag users as having a gift card. But if they change the schema, it’s going to break every other team’s software that’s using that same database.

So to make it very concrete, what we actually did was say, “Okay. We need projects where we can take these databases and break them apart, so every team can have their own database. And yeah, I know it’s going to cost more, and there’s going to be inefficiencies there, and Oracle’s going to charge us more licenses because that’s what they do. But we have to do it.”
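[Editor's sketch] The idea described above, each team owning its own data and other teams reaching it only through a well-defined interface, might look roughly like the following. This is an illustration only, not Amazon's code: the class names `GiftCardService` and `CheckoutService` and all of their methods are hypothetical, and a plain dictionary stands in for a team's private database.

```python
class GiftCardService:
    """Hypothetical service owned by one team. Its storage is private."""

    def __init__(self):
        # Stand-in for the team's own database. Because nobody else reads
        # it directly, the owning team can reshape it without coordination.
        self._balances = {}

    def add_gift_card(self, user_id, amount_cents):
        self._balances[user_id] = self._balances.get(user_id, 0) + amount_cents

    def has_gift_card(self, user_id):
        # The stable, published question other teams are allowed to ask.
        return self._balances.get(user_id, 0) > 0


class CheckoutService:
    """Hypothetical service owned by another team."""

    def __init__(self, gift_cards):
        # Checkout only ever calls GiftCardService's public methods.
        self._gift_cards = gift_cards

    def payment_options(self, user_id):
        options = ["credit_card"]
        if self._gift_cards.has_gift_card(user_id):
            options.append("gift_card")
        return options


gift_cards = GiftCardService()
gift_cards.add_gift_card("user-1", 2500)
checkout = CheckoutService(gift_cards)
print(checkout.payment_options("user-1"))
print(checkout.payment_options("user-2"))
```

Because `CheckoutService` never touches `_balances`, the gift card team could swap the dictionary for a different schema or store without breaking checkout, as long as `has_gift_card` keeps answering the same question.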

DL: There’s a few things I want to dig into there. The first thing is for more of the “middle class” technical audience, talk about selling books versus selling apparel. What are the different features you need to build to sell clothing?

AV: The goal was to evolve our software to add features to make the website more usable for other products, or just to make the website do different things. If you look at Amazon.com in 1999 vs. today, it’s a radically different thing.

It became a radically different thing because we added a bunch of features that did different things. I mean to be very concrete, if you want to sell apparel, you have a concept like size that didn’t exist for books. With books, you might run into problems where people expect a particular edition or something like that. You have to deal with that. But the problems with the variety of books are absolutely nothing compared to the problems of, you know, buying a sweater that’s a slightly different size or a slightly different color.

To sell those new things, there were all kinds of features we wanted to build. And of course there were features unrelated to selling more things on Amazon. We added more community features, like giving a thumbs up or thumbs down to reviews, or the verified names project. All kinds of things.

I should also point out again, this isn’t AWS, but one of the things a lot of people don’t realize about Amazon is the backend code. The supply chain code and the distribution center code, and so on. That represented more than half of the lines of code that were at Amazon in 1999. So the website is less than half, and if you think about how a distribution center changes when you go from having only books to having, you know, everything, it’s radical. You need completely different software to organize that.

The answer to your question is just, your software has to keep evolving if you want your business to evolve.

DL: Well, and to reach the vision of being the Everything Store, you need to have an incredibly granular list of features for every type of product that you’re selling. It makes sense to me that if you have to reinvent the wheel every time you want to add something like that, the time to market and the growth trajectory will just be too slow to be viable.

AV: And things end up being serialized. That was another huge problem we had. So you’ve got one team trying to add features for apparel and some other team trying to add features for electronics, and they both require making schema changes to the same database. Now both teams can’t move independently. They have to, you know, serialize their changes.

DL: The last thing I want to touch on before we jump into the development of AWS is something I’ve been trying to assess in research and in these interviews. How novel of a concept was AWS exactly? It seemed like it beat IBM, Microsoft, Google to the punch by a long shot. But at the same time, what I’ve heard from you, and what we’ve heard from a couple of other people we’ve talked to, is there were ideas germinating that were cloud-like, or that were AWS-like prior to AWS happening. So, what made it so differentiated from everything that came before it?

AV: Sure. So let me tell you two things that are not technical, that are game-changers that helped AWS, and that I think a lot of people didn’t appreciate. So, Sun Microsystems had a grid computing system. IBM, I think, was selling one too, way before we launched AWS. But in order to buy time on their grid computing system, you had to talk to their enterprise salespeople and set up an enterprise sales deal and do all of this crazy stuff, you know, in order to start using the system.

And AWS, right from the beginning, we had this vision. We talked about the kid in his dorm room. So the idea was, we wanted a kid in their dorm room who had some cool idea for a new app to be able to build that app without having deep pockets, without having anything more than a credit card, and just spending the money on what they had.

So that was radical at the time. When you looked at how Oracle worked, or how Oracle still works, how Sun worked, how IBM worked, the idea that some random kid with a credit card could buy some of your “enterprise product” was a completely novel crazy idea.

So that’s one thing we did that nobody else had done. They had their solutions, but they were kind of top-down, and we were very bottom-up. We tried to appeal to the people building the apps. We didn’t sell through the CTO.

That was part of my background, too. At Rogue Wave, I was struck often when we visited customers that IBM would sell their products, and you would see the manuals for these products on software engineers’ shelves. You know, I’m just a simple engineer. I like to walk around and talk to people who work for a living. And I would ask them, “So what do you guys use these products for?” And they’d say, “We don’t use them. The company just did some kind of big deal with the sales guys, and they ended up with copies for everybody.”

At Rogue Wave, our goal was always to go the other way. We wanted the grassroots people to want our tools, and for them to put pressure on their higher-ups. That was something that I really appreciated about how we thought about AWS.

AWS has a significant salesforce now. But when we launched, we had no salespeople. None. The goal was, we weren’t going to have any until they were necessary. So that’s radical.

Another radical thing we did was pricing. So this also drove me crazy. At Amazon we would purchase hardware. And when you added up how much it cost to build the hardware, and then you looked at how much vendors were selling it for, it was mind-boggling. The margins on enterprise storage and stuff in 2003 were just criminal. We bought enterprise-grade servers. Those things cost a fortune, you know, multiples of the cost of a cheap desktop computer that was almost the exact same thing.

We resolved that we weren’t going to price things that way. We were going to price things in a way that we would be profitable, or at least profitable down the line, as we anticipated the costs of things changing.

But we were not going to set up to be a high-margin business. We were going to set up to be a low-margin business, because we knew how to do that, and what we learned was that customers like it better when you charge them less.

That was radical. That’s completely different from how Oracle thinks, or how IBM at the time would have thought. Those are two things, our pricing model and our sales model.

I’ll give you a third thing that might be interesting, which is the actual engineering. One of the things that we took very seriously from the absolute outset was how to have one hundred percent uptime. Or at least multiple nines of uptime. Because when our site went down, people freaked out, including us, but it also showed up in the Wall Street Journal and the New York Times. It was important to us that the website did not go down, so we thought a lot about high availability.

There are two ways to engineer high availability. One is to buy increasingly expensive stuff. The other is to buy three cheap things, and just assume that two of them are going to be working at any one time.

The way the enterprise vendors built their clouds and enterprise infrastructure at the time was, they bought expensive stuff. They made everything top-of-the-line. In a way, that’s an easier sales thing. You can go into your customer and say, “Well, do you want this cheap thing, or do you want the top-of-the-line thing?”

We engineered based on the idea that we were going to buy cheap commodity components and then use the smart software distributed systems knowledge which we had to make it more reliable than the high-end stuff. Because it turns out that if the high-end stuff costs 10x what the cheap stuff costs, and you buy five of the cheap things, you’re going to get better availability if you build your system right.
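[Editor's sketch] The arithmetic behind that claim can be worked through in a few lines. The availability figures below are made up for illustration and do not come from the interview, and the calculation assumes the machines fail independently of one another, which real systems only approximate.

```python
from math import comb


def at_least_k_of_n(k, n, p):
    """Probability that at least k of n independent components are up,
    when each one is up with probability p (binomial tail sum)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed, illustrative numbers (not from the interview):
premium = 0.999  # one expensive machine, up 99.9% of the time
cheap = 0.99     # one commodity machine, up 99% of the time

# Service survives as long as any one of three cheap copies is up.
any_one_of_three = at_least_k_of_n(1, 3, cheap)
# Stricter design: service needs a majority (two of three) up.
two_of_three = at_least_k_of_n(2, 3, cheap)

print(f"one premium machine:        {premium:.6f}")
print(f"any 1 of 3 cheap machines:  {any_one_of_three:.6f}")
print(f"at least 2 of 3 cheap ones: {two_of_three:.6f}")
```

Under these assumptions, three 99% machines beat a single 99.9% machine even when two of the three must be up at once. The catch is that the software has to make the redundant copies behave like one reliable system.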

There were other cloudish things out there at the time, but I think our all-encompassing vision, including the business aspects of it, and the engineering direction and goals that we had, were somewhat different from what most people were thinking about.

DL: Yeah, I mean not to make this a CloudZero thing, but it sounds like you’re talking about building better as opposed to buying better. Instead of buying the most sophisticated machinery, you’re changing the genetic composition to make it more cost-efficient and more high-performing.

AV: Yeah. I mean the disadvantage of that approach is that it takes longer, because you have to figure out how to make your things work correctly. There’s a saying that I used to use for engineers all the time: “A man with a watch knows what time it is; a man with two watches is never sure.”

And so the challenge is, if you’re going to take this approach of buying a whole bunch of redundant pieces, you have to have some way of combining their results in a way that makes sense. It’s much harder. And so the engineering challenges were much, much greater. But that’s the direction we picked.
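[Editor's sketch] One common way to combine answers from redundant pieces is a majority vote: with two watches you cannot break a tie, but with three you can. The function below is a generic illustration of that idea, not a description of how AWS does it, and the name `majority_value` is hypothetical.

```python
from collections import Counter


def majority_value(replies):
    """Return the value reported by a strict majority of replicas,
    or None when no value has more than half of the votes."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) / 2 else None


# Two watches that disagree: no way to know which one is right.
print(majority_value(["10:02", "10:05"]))
# Three watches: the odd one out is outvoted.
print(majority_value(["10:02", "10:02", "10:05"]))
```

The first call returns `None` and the second returns `"10:02"`.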

DL: I want to transition into the development of AWS itself. What was the germ of the idea? What would you say was the moment — if there was one — or the series of moments that constituted ground zero of AWS conceptualization?

AV: There absolutely was not one particular moment. There were lots of offsites that we’ve had, and lots of brainstorming between different people. I think one of the misconceptions people have about Amazon in general, especially in those days, is that it’s some kind of top-down thing, or even a bottom-up thing, where you write a six-pager and present it to Jeff, and then it gets blessed, and all is well.

Amazon at the time was a very collaborative environment. There were a lot of meetings that involved six-pagers and so on. But there were also just conversations in the hallway around problems we were having, problems our potential customers might be having, ideas that kind of led to AWS.

DL: I remember reading about a particular offsite which sounded like the initial visioning meeting for AWS, where it was decided that rather than trying to do database, or storage, or compute, that the initial product would be all of those things. Were you at that off-site? Do you have any insight into what that conversation was like?

AV: Don’t know which offsite in particular you’re referring to. There’s been a bunch of them, but that’s not the original problem. I mean, we released a number of small services first, and then we finally released S3, and then a while later we released EC2, and quite a while later we released these database products, so we didn’t, in fact, release some big bang thing where we had all the components of the cloud baked in.

I mean, we certainly were thinking about it. We talked all the time about what kind of fundamental building blocks people needed to create web applications. Anybody who’s studied computer architecture knows the von Neumann architecture. And so you can say, well, two that we need for sure are storage and processing. And then you pretty quickly realize that you probably also want some kind of structured database, and so on.

We had that list, and the goal was to build all of those things. But everything was iterative and incremental. We released things as they were ready to be released to customers and iterated on them.

DL: And how did it come to be that you were in charge of S3, or assembling the team that was responsible for S3?

AV: Yeah, I don’t like saying I was in charge of S3. That’s kind of silly. I was the person who originally proposed the idea, and was sort of the lead engineer at the time. So in 2004, I was an engineering leader in AWS. For a bunch of complicated reasons, my family was living in Oregon, and I was working in Seattle, commuting back and forth. That was getting a little bit old, and I wanted to change my life up a bit, so I decided to take a year where I would just work as an engineer. So I called this an “internal sabbatical,” and I decided that my project for that year would be to build S3.

So I wrote a proposal for what I thought we should build, and how we should build it. I wrote most of that proposal at the Six Arms, which is a pub in Capitol Hill, in Seattle. I just set up my laptop and drank a bunch of Hammerhead Ales, and cranked out my six-pager on what the S3 design should look like. People liked it, and I was lucky enough that I had been at the company long enough that my recommendations for who should be on the team were taken seriously.

We put a group of folks together, and in 2005 we built S3.

DL: Hemingway said: “Write drunk, edit sober.” Is the same principle true in software engineering?

AV: [Laughs] I don’t find that there is a black and white line between drunk and sober, and it’s hard to tell when you do your best work.

DL: Fair enough.

AV: But actually, what I seriously find working in a place like the Six Arms is just having other humans around helps my brain come up with ideas and get the words typed in more quickly. I kind of love working in environments like that.

DL: So another question for our laypeople audience: How would you summarize S3 in a couple of sentences? What was the objective? And what does it do?

AV: S3 is storage for the internet. You want to be able to store whatever information you want to store. And you want to be able to access it from anywhere. So it’s a giant disk drive in the sky, and you can write things to it. You can reach it from anywhere. It’s secure so that only you can read it, or you can set up and decide who’s going to read it, and you pay for exactly what you use. Unlike disk drives, where you have to buy a chunk at a time, with S3 you just pay every month for however much you’re using, and that’s it.

That was the vision from the outset, and that really is still the vision to this day. There’s a lot of extra features now, and there’s tiers of storage and all kinds of stuff. But at the end of the day, that’s what it is. It’s just storage for the internet.
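As a rough illustration of the "pay for exactly what you use" idea, the sketch below contrasts a metered monthly charge with buying whole drives "a chunk at a time." The function names, the per-gigabyte rate, and the drive size and price are all assumptions made up for this example, not Amazon's pricing logic.

```python
import math


def pay_as_you_go_cost(gb_stored, price_per_gb_month):
    """Monthly charge that scales linearly with the amount stored."""
    return gb_stored * price_per_gb_month


def fixed_drive_cost(gb_stored, drive_capacity_gb, price_per_drive):
    """Up-front cost when capacity can only be bought in whole drives."""
    drives_needed = math.ceil(gb_stored / drive_capacity_gb)
    return drives_needed * price_per_drive


# Hypothetical numbers, chosen only for illustration.
RATE_PER_GB_MONTH = 0.15   # dollars per gigabyte per month
DRIVE_CAPACITY_GB = 250    # gigabytes per drive
DRIVE_PRICE = 100.0        # dollars per drive

for gb in (1, 30, 251):
    metered = pay_as_you_go_cost(gb, RATE_PER_GB_MONTH)
    chunked = fixed_drive_cost(gb, DRIVE_CAPACITY_GB, DRIVE_PRICE)
    print(f"{gb} GB: metered ${metered:.2f}/month, drives ${chunked:.2f} up front")
```

The two figures are not directly comparable (one is a monthly charge, the other a one-time purchase). The point is only the granularity: the metered cost tracks usage exactly, while the drive cost jumps in steps.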

DL: In an email, you mentioned some of your “non-goals” for AWS working out. I wonder if you could jump into what you meant by that.

AV: We were thinking all the time, like I said, about the kid in the dorm room and about solving problems for engineers. One thing we didn’t spend a lot of time thinking about was people taking the applications they were running in their data centers and just shifting them into our data centers.

We thought more about people building new stuff, and we were going to run it for them, and so on. But it turned out that a huge amount of our business later on was what ended up being called “Lift and shift”: People took all the apps running in their data centers, and they would just take those applications and move them over and run them on EC2. It’s fantastic. It’s actually good for everybody.

We hadn’t really thought about that back in 2004 and 2005. There are a lot of areas where it turned out that the ideas we had were more widely applicable than we thought.

DL: I want to also hear about the relationship between you and your team and the senior leadership. Talk a little bit more about the creative relationship between what you and your team are doing, and how Andy and Jeff contributed over time.

AV: I like to say that one of the reasons S3 worked out really well, and is so great, is that we had a fantastic product manager. Product managers are very hard to find. It is really difficult to find someone with all the skills to understand a market, understand what people are going to be interested in, and have the technical wherewithal to know what’s possible. I think it’s one of the hardest jobs in tech.

Well, Jeff Bezos was our product manager for S3. We met with him twice a week for a lot of it. He challenged every idea we had, he challenged how we were building it, and he challenged us to think bigger in terms of scale.

I remember one meeting we had where we were talking about some of the challenges around cycling out servers and bringing new ones in. His vision was, “No. What we ought to be able to do is when this rack of servers is no longer cost-effective, I want to unplug it. Throw it in a truck out back, take it wherever they recycle those things, and put in a rack full of new empty servers. Plug them in, and S3 should just see that and take them over.”

The scope of his vision, and the details of what he wanted were right on, and incredibly helpful in shaping the project.

Obviously, we would prepare really hard for those meetings, and we would come in Amazon-style with a narrative document that we would review at the beginning. But it was a very collaborative atmosphere.

DL: It sounds like a creative collaboration. Where you have two different perspectives: the people in the trenches, putting it together, doing the muscle work; and somebody with a more broad perspective, with different goals, who’s integrating those into the mechanical parts of it. It’s a creative partnership with different perspectives intervening on the same product.

AV: Yep, that’s fair. But Jeff wasn’t just the supervisor, looking down at those words, commenting one way or the other. If he has an idea, he’s up at the whiteboard, drawing pictures and explaining his idea and getting feedback from the team. I mean, there was a lot of solid collaboration that happened all the way along.

DL: Staying on the subject of Jeff Bezos, just because he’s somebody a lot of people have heard of and have thoughts about — having worked with him closely, is there anything that people might be surprised by?

AV: First of all, it’s been 20 years. Jeff had been a Time Man of the Year in 1999. So by the time I knew him, he was already a big, important guy, but he was not the wealthiest man in the world by any stretch. You know, he was more of a regular guy.

I’ve read a lot about the terrible culture at Amazon, how people tear into each other and so on. And, you know, how Jeff likes to yell at people in meetings. He’s an emotional guy, and I’ve been in meetings where he’ll go after somebody. He’s gone after me. But those are by far the exception.

The thing that people don’t tend to understand is he’s got an amazing sense of humor. He laughs constantly, he’s very collaborative. He’s very good at drawing ideas out of junior people. He connects to people really well and pulls ideas out. I think it’s pretty well understood that he understands things very quickly, and can come up with ideas in an hour that you haven’t had, even though you’ve thought about it for two weeks.

I’d say what’s least understood is that he’s not this ferocious guy who just goes into meetings and yells at people. He’s actually a very collaborative person. People typically leave meetings energized and think, “Wow! That was great. We’ve got all these new things that we’re going to run with.”

DL: On the subject of your collaborators, what were the similarities and differences between the way that you collaborated with him versus Andy Jassy?

AV: Well, we would present to Jeff twice a week, and I’d be sitting in the same room with Andy in 2003 and 2004 for like hours every day. So it was a much closer collaboration with Andy until I started on S3, and then I was off in a different world.

I’ve been lucky enough in my career to meet a lot of people whose names you would recognize. And I do this thing — which is really unfair — where I classify them on what I call the Lennon-Starr scale. I know that Ringo’s a great drummer and da da da da whatever, but he was in the right place at the right time. And I’d say that both Andy Jassy and Jeff Bezos are as far on the Lennon side of that scale as anyone I’ve ever met.

Andy is not a technical guy, but he is incredibly analytic. We were in a million interview loops together, and I always wondered how non-technical people were able to assess if you were going to do well in a job situation like as a technical person. What you do is, you ask people about the things they’ve built, and then you just keep probing on the things they’ve built, and find out if they say things that don’t make any sense, or if they actually know what they’re talking about, if they’re able to discuss trade-offs and so on. Andy was able to do an interview on any topic on Earth.

I think we worked well together because we come from completely different backgrounds and points of view. He would explain his ideas around how we would grow the business, and then I would explain to him what a hash table is, and it worked out.

DL: Would it be fair to say that on the Lennon-Starr scale that you’re George Harrison?

AV: [Laughs] I’d like to think so, but I think a lot of people would disagree.

DL: Let’s talk about the process of building AWS. From a creative perspective, what’s always interesting to me is how much of the initial vision ends up materializing in the final product. How would you say the initial vision for S3 compared with the final product?

AV: In the case of S3, in the case of any of these products, what really impacts the product are a set of decisions that are made almost day to day, week by week, by the people on the team. And so what you really need is a shared vision that’s shared by those people.

What helped us a lot was that I did not manage the team. We had a manager who was a great guy, but my goal was to spend that year being an engineer. But because I was on the team, at every turn, we would say, “We’re doing things this way, not that way. We’re using this approach for how we’re going to distribute objects and not that approach, because this one is cheaper, and we have these goals of being extremely low cost,” even though our competitors at the time had very high margins.

You have to have people that share the vision. I’ll be honest with you, the big problem with our vision is that it was very hard to scale. When you’re growing fast and hiring a lot of people, it’s not clear how you end up with all the teams sharing the same vision and building parts that will generate light and not just heat. 

DL: For S3, what is the foundational innovation or creative “magic trick” that has to take place in order for S3 to have been built? Is that a fair way to ask that question?

AV: There is no magic trick. We read a lot of the academic literature on how to build things in a distributed way, so that we wouldn’t lose data. We invented a few things here and there, but by and large this was technology that was well-known and understood at the time.

I’d say the key thing was to have this vision that people would care about this service where you can pay for as much as you like. I know I’ve talked to people who started experimenting with S3, and they would get bills of like $0.08 for a month. So yeah, you know, having this vision that you could use just exactly as much as you like, and then, having kind of pockets that were deep enough to say, we’re going to invest the capital so that this will be profitable for us in the long term, that helped a lot, too. But I don’t think there was anything particularly unique or special.

DL: So, to paraphrase, it sounds like you were leveraging existing technology in creative ways. Is that a fair way to encapsulate that?

AV: That’s right. I’d also say that one of the key things that we did have was the willingness to take a look at the world as it is and not just accept it. Sometimes people look at the world and say there must be some profound reason that things are the way they are, and we would look at products that were out there, and say, “Let’s reevaluate from scratch if this really makes any sense.”

For example, there was this protocol developed primarily at Microsoft called SOAP. It was based on XML and had all this complexity in it.

A lot of software engineers were like, “Well, we have to use SOAP because the people at Microsoft are super smart, and they must have thought about everything. So this is how we’re going to do it.”

Now, we started down that path, and at some point we, the engineers building it, kind of got together in a room and said, “Why are we doing this? At the end of the day, all people want to do is put objects and get objects, and they have to do all this other stuff. Why don’t we just invent our own protocol that looks like HTTP, where they just do put and get, and that’s how they store and retrieve their objects?” So we did that, and people loved it.

We were willing to kind of take the conventional wisdom, and just throw that out and say, “No, we’re not doing that. We’re going to try it in a completely different way.” We were very willing to challenge conventional wisdom on absolutely everything, and were encouraged to do so by the people around us. Andy and Jeff are also big “challenge conventional wisdom” people.
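To make the "just do put and get" idea concrete, here is a minimal sketch of an object store whose entire interface is two operations. It is an in-memory toy written for this transcript; the class name, method names, and the bucket/key addressing are assumptions for illustration and are not the S3 API or its wire protocol.

```python
class ToyObjectStore:
    """In-memory stand-in for a store with only put and get (hypothetical)."""

    def __init__(self):
        # Objects are addressed by a (bucket, key) pair and stored as bytes.
        self._objects = {}

    def put(self, bucket, key, data):
        """Store the bytes under bucket/key, replacing any earlier version."""
        self._objects[(bucket, key)] = bytes(data)

    def get(self, bucket, key):
        """Return the stored bytes, or raise KeyError if nothing is there."""
        try:
            return self._objects[(bucket, key)]
        except KeyError:
            raise KeyError(f"no such object: {bucket}/{key}") from None


store = ToyObjectStore()
store.put("photos", "cabin.jpg", b"\x89fake-image-bytes")
print(store.get("photos", "cabin.jpg"))
```

The design point being illustrated is the small surface area: a caller needs to learn two verbs and one addressing scheme, with no envelope format around them.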

DL: Yeah, I mean, the very existence of Amazon originated as a challenge to conventional wisdom. The idea that you had to sell books in bookstores, or that the internet could be a viable source of commerce. We take it for granted now, but that was certainly not a given at the time.

So, lingering on S3, what were some challenges you faced in building S3?

AV: We had our moments of pain. We launched in March 2006, and we had this outage later in 2006. We were making copies of objects overly aggressively, saturating our networking equipment. It brought down the Japan retail site, and at the time the Japan retail site was way more important than AWS. We had a lot of explaining to do after that. 

That actually was a positive thing though, in that it led to us realizing that we needed to have a completely separate network and separate a bunch of infrastructure for AWS to run on. We separated it out from the rest of Amazon.com to protect the retail site. But now, of course, it would be to protect AWS.

DL: So that was not viewed as a “Let’s stop this project” moment. It was viewed more as a “Here’s an imperfection we have the chance to fix.”

AV: One of the terrific things working in that environment was that we didn’t have the, “I need to explain to investors how I’m going to get another million dollars because we’re going to run out of money, and we have to do something.” Jeff, from the start, fully believed in this, and Andy did, too. So it was not hard to convince the rest of senior leadership in Amazon that this was worth investing in.

We were able to focus on building the product and making the thing that customers wanted. If we needed to hire, if we needed to buy, to spend more capital to buy more equipment, we just did that.

DL: I’ve seen a couple of interviews with Andy Jassy where he’s asked something to the effect of, “Did you know how big this was going to be?” And he more or less says, “No.” He says, “We had an idea that it would be helpful. We didn’t know it was going to be this big.” So I want to throw that question at you. When you were in the process of putting it together, did you envision something industry- and world-altering? What was your concept of a best-case scenario?

AV: I talked a little bit about Jeff’s consistently pushing us that we weren’t thinking big enough. So I would say, yes, I did not have the scale and the scope in my head that this was going to affect how everybody built software. I believe that Jeff did, and so he helped me, kicking and screaming, toward “Yes, it absolutely makes sense to engineer this in such a way that we can scale it to these enormous numbers.” 

In fact, just before I retired, we found the original S3 design document which had growth projections in it, and they were off by a factor of thousands. We had these detailed meetings, and in those meetings it would be typical for Jeff, or sometimes Andy, or even I would say, “I think we’re thinking about this wrong. I think we’ve got to consider what happens if this number is multiplied by a thousand.”

Another thing we also didn’t consider was just how cheap storage would get. The price of raw storage went way down from 2005 till now. And there’s this thing that happens when things get cheap which is really interesting.

I like to go back to when vacuum cleaners were first introduced. People think, “My God, this is awesome. We’re not going to have to spend time doing housework anymore, because now we’re going to have the whole place vacuumed.”

What actually happened was the amount of time people spent doing housework stayed exactly the same, because houses got bigger and people’s standards of cleanliness went up.

We had the same thing with storage. So storage gets cheaper, and you think, “This is terrific. We’re not going to have to spend as much on storage anymore.” And that’s wrong. You’re going to spend exactly the same on storage, and you’re just going to store more stuff.

So we did not anticipate the price of storage going down the way it did, which dramatically changed our infrastructure. We also did not anticipate bandwidth would increase to the extent that it has. And both of those things led to engineering challenges for S3.

DL: The other thing I want to talk about is, with any big project I take on, I inevitably hit a point where I say to myself, “How is this really gonna work? Am I gonna be able to put this together?” Did you ever have that moment where you were looking at this project and thinking to yourself, “There’s just no way,” or were you always confident in the vision and the execution of it?

AV: I was 100% confident that we could build what we’d set out to build. I had self-doubt every step of the way that this was going to turn into the kind of business it needed to turn into for us to recover our invested capital. It still blows me away just how well AWS has done. But the doubts were all around the business side.

Honestly, we just did a good job engineering it. We were lucky enough to start in 2005 with basically a clean sheet and a team of really good people who wrote good code.

DL: What has surprised you about how the cloud has developed over time?

AV: Remember, at the time of AWS, there was no Pinterest, and Facebook was a little tiny thing if it existed at all. All the things we take for granted today didn’t exist.

Sometimes I describe myself as a radio engineer in 1956, where I know everything about how your TV works, and I understand everything about TV transmission and so on, but I’ve never seen an episode of “Leave it to Beaver.”

When it comes to the internet, I can talk to you all day long about how it works, how data gets from one place to another. But I do not for the life of me know why Pinterest exists. 

So, what has surprised me has been the growth of all these businesses that have relied on AWS. And that was our goal. That was the, “Kid in the dorm room could build an app without fixed costs that did what they wanted it to do.” I am amazed, and don’t understand all the things that people have actually built, using the tools that we provided.

DL: Another way to look at that is, it feels like the rock star of the last ten or fifteen years has been the entrepreneur. It’s been the Zuckerbergs, or the Elon Musks, or whomever. A number of them were quite literally kids in their dorm rooms. Mark Zuckerberg was quite literally a kid in his dorm room, building a social networking platform.

This is maybe an audacious statement. But do you think it’s fair to say that Amazon powered that entrepreneur-as-rock-star phenomenon?

AV: I think that is true in a lot of cases. There are a lot of companies that were founded between 2010 and 2020 that were able to get by, even though their founders didn’t really understand how to raise capital effectively. Or, they were able to go through Y Combinator and raise a very small amount of capital, and they could get by with that because we helped change the equation in terms of fixed versus variable costs. 

Yeah, that’s been a real game changer for the industry. I hope it hasn’t led to the cult of personality around the CEO. I like to think more that it has let a thousand flowers bloom, and some of them are worthwhile.

I compare AWS — I have an uncle who is the CEO of a company that makes concrete, and nobody thinks about companies that make concrete. But my God, is that ever a big business. You rely on it in all kinds of ways that you don’t understand. That’s kind of the way AWS is shaping up.

DL: Indeed. Part of the reason why the concept for the show seemed interesting to me is that the cloud is very much like that. Another one of our interview subjects depicted it as a public utility — that it’s something that everybody uses, whether they’re aware of it or not. Making them aware of its impact should be interesting to them. 

AV: And at some level that’s happening, right? Like, people are thinking about having access to broadband as a right that people should have in the same way that they have the right to have electricity. If you go back to the 1990s, when all these ISPs were coming up and doing their ISP thing, nobody thought about it that way. Nobody thought you had the right to have access to the internet through an ISP. But that’s changing.

And in a way, what we’re doing is just the next level up from there. So you know, we thought about electricity as a utility, and then we’ve started to think about fiber access, or at least broadband access, to your home as a utility. And you can imagine starting to think about storage and compute in the same way.

Erik Peterson: As I was listening to your thought process about how you guys were building, it struck me that at the core, you were thinking about the financial side of the business, your gross margins, with infrastructure/technology choices at the center. The driving force behind creating a high-value product that changed the world wasn’t just thinking about building whatever you could build, but using the constraint of how much money it cost as a way to inform the decisions you were making.

I guess my question is, is that accurate? Or am I seeing something that I want to see?

AV: You’re absolutely right. So for AWS, pricing was a very important part of how we thought. It turned out that was Andy’s specialty. When he did his MBA at Harvard, that’s what he did. His thesis — if that’s what you do in an MBA — was around pricing. I’ve learned a ton about pricing through working with Andy.

But basically, for our big services, our pricing is our cost plus some margin. So I like to talk to teams about, “Okay, given the laws of physics, what is the best we can do in terms of cost?”

For example, in storage, let’s say you want to store a gigabyte of compressed, encrypted data. So there’s no correlation between the different bits. You need to consume a gigabyte of platter space. Or, you know, a gigabyte of NAND space. That’s at a minimum.

And, in fact, you have to consume more than that, because you want to provide redundancy. But you don’t need to consume double. It turns out you can provide as much redundancy as you want through some fancy math by sharding your data across multiple drives.

And we would model that. We would model, “Okay, what are the trade-offs between having more shards, which lets you consume less space, knowing we can’t get below the gigabyte.” But you can say, “We have 10% more or 20% more,” or whatever it turns out to be, so you’re going to drive that engineering. But as we do that, these other components of the system become more expensive, because you need more indexing space, and so on.
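The trade-off described here can be sketched with a generic "k data shards plus m parity shards" model. This is an assumption-laden illustration, not a description of how S3 actually shards data: the function names are invented, and the recovery demo uses plain XOR parity, which only survives the loss of a single shard. Schemes that tolerate more losses typically use Reed-Solomon-style codes instead.

```python
from functools import reduce


def storage_overhead(data_shards, parity_shards):
    """Raw bytes consumed per byte of user data in a k+m sharding scheme."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need at least one data shard and no negative parity")
    return (data_shards + parity_shards) / data_shards


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length shards."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards):
    """Single parity shard: the XOR of all data shards."""
    return reduce(xor_bytes, shards)


def recover_missing(surviving_shards, parity):
    """Rebuild the one missing data shard from the survivors and the parity."""
    return reduce(xor_bytes, surviving_shards, parity)


# A full second copy doubles the space; spreading data over more shards
# lowers the overhead for the same number of parity shards.
print(storage_overhead(1, 1))
print(storage_overhead(10, 2))

# Demo: lose one of three equal-length shards and rebuild it.
data = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(data)
rebuilt = recover_missing([data[0], data[2]], parity)
print(rebuilt == data[1])
```

With one data shard and one parity shard the overhead is 2x, which is plain mirroring. With ten data shards and two parity shards it is 1.2x, the "20% more" end of the range mentioned above, at the price of tracking more pieces per object.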

So we would model the heck out of that stuff and drive it down to: “What is the best we can do from a physics point of view in terms of the cost of the service? What’s that going to look like in terms of what we think the components will cost in a few years?”

And then we would drive pricing above that, and then we’d go back to the teams and say, “Your goal is to engineer your systems to suit these parameters.” For engineers, you have to say, “Okay, here are numbers that you can achieve. Here is the bare minimum, according to physics, of how much storage space you should take up. I want you to engineer your system to get there and tell me how long it’s going to take to build that.

“And if you can build something else faster, how much less efficient will it be?” And that’s typically what we’ll end up doing. We’ll say, “Okay, you can deliver a year earlier by building something that uses twice the storage. Let’s do that and ship that.” And then meanwhile, in the background, you’re going to be working at driving the cost down.

So yeah, we would take costs, and then drive them to individual components, and then drive that to the engineering teams, and have them iterate on that. So you’re absolutely right, the dollars that are being spent are a huge piece of what we’re doing, especially because we want AWS to be a utility, not a high-end enterprise.

We want to build a low-margin business. In the utility business, what you’re selling is a commodity. So, you also have to believe that your competitors are going to be able to produce things exactly the same as what we produce. In fact, when Microsoft created Azure they copied S3’s API. So it’s become kind of a standard.

But that means it’s a commodity, right? So we have to work on the kind of things you’re talking about. We have to work on pushing the engineering teams to build things in a way that’s less expensive, and we have to work on our supply chain, and all that fundamental stuff.

EP: It’s fascinating to me how you’re articulating it. Because engineering in a lot of ways is a ballet of working around constraints, and your foundational constraint was physics, the laws of the natural world. Everyone who has built on top of AWS, their lives have become easier because they don’t have to think about the physics. They can get their AWS bill instead of having to think about theoretically, how many atoms can fit on a platter?

The other amazing aspect of all this is that the scale problem has been largely solved by the cloud. So now, when I sit down to build something as an engineer, I ask myself, “Well, what are the constraints?” If there is a constraint that exists, it’s that I have infinite scale, but I don’t have infinite wallet. 

That was the genesis for CloudZero. I started a project over a decade ago where I had to basically scan half the Internet and I had only $3,000 to do it. My constraint was the budget, and so I got very obsessed with the units of my system that I was building, so that it would fit into that budget, and I built it all on top of EC2 using Spot instances in a system that was aware of what it was costing while it was running.

And that struck me as an amazing guide to building a really well-architected system. That was what I wanted to enable the rest of the world to think about.
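One way to picture "a system that was aware of what it was costing while it was running" is a loop that checks a running total against the budget before doing each unit of work. The sketch below is hypothetical throughout: the class, the function, and the per-unit cost are invented for illustration, and nothing here reflects how the original project or any EC2 billing API works. Only the $3,000 budget comes from the conversation.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Running total of spend against a fixed budget (hypothetical helper)."""
    budget_usd: float
    spent_usd: float = 0.0

    def can_afford(self, cost_usd):
        return self.spent_usd + cost_usd <= self.budget_usd

    def charge(self, cost_usd):
        if not self.can_afford(cost_usd):
            raise RuntimeError("charge would exceed the budget")
        self.spent_usd += cost_usd


def run_within_budget(work_items, cost_per_item_usd, budget_usd):
    """Process items in order, stopping before the budget would be exceeded."""
    tracker = BudgetTracker(budget_usd)
    completed = []
    for item in work_items:
        if not tracker.can_afford(cost_per_item_usd):
            break  # stop cleanly instead of discovering the overrun on the bill
        tracker.charge(cost_per_item_usd)
        completed.append(item)  # stand-in for the real unit of work
    return completed, tracker.spent_usd


# Assumed unit cost of $0.25 per work item against the $3,000 budget.
done, spent = run_within_budget(range(20000), 0.25, 3000.0)
print(len(done), spent)
```

Knowing the cost of one unit of work is what makes the check possible, which is why the unit economics come first in this kind of design.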

So this idea of taking where you guys were when you were building AWS and thinking about the constraints of the physical world. Now, we’ve abstracted all that out. AWS has abstracted all that out. How do we take the next level of that, and get that same level of thoughtfulness and planning around the economics of what people are building accessible to the rest of the world, so that people will build even more on top of AWS and get even more out of that endeavor?

AV: Yeah, that makes a ton of sense. I mean, one of the problems with what we’ve done is, if you build a business where you’re buying infrastructure upfront, so you have the fixed costs of buying the infrastructure, it’s relatively hard to blow through the limits of what you bought, because your server just stops working, or your database crashes, or whatever.

With AWS, we make it super easy for you to scale as much as you want — even without realizing it. So you can end up with a bill in the hundreds of thousands of dollars, and go, “Holy sh*t! What have I done?”

And part of that is because physically, you just couldn’t buy that many servers, because the big trucks show up and people would be going, “Why is this big truck full of computers showing up at my door?” We’ve changed the rules to make it super easy to spend a ton of money. I do admit with AWS we have not done an awesome job of putting barriers and tools in place to help prevent people from doing that.

Great power, great responsibility, let’s say. But I’ve seen the numbers about average utilization of EC2 instances. Man, people are wasting a lot of money.

EP: They can always think more efficiently. That was my obsession: How do we provide the right metrics to drive that outcome? Right? That’s the thing that I think is interesting, to hear you talk about how you abstract away physical constraints. And then, think about all the people now who are building on top of AWS. Cost has to be on their mind, and it is maybe the only constraint — how can I help them abstract away that constraint?

AV: Yeah, that’s cool. That makes sense to me.

EP: I’m surprised there wasn’t a part of a conversation about SQS. Because depending on who you talk to, some people think SQS came before S3. If there’s one thing we have to answer in this whole series, it’s: What was the true order?

AV: So you say SQS came first, then S3. But even that’s misleading, right, because we had a suite of services for accessing our catalog and pricing data, and so on. And we called those Amazon Web Services going back to 2002.

In fact, I remember doing a talk where I showed all these cool apps people were building on Amazon Web Services. It was stuff like, people were going to used bookstores and scanning the ISBN on the book, and it would come up and tell you how much that edition was selling for on Amazon at the time. People who were into used books could decide whether to buy that one or not, based on that pricing data. So they would use their scanners because it was before smartphones.

So you know, that stuff could be called AWS. But a lot of people put it down to S3, because S3 was the first what we today call “utility computing service.” SQS kind of tried to be that, but for a bunch of reasons it didn’t really satisfy the unlimited scalability goals. S3 was the first thing that launched that was designed for unlimited scalability. 

But there were a bunch of things that were branded “AWS” going way back years earlier. Part of why it’s confusing is because we just never cared.
