Reserved instances are one of those things that, when you first hear about them, make you say, “Wow! I could save a lot of money!” And then you start trying to figure out how many you need. What sizes? Which operating systems? In which regions? Should they be convertible? Should I choose a 1-year or 3-year term? All up-front, partial up-front, or no up-front? How much compute am I actually going to need over that term?
Very quickly you come to realize that this decision is much more involved than you thought at first glance. You gather your engineering teams and have big round-tables to discuss this significant capital investment and plan for the future. Perhaps you ask for estimates of required compute from each of your team leads or use a cost management tool to provide reserved instance recommendations, and finally put together a plan for a purchase. Baked into this planning is the assumption that, if instances go unused in certain hours, they will be picked up during peak hours when users of your platform are most active. In essence, you are viewing reserved instances (RIs) as a rollover plan, much like the data plans on our cell phones.
With this strategy, organizations will end up purchasing reserved capacity for known production loads and baseline development/test loads. For enterprise-scale organizations, this can amount to a commitment worth millions of dollars, making AWS very happy. But are you?
In fact, most companies we speak with aren’t satisfied with their RI purchases, or they’re unaware of critical facts about how RIs actually work that would make them unhappy.
Buried within all of the options for each RI purchase is a hidden, implicit constraint that you’ve agreed to: Reserved Instances are use it or lose it!
Now you might be thinking, “Yes… I already knew that.” But what most people don’t realize is that RIs are use it or lose it within a single hour. Take a look at what the AWS documentation has to say about it.
The reason Amazon does not allow unused hours to roll over into the next is that AWS must be able to deliver compute for every reservation it has on file. If rollover hours were allowed, a user who hadn’t used their reservation all year could spin up 8,760 instances in the last hour of the year (365 days x 24 hours). If every user with reservations, or even a small percentage of them, did something like that, Amazon wouldn’t be able to handle the load.
Most people, even at companies that heavily use reserved capacity, are unaware of this fact. After taking a deeper look into RI usage at an hourly granularity, these companies become aware that there is a significant issue. Instances that were purchased up to 3 years ago are going unused. Their predictions of usage aren’t matching up with reality. To a certain extent, this isn’t the fault of any one person, but more a fault of the process. Because most engineering teams and cost management tools base required capacity recommendations on long-term monthly trends, the number of RIs purchased ends up being well over what is actually required.
Monthly, weekly, or even daily trends throw away the richness that exists in hourly data. This loss in granularity hides what is actually happening within a cloud deployment. Let’s consider an organization whose activity occurs at the start and close of business every day. In this example, 12 machines are needed at 8:00AM, and another 12 machines are needed at 5:00PM to handle the load for the day. This is a total of 24 compute-hours in the day, which may lead an organization to purchase 1 reserved machine for the year. Unfortunately, because the reservation can cover only one machine in any given hour, it is used for just 2 of the 24 hours in the day. The organization will be paying for 22 hours of unused RIs (effectively wasting 92% of their up-front payment). Even worse, they will be paying for 11 on-demand machines at both 8:00AM and 5:00PM.
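To make the arithmetic in this example concrete, here is a minimal sketch in Python. The function name `ri_coverage` and the way demand is laid out (a list of 24 machine counts, one per hour) are our own illustrative assumptions, not part of any AWS tool or API, and the sketch assumes each machine runs for exactly one full hour.

```python
def ri_coverage(hourly_demand, reserved):
    """Split demand into RI-used, RI-unused, and on-demand instance-hours.

    hourly_demand: machines needed in each hour (hypothetical input format).
    reserved: number of reserved instances held; unused hours do not roll over.
    """
    used = sum(min(d, reserved) for d in hourly_demand)
    unused = sum(max(reserved - d, 0) for d in hourly_demand)
    on_demand = sum(max(d - reserved, 0) for d in hourly_demand)
    return used, unused, on_demand


# One day of demand: 12 machines at 8:00AM (hour 8) and 12 at 5:00PM (hour 17).
demand = [0] * 24
demand[8] = 12
demand[17] = 12

used, unused, on_demand = ri_coverage(demand, reserved=1)
waste_pct = 100 * unused / (used + unused)

print(f"RI hours used: {used}")            # 2
print(f"RI hours unused: {unused}")        # 22
print(f"On-demand hours: {on_demand}")     # 22 (11 machines in each peak hour)
print(f"Reservation wasted: {waste_pct:.0f}%")  # 92%
```

The daily total of 24 compute-hours matches what one reservation provides on paper, but evaluating coverage hour by hour shows that only 2 of those reserved hours are ever consumed.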
Not understanding how your infrastructure works and how Amazon bills reserved instances can lead to an immense financial loss. In speaking with corporations that heavily use reserved capacity, we estimate that these losses can be up to 15% of your total AWS spend. For enterprise-scale corporations, this loss can be in the millions.
Do you know if you’re wasting budget on unused reservations? It might be worth taking a look. Consider asking us for help if this isn’t something that is easy for you to do.