Database costs are a fact of doing business for most cloud-based companies. But unlike a traditional database, which runs all the time and therefore charges you a fairly flat rate for continuous service, Snowflake operates on a different pricing model.
Snowflake allows you to set up multiple database warehouses that store your raw data. You can access these warehouses on demand, whenever you execute a query. Higher level warehouses charge more per query, but queries also run faster than in lower level warehouses. And if you aren’t running any queries at all, you aren’t charged for the downtime.
This pricing model opens the door for several interesting cost optimization opportunities, if you’re willing to take the time to investigate why and how things are working behind the scenes.
The Snowflake cost optimization strategies below range from easy and simple changes to long-term, powerful methods of understanding how each and every query affects your monthly usage bill.
For best results, start at the top and work your way down. By the time you’ve implemented every strategy on this list, you’ll be in complete command of your monthly Snowflake storage costs and usage level, and you’ll know how to keep your spending within a comfortable range.
Keep Your Bill Under Control With These Snowflake Cost Optimization Tips
1. Use multiple warehouses
A lot of people set up a Snowflake warehouse to whatever size they think they’ll need most frequently, and then run every single query within that one warehouse. This tactic technically works, so many people continue on like this forever.
However, Snowflake breaks your costs down by warehouse level. So if you run every single query in one warehouse, everything you do gets lumped together into one big sum on your monthly bill.
Simply by breaking your queries out into multiple warehouses, you can start to see how the costs of those warehouses vary, and make some judgments as to why each warehouse costs more or less than others.
2. Size your warehouses appropriately
As we mentioned, costs also vary by the size of the warehouse. Because large warehouses cost more per query than small warehouses, many people mistakenly assume that smaller warehouses are always going to be cheaper.
Interestingly, costs don’t always scale that way. Because you pay for the time each warehouse is in use, the larger and quicker warehouses can sometimes cost equal to or less than smaller warehouses, simply because they execute queries much faster. The results depend on the query, however, so nailing down which sizes are right for you may require some experimentation.
For instance, you might want to do some simple A/B testing. How much does Snowflake cost for a particular query on an extra large versus an extra small warehouse?
Fine-tune from there to find the option that best meets your needs with a balance of cost and efficiency.
3. Use Snowflake’s auto-suspend feature
This is something simple you can modify in your Snowflake settings, but again, it’s one thing most people never think to check.
When you run a query, the warehouse will wake up, execute your query, and then stay running for a while to see whether you need to do any more work. If you don’t have your warehouses set to automatically suspend, they’ll keep running, and you’ll keep paying for the time even if you’re not actively using it.
Play around to find the time frame that works best for you, but we at CloudZero have found that lowering the default from five minutes down to one minute makes a big difference in our Snowflake costs.
4. Connect to CloudZero
When you’re striving for cost optimization, there’s only so far you can go before you need a unit cost allocation and analysis platform like CloudZero.
Simply by letting CloudZero ingest and display your Snowflake storage and usage costs, you’ll gain a better understanding of your bill. For instance, our dashboard breaks down your Snowflake costs per warehouse and displays them next to your AWS costs, so you can compare easily.
5. Embed metadata into each query
This is another tip we use at CloudZero to gain visibility into our own Snowflake costs.
Snowflake breaks down costs by warehouse, which is helpful. But what happens when you (like most users) run several different types of operations within each warehouse? You need a better way to see how each of those different types of actions contributes to your total cost.
The way we’ve solved this problem is by embedding a long comment block with identifying information into each query we run. We’ve also built a program that takes that comment and parses it into JSON usable by our own CloudZero API, which we cover in the next step.
Don’t be so intimidated by the task of embedding this data that you don’t even try. It takes some effort up front, but paired with CloudZero’s telemetry features, this strategy will catapult you much closer to your ultimate goal of high-definition cost visibility.
6. Input your telemetry data into CloudZero Dimensions
This is where the magic happens.
Let’s say you’ve tracked metadata for each query and now you have a bunch of data sitting in your query history. You could look at the metadata manually, but there is a better way. CloudZero Dimensions ingests all your gathered data, analyzes it, and displays it in clear, convenient charts and graphs.
Whereas before you may have seen that a particular warehouse was costing you $X per month, now you can break that total cost down further. Track your costs per query, per customer, per type of query within each customer category, or whatever metrics you choose to track. Then, compare them against each other to see where you could make some tweaks and improve your costs or efficiency.
Let’s say, for example, that one of your warehouses cost $22,000 this month. If that’s higher than normal, you’re probably wondering what could have contributed to the increase. You’ve just begun using a new type of query this month, so this is the obvious culprit in your mind.
Thankfully, you tracked metadata for each type of query and each customer, so you can use Dimensions to break your total down into unit costs by category.
Looking closer, you’re surprised to learn that the new query cost a total of $1,200 this month. At about 5.45% of $22,000, you know this query is responsible for some, but probably not all of the jump in price.
Then you switch to the graph that displays costs per customer and see that one customer is responsible for a huge chunk of this month’s database queries.
Because queries for this customer are typically low, you’ve had them assigned to a small Snowflake warehouse assuming that will be sufficient. But the spike in demand has caused that warehouse to need to run almost constantly, rarely if ever making it to the point of automatically turning off.
This knowledge allows you to investigate why the change in usage happened and fine-tune your approach toward this customer. Without this detailed breakdown into unit costs, you’d be stuck simply paying your enormous bill and having no idea why it happened in the first place, or how to prevent it from happening again.
It’s worth noting that we’ve used costs per query and per customer as examples, because these are metrics that frequently help our customers. The metrics you decide to track, and the depth and granularity with which you break them down, are totally customizable to your needs.
Don’t Let Snowflake Costs Snowball Out Of Control — Understand Your Costs With CloudZero
Some of the simpler changes we recommend can be done just by modifying your settings and doing some quick comparisons. At a basic level, these strategies can certainly make a difference to your monthly Snowflake costs.
However, if you want to achieve more depth in your understanding of Snowflake spending and make real, noticeable improvements, you’ll need to use the right tool for the job.
For sake of clarity, we’ve pointed out in the above sections when you can make adjustments on your own and when you’ll need CloudZero to progress any further. Are you ready to dive deep and strive for a whole new level of optimization? to see the full power of CloudZero for yourself!