This complete guide covers what exactly AWS Athena is, what it does, how it runs, how much it costs, and how it compares with AWS Redshift and AWS Glue.
Just a few years ago, Amazon introduced yet another service into its data analytics arsenal. AWS Athena is the name, although its creators prefer to stick with Amazon Athena.
Whatever you choose to call it, there’s no doubt that Athena is already creating ripples in the big data analytics space, alongside the likes of Amazon DynamoDB and Redshift. There are even claims that it’s not only cheaper than similar services, but also saves you the trouble of managing infrastructure.
It’s not all good news, though. AWS Athena also happens to have its fair share of weaknesses, which could substantially influence your overall data analysis.
To help give you a better understanding of the Amazon service, this article dives deep into what exactly AWS Athena is, what it does, how it runs, how much it costs, and how it compares with AWS Redshift and AWS Glue.
AWS Athena is an interactive query service that uses standard Structured Query Language (SQL) to analyze data stored in Amazon Simple Storage Service (Amazon S3).
This system was introduced to simplify the whole process of analyzing Amazon S3 data. To start, open the AWS Management Console, point Amazon Athena at your data in Amazon S3, define a schema, and then run standard SQL queries. Most results come back within seconds.
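The same workflow can also be driven from code rather than the console. The sketch below is a minimal illustration, assuming a boto3 Athena client (created with `boto3.client("athena")`) is passed in as `client`. The function name `run_athena_query` and every database, query, and bucket name in the usage note are hypothetical placeholders, not values from this guide.

```python
import time

# States in which Athena has stopped working on a query.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and wait until it reaches a terminal state.

    `client` is assumed to be a boto3 Athena client.
    Returns a (state, bytes_scanned) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM orders LIMIT 10", "sales", "s3://example-bucket/athena-results/")`. The returned byte count is worth logging, since the amount of data scanned is what drives the bill.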
AWS Athena is also serverless and built to scale automatically. The fact that Athena is serverless means you won’t be required to set up or manage any infrastructure. With auto scaling, even when you’re dealing with complex queries and large data sets, you can count on it to execute your queries in parallel and quickly generate the results.
This architecture allows Amazon to charge Athena users for only the queries they run, consequently making the service a conveniently cost-effective option for organizations leveraging Amazon S3.
There are many factors that come into play when comparing AWS Athena to Redshift. But, overall, Amazon Athena shines in terms of cost and portability, while Redshift triumphs when it comes to scale and performance.
What does this mean?
Well, AWS Athena is a serverless service that doesn’t require any additional infrastructure to scale, manage, and build data sets. It runs directly over Amazon S3 data sets as a read-only service, setting up external tables without manipulating the S3 data sources.
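To make the external-table idea concrete, here is a small sketch that builds the kind of `CREATE EXTERNAL TABLE` statement Athena accepts. The helper name `external_table_ddl`, the table and column names, and the bucket path are all hypothetical, and the sketch assumes the underlying files are stored as Parquet. The statement only registers a schema over files that already exist; it does not copy or modify the S3 objects.

```python
def external_table_ddl(database, table, columns, s3_location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement pointing Athena at existing S3 data.

    `columns` is a list of (name, sql_type) pairs. `s3_location` should be a
    folder-style prefix ending in a slash.
    """
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example: an orders table over Parquet files in S3.
ddl = external_table_ddl(
    "sales",
    "orders",
    [("order_id", "string"), ("amount", "double")],
    "s3://example-bucket/orders/",
)
print(ddl)
```

Dropping a table defined this way removes only the metadata, which is why Athena can be treated as a read-only layer over S3.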
Amazon Redshift, on the other hand, is a petabyte-scale data warehouse service that’s based on PostgreSQL. The queries here don’t just run directly. Instead, Redshift relies on clusters, for which you’ll be required to bring in the data extracts and create tables before proceeding with your query.
As such, you could say that AWS Athena is best reserved for instances when you need to use Presto and ANSI SQL to launch ad-hoc queries on Amazon S3 data sets. It should be able to work on structured, semi-structured, and unstructured data formats.
AWS Redshift, by contrast, is ideal for analyzing large structured data sets because it can generate results much faster than Athena. This means you can, for instance, apply it to real-time data analysis, clickstream events, and log analysis.
Keep in mind, though, that Redshift is costlier since it charges for both compute and storage.
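Since its initial release in August 2017, AWS Glue has been operating as a fully-managed Extract, Transform, and Load (ETL) service. It comes with three primary components: the AWS Glue Data Catalog (a central metadata repository), an ETL engine that generates Python or Scala code, and a scheduler that handles dependency resolution, job monitoring, and retries.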
With these tools, AWS Glue helps you in discovering data sets, as well as transforming and preparing them for search and querying.
As a result, you can use AWS Athena together with AWS Glue. The latter’s Data Catalog creates, stores, and retrieves the table metadata (or schema) that Athena queries.
AWS Athena, as it turns out, is a double-edged sword. The features that make it conveniently cheap and accessible are the same ones that might limit you to some extent.
How? Here are both sides of the story:
As we’ve stated already, AWS Athena follows a pricing schedule that charges you based on the queries you choose to run in your data analysis.
Please note, however, that we’re not talking about the number of queries. Rather, your usage bill is determined by the amount of data each query scans. Amazon counts the bytes scanned and rounds them up to the nearest megabyte, with 10MB being the minimum volume per query.
All in all, you should expect to pay $5 for every terabyte (TB) of data that you scan. You won’t be charged for failed queries, Data Definition Language (DDL) statements, or statements for managing partitions.
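The arithmetic behind those rules can be sketched in a few lines. This is an estimate under stated assumptions, not Amazon's billing code: it assumes bytes are rounded up to the next megabyte, applies the 10MB minimum and the $5 per TB rate, and treats 1MB as 1024² bytes and 1TB as 1024² MB (the binary-unit convention is an assumption). The function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0
BYTES_PER_MB = 1024 ** 2  # assumption: binary megabytes
MB_PER_TB = 1024 ** 2     # assumption: binary terabytes
MIN_BILLED_MB = 10        # minimum billed volume per query


def billed_megabytes(bytes_scanned):
    """Round scanned bytes up to whole megabytes and apply the 10MB minimum."""
    rounded_up = -(-bytes_scanned // BYTES_PER_MB)  # integer ceiling division
    return max(MIN_BILLED_MB, rounded_up)


def query_cost_usd(bytes_scanned):
    """Estimate the charge for one successful query."""
    return billed_megabytes(bytes_scanned) / MB_PER_TB * PRICE_PER_TB_USD


# A query scanning a single byte is still billed as 10MB.
print(billed_megabytes(1))
# A query scanning exactly 1TB costs the full per-terabyte rate.
print(query_cost_usd(1024 ** 4))
```

Because of the per-query minimum, many tiny queries can cost more than their raw data volume suggests, while the rounding itself matters very little for large scans.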
But that’s not all. Amazon further makes it possible for you to reduce your per-query costs by 30% to 90%. You just need to partition, compress, or convert your data into columnar formats, all of which cut the amount of data each query has to scan.
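Partitioning is the easiest of these to picture. A common approach is a Hive-style layout, where each folder name in the S3 key encodes a partition value (`key=value`), so a query that filters on those keys reads only the matching folders. The sketch below is illustrative only: the bucket, the `year`/`month`/`day` partition keys, and both helper names are hypothetical, and it assumes the partition columns are declared as strings.

```python
from datetime import date


def partition_prefix(base_prefix, day):
    """Return a Hive-style S3 prefix (key=value folders) for one day of data."""
    return (
        f"{base_prefix.rstrip('/')}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


def partition_filter(day):
    """Return a WHERE-clause fragment that restricts a scan to one day's partition."""
    return (
        f"year = '{day.year}' AND month = '{day.month:02d}' "
        f"AND day = '{day.day:02d}'"
    )


# Hypothetical example: where one day's log files would be written,
# and the filter a query would use to scan only that day.
target = date(2023, 1, 5)
print(partition_prefix("s3://example-bucket/logs", target))
print(f"SELECT count(*) FROM logs WHERE {partition_filter(target)}")
```

With a layout like this, a query over a single day touches one folder instead of the whole data set, which is where the scan (and cost) reduction comes from.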
Although AWS Athena has proven to be favorably priced, the kicker is, its billing process isn’t very straightforward. Amazon will tell you the funds you’re using — but it’s difficult to see how and why.
While that may be okay for a one-time or short-term AWS user, the stakes rise when it comes to long-term use. If you intend to adopt the service for the long haul, you need a proper cloud cost management platform.
That’s why high-performing engineering teams choose CloudZero. Our platform uses machine learning to analyze all your service parameters and, subsequently, generates accurate cloud cost intelligence.
You get to understand your cloud costs in terms of what you’re spending on, how your engineering activities are impacting the costs, and the unit costs per customer or product, giving you a full picture of your AWS spend.