
What Is AWS Athena And When Should You Use It?

June 4, 2021


In late 2016, Amazon introduced yet another service into its data analytics arsenal. The official name is Amazon Athena, although it is widely referred to as AWS Athena.

Whatever you choose to call it, there’s no doubt that Athena is already creating ripples in the big data analytics space, alongside other AWS data services such as Amazon DynamoDB and Redshift. There are even claims that it’s not only cheaper than similar services, but also saves you the trouble of managing infrastructure.

It’s not all good news, though. AWS Athena also happens to have its fair share of weaknesses, which could substantially influence your overall data analysis. 

To help give you a better understanding of the Amazon service, this article dives deep into what exactly AWS Athena is, what it does, how it runs, how much it costs, and how it compares with AWS Redshift and AWS Glue. 

What Is AWS Athena And When Would You Use It?

AWS Athena is best described as an interactive query service that’s capable of seamlessly using standard Structured Query Language (SQL) to conduct analysis of data stored in Amazon Simple Storage Service (Amazon S3). 

This system was introduced to simplify the whole process of analyzing Amazon S3 data. To start, open your AWS Management Console, point Amazon Athena at your data in Amazon S3, define the schema, and then run standard SQL queries. Results for most queries come back within seconds.
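The same flow can also be driven from code rather than the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. The database name, table name, and results bucket are hypothetical placeholders, not values from this article.

```python
import time


def build_query_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes its result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return the raw result rows."""
    import boto3  # imported here so build_query_request works without the SDK

    athena = boto3.client("athena")
    request = build_query_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


# Hypothetical names, for illustration only.
request = build_query_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-athena-results/",
)
```

Note that `run_query` returns only the first page of results. A longer result set would need pagination, which is left out here to keep the sketch short.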

AWS Athena is also serverless and built to scale automatically. The fact that Athena is serverless means you won’t be required to set up or manage any infrastructure. With auto scaling, even when you’re dealing with complex queries and large data sets, you can count on it to execute your queries in parallel and quickly generate the results.  

This architecture allows Amazon to charge Athena users for only the queries they run, consequently making the service a conveniently cost-effective option for organizations leveraging Amazon S3. 

How Does Athena Compare To AWS Redshift And AWS Glue?

AWS Athena vs. AWS Redshift

There are many factors that come into play when comparing AWS Athena to Redshift. But, overall, Amazon Athena shines in terms of cost and portability, while Redshift triumphs when it comes to scale and performance. 

What does this mean? 

Well, AWS Athena is serverless, so there is no infrastructure to provision, manage, or scale. It runs directly over Amazon S3 data sets as a read-only service, defining external tables over the files without modifying the underlying S3 data.

Amazon Redshift, on the other hand, is a petabyte-scale data warehouse service that’s based on PostgreSQL. Queries here don’t run directly against your files. Instead, Redshift relies on clusters: you provision a cluster, create tables, and load your data extracts into them before running a query.

As such, you could say that AWS Athena, which is built on Presto and supports ANSI SQL, is best reserved for ad hoc queries on Amazon S3 data sets. It can work on structured, semi-structured, and unstructured data formats.

AWS Redshift, by contrast, is ideal for analyzing large structured data sets, as it’s capable of generating results much faster than Athena. This means you can, for instance, apply it to real-time data analysis, clickstream analysis, and log analysis.
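To make the external-table idea concrete, here is a small sketch that builds the kind of DDL statement Athena accepts. It assumes the files are stored as Parquet, and the database, table, column, and bucket names are all hypothetical. Running the resulting statement only registers metadata; nothing in S3 is copied or changed.

```python
def external_table_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet files in S3.

    columns is a list of (name, type) pairs.
    """
    column_sql = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table definition, for illustration only.
ddl = external_table_ddl(
    "example_db",
    "access_logs",
    [("request_path", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    "s3://example-bucket/logs/",
)
print(ddl)
```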

Keep in mind, though, that Redshift is costlier since it charges for both compute and storage. 

AWS Athena vs. AWS Glue

Since its initial release in August 2017, AWS Glue has been operating as a fully-managed Extract, Transform, and Load (ETL) service. It comes with three primary components:

  1. A flexible scheduler for handling job monitoring
  2. An ETL engine that’s capable of generating Scala or Python code
  3. A data catalog that acts as the central metadata repository 

With these tools, AWS Glue helps you discover data sets, then transform and prepare them for search and querying.

So AWS Athena and AWS Glue are complementary rather than competing services. Glue’s Data Catalog creates, stores, and retrieves the table metadata (or schema) that Athena queries.
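As an illustration of what that metadata looks like, the sketch below pulls column names and types out of a Glue `GetTable` response. The sample response is hand-written and trimmed, and its database, table, and column names are hypothetical. A live response would come from `boto3.client("glue").get_table(DatabaseName=..., Name=...)`.

```python
def describe_catalog_table(get_table_response):
    """Return (column_name, type) pairs from a Glue GetTable response."""
    table = get_table_response["Table"]
    columns = table["StorageDescriptor"]["Columns"]
    # Partition keys are stored separately from the regular columns.
    partition_keys = table.get("PartitionKeys", [])
    return [(col["Name"], col["Type"]) for col in columns + partition_keys]


# Trimmed, hand-written response for illustration only.
sample_response = {
    "Table": {
        "Name": "access_logs",
        "DatabaseName": "example_db",
        "StorageDescriptor": {
            "Location": "s3://example-bucket/logs/",
            "Columns": [
                {"Name": "status", "Type": "int"},
                {"Name": "bytes_sent", "Type": "bigint"},
            ],
        },
        "PartitionKeys": [{"Name": "year", "Type": "string"}],
    }
}

schema = describe_catalog_table(sample_response)
```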

What Are The Benefits And Disadvantages Of Using AWS Athena?

AWS Athena, as it turns out, is a double-edged sword. The features that make it conveniently cheap and accessible are the same ones that might limit you to some extent. 

How? Here are both sides of the story: 

Pros of AWS Athena

  • Serverless: Since it’s distributed as a fully-managed serverless service, AWS Athena saves you all the trouble that comes with infrastructure management. You don’t have to worry about setting up clusters, regulating capacity, or loading data. 
  • Cost-effective: AWS Athena is not only cost-effective but also considerably cheaper than its close competitors. The reason is, the service doesn’t charge you for compute instances. Instead, you only pay for the queries you’re running. 
  • Widely accessible: As a service that runs its queries using standard SQL, AWS Athena is accessible to more than just developers and engineers. Business analysts and other data professionals can adopt it too, as long as they are comfortable writing standard SQL queries.
  • Flexibility: Amazon Athena’s open and versatile architecture doesn’t restrict you to a specific vendor, technology, or tool. You can, for example, work with a wide range of open-source file formats, as well as switch freely between query engines without adjusting the schema. 

Cons of AWS Athena

  • No data optimization: AWS Athena doesn’t offer a lot of optimization capabilities. You can tune your queries, but Athena won’t optimize the underlying data for you. Even when you try to transform the Amazon S3 data using AWS Glue, you still have to be cautious not to disadvantage other services that are accessing the same data.
  • Shared resources: According to Amazon’s Service Level Agreement (SLA), all AWS Athena users across the globe share the same resources when running their queries. This multi-tenancy approach might trigger resource strain from time to time, which could lead to fluctuating query performance.
  • Limited data manipulation operations: AWS Athena is primarily a query engine. It supports INSERT INTO and CREATE TABLE AS SELECT for writing new data, but it doesn’t offer a general-purpose Data Manipulation Language (DML) interface for updating or deleting individual rows in standard tables.
  • Requires data partitioning: If you intend to run your SQL queries efficiently, you’ll usually need to partition the data sets stored in Amazon S3 so that Athena can skip data a query doesn’t need. The number of partitions cuts both ways, though, because each partition scanned adds overhead. For instance, every 500 partitions scanned will bump up your querying time by a second.
  • Lacks indices: While indexing has always been a built-in provision in traditional databases, you don’t get this privilege with AWS Athena. As such, you should expect challenges in operations like consolidating large tables.
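To illustrate the partitioning point, here is a short sketch of a Hive-style `key=value` layout for S3 prefixes, together with the rule of thumb quoted in the list (roughly one extra second per 500 partitions scanned). The bucket name and partition keys are hypothetical, and the overhead figure is this article's estimate rather than a measured guarantee.

```python
def partition_prefix(base, **partition_values):
    """Build a Hive-style partition prefix such as .../year=2021/month=06/."""
    parts = "/".join(f"{key}={value}" for key, value in partition_values.items())
    return f"{base.rstrip('/')}/{parts}/"


def partition_overhead_seconds(partitions_scanned, partitions_per_second=500):
    """Apply the article's rule of thumb: about 1 second per 500 partitions."""
    return partitions_scanned / partitions_per_second


# Hypothetical bucket and keys, for illustration only.
prefix = partition_prefix("s3://example-bucket/logs", year="2021", month="06", day="04")
overhead = partition_overhead_seconds(1500)
```

A query that filters on `year`, `month`, and `day` only needs to read the files under the matching prefix, which is why a layout like this reduces the amount of data scanned.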

How Is AWS Athena Priced?

As we’ve stated already, AWS Athena follows a pricing schedule that charges you based on the queries you choose to run in your data analysis. 

Please note, however, that we’re not talking about the number of queries. Rather, your usage bill is determined by the amount of data scanned by each query. Amazon counts the bytes scanned and rounds them up to the nearest megabyte, with 10MB being the minimum volume per query.

All in all, you should expect to pay $5 for every terabyte (TB) of data that you scan. You won’t be charged for failed queries, statements for managing partitions, or Data Definition Language (DDL) statements.

But, that’s not all. Amazon further makes it possible for you to reduce the per-query costs by 30% to 90%. You just need to partition, compress, or convert your data into columnar formats.
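The billing arithmetic can be sketched in a few lines. This is an estimate under stated assumptions: bytes are rounded up to the next megabyte, each query is billed for at least 10MB, the price is $5 per TB, and binary units are used (1 TB = 1024 × 1024 MB). The unit convention in particular is an assumption, so treat the output as approximate.

```python
import math

PRICE_PER_TB_USD = 5.0
BYTES_PER_MB = 1024 ** 2  # assumption: binary megabytes
MB_PER_TB = 1024 ** 2     # assumption: binary terabytes
MIN_BILLED_MB = 10


def billed_megabytes(bytes_scanned):
    """Round up to the next megabyte and apply the 10MB per-query minimum."""
    return max(MIN_BILLED_MB, math.ceil(bytes_scanned / BYTES_PER_MB))


def query_cost_usd(bytes_scanned):
    """Estimate the charge for a single successful query."""
    return billed_megabytes(bytes_scanned) / MB_PER_TB * PRICE_PER_TB_USD


one_tb = 1024 ** 4
full_scan_cost = query_cost_usd(one_tb)            # scanning 1 TB of raw data
reduced_scan_cost = query_cost_usd(one_tb // 10)   # same query, 90% less data scanned
```

Under these assumptions, a query that scans a full terabyte costs $5, while the same query against data that has been partitioned or converted so that only a tenth is scanned costs roughly $0.50. That is the upper end of the 30% to 90% savings range mentioned above.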

Optimizing Costs On AWS Athena

Although AWS Athena has proven to be favorably priced, the kicker is, its billing process isn’t very straightforward. Amazon will tell you the funds you’re using — but it’s difficult to see how and why. 

While that may be okay for a one-time or short-term AWS user, the stakes rise when it comes to long-term use. If you intend to adopt the service for the long haul, you need a proper cloud cost management platform. 

That’s why high-performing engineering teams choose CloudZero. Our platform uses machine learning to analyze all your service parameters and, subsequently, generates accurate cloud cost intelligence. 

You get to understand your cloud costs in terms of what you’re spending on, how your engineering activities are impacting the costs, and the unit costs per customer/product — providing you with a full picture of your AWS spend. To see how it works in action, schedule a demo here!
