Tuesday, November 21, 2017

Query Flat Files in S3 with Amazon Athena

Amazon Athena enables you to access data present in flat files stored in S3 (Simple Storage Service) as if it were in a table in the database. That and you don't have to set up any server or any other software to accomplish that.

That's another glowing example of being 'Serverless.'


So if a telecommunication has hundreds of thousands or more call detail record file in CSV or Apache Parquet or any other supported format, it can just be uploaded to S3 bucket, and then by using AWS Athena, that CDR data can be queried using well known ANSI SQL.

Ease of use, performance, and cost savings are few of the benefits of AWS Athena service. True to the Cloud promise, with Athena you are charged for what you actually do; i.e. you are only charged for the queries. You are charged $5 per terabyte scanned by your queries. Beyond S3 there are no additional storage costs.

So if you have huge amount of formatted data in files and all you want to do is to query that data using familiar ANSI SQL then AWS Athena is the way to go. Beware that Athena is not for enterprise reporting and business intelligence. For that purpose we have AWS Redshift. Athena is also not for running highly distributed processing frameworks such as Hadoop. For that purpose we have AWS EMR. Athena is more suitable for running interactive queries on your supported formatted data in S3.

Remember to keep reading the AWS Athena documentation as it will keep improving, lifting limitations, and changing like everything else in the cloud.

No comments: