AWS Data Engineer - Basics Quiz
This quiz covers fundamental data engineering concepts on AWS, including S3 storage, Glue ETL components, and Redshift basics.
What is a "Data Lake" on AWS typically built upon?
S3 provides the scalable, durable, cost-effective storage foundation for a Data Lake.
Which AWS service is a serverless ETL (Extract, Transform, Load) service?
Glue classifies, cleans, enriches, and moves data reliably between data stores.
What is the purpose of the AWS Glue Data Catalog?
The Data Catalog is a persistent metadata store that integrates with Athena, Redshift, and EMR.
Which service allows you to run standard SQL queries directly against data in S3 without loading it?
Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL.
Redshift is optimized for OLAP (Online Analytical Processing) workloads.
Which file format is column-oriented and optimized for analytics queries?
Parquet stores data by column, making it faster and cheaper to query subsets of columns compared to row-based formats like CSV.
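To make the row-versus-column difference concrete, here is a toy sketch in plain Python. It is not real Parquet: the records, the `to_columns` helper, and the "values touched" counts are illustrative assumptions that only mimic how a columnar layout lets a query skip columns it does not need.

```python
# Toy comparison of row-oriented and column-oriented layouts (not real Parquet).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: summing "amount" means reading every field of every row.
values_touched_row_layout = sum(len(record) for record in rows)

# Column layout: only the "amount" column has to be read.
values_touched_column_layout = len(columns["amount"])

total_amount = sum(columns["amount"])
print(total_amount, values_touched_row_layout, values_touched_column_layout)
```

In this sketch the same total is computed while touching a third of the values, which is the intuition behind cheaper column-subset queries.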
What is an AWS Glue Crawler used for?
Crawlers browse your data sources, deduce the schema, and create table definitions in the Glue Data Catalog.
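The idea of deducing a schema from data can be sketched in a few lines of Python. This is only an illustration of the concept, not how Glue classifiers actually work: the `infer_type` and `infer_schema` helpers and the sample CSV are hypothetical, and the type names are chosen to resemble catalog column types.

```python
import csv
import io

def infer_type(values):
    """Return the narrowest of 'bigint', 'double', 'string' that fits all values."""
    for type_name, caster in (("bigint", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"

def infer_schema(csv_text):
    """Read CSV text with a header row and guess a type for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = list(reader)
    return {
        name: infer_type([record[name] for record in records])
        for name in reader.fieldnames
    }

sample = "order_id,region,amount\n1,EU,120.5\n2,US,80\n"
schema = infer_schema(sample)
print(schema)
```

A real crawler would then write the resulting table definition to the Data Catalog; this sketch stops at the inferred column types.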
Which service is best suited for real-time streaming data ingestion?
Kinesis Data Streams creates a channel for capturing and storing terabytes of data per hour from hundreds of thousands of sources.
What is the difference between Kinesis Data Streams and Kinesis Data Firehose?
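Data Streams is a stream that you read with your own consumer applications, whereas Firehose is a fully managed delivery service, the "easiest way to load streaming data" into data stores.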
How is Amazon Athena priced?
You pay only for the queries that you run. Compressing and partitioning data reduces costs significantly.
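A rough way to reason about this is to estimate cost from the bytes a query scans. The sketch below assumes a flat per-terabyte rate; the `estimate_query_cost` helper and the default price of 5.0 USD per TB are placeholders for illustration, so check current AWS pricing before relying on any number.

```python
TB = 1024 ** 4  # bytes in one terabyte (binary)

def estimate_query_cost(bytes_scanned, price_per_tb_usd=5.0):
    """Estimate the cost of one query from the bytes it scans.

    price_per_tb_usd is an assumed placeholder rate, not an authoritative price.
    """
    return (bytes_scanned / TB) * price_per_tb_usd

# Hypothetical table: 10 equally sized partitions of 1 TB each.
partition_bytes = [TB] * 10

full_scan_cost = estimate_query_cost(sum(partition_bytes))
# A query that filters on the partition column reads a single partition.
pruned_scan_cost = estimate_query_cost(partition_bytes[0])

print(full_scan_cost, pruned_scan_cost)
```

Under these assumptions, restricting the query to one partition cuts the estimated cost by a factor of ten.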
Which Redshift distribution style distributes rows in a round-robin fashion?
EVEN distribution is useful when the table does not participate in joins or when there is no clear choice for a distribution key.
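Round-robin placement can be simulated in a few lines. This is a conceptual sketch only: the `distribute_even` helper and the slice count are hypothetical, and Redshift's internal placement is not exposed this way.

```python
def distribute_even(rows, slice_count):
    """Assign rows to slices in round-robin order, ignoring row contents."""
    slices = [[] for _ in range(slice_count)]
    for index, row in enumerate(rows):
        slices[index % slice_count].append(row)
    return slices

rows = [f"row-{i}" for i in range(10)]
slices = distribute_even(rows, slice_count=4)
slice_sizes = [len(s) for s in slices]
print(slice_sizes)
```

Because placement ignores the values in each row, the slices stay balanced, but rows that join on the same key are not guaranteed to land together.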
What is Redshift Spectrum?
Spectrum extends analytics to your data lake, allowing you to query open file formats in S3 using Redshift SQL.
What is Amazon EMR (Elastic MapReduce)?
EMR provides a managed environment for processing vast amounts of data using open-source tools.
Which component is responsible for organizing and scheduling Glue ETL jobs?
Triggers can start jobs on a schedule or based on events (like a previous job finishing).
What helps you visualize and analyze data using interactive dashboards?
QuickSight is AWS’s fast, cloud-powered business intelligence service.
Which S3 feature helps prevent unauthorized users from accessing your data lake?
Securing the bucket permissions is the first line of defense for a Data Lake.
What is an "OLAP" workload?
OLAP queries often involve aggregations and joins over millions of rows (e.g., "Total sales by region last year").
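The "total sales by region last year" query can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the `sales` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", 2023, 100.0),
        ("EU", 2023, 50.0),
        ("US", 2023, 75.0),
        ("US", 2022, 999.0),  # excluded by the year filter below
    ],
)

# Aggregate many rows into one total per region for a single year.
query = """
    SELECT region, SUM(amount)
    FROM sales
    WHERE sale_year = ?
    GROUP BY region
    ORDER BY region
"""
totals = dict(conn.execute(query, (2023,)).fetchall())
conn.close()
print(totals)
```

The shape of the query (filter, group, aggregate) is the same at warehouse scale; only the row count and the engine change.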
Which Redshift command reclaims space from deleted rows and re-sorts tables?
Because Redshift does not reclaim space immediately on delete, you must run VACUUM periodically (or rely on auto-vacuum).
How do you compress data in S3 to save Athena query costs?
Athena scans fewer bytes if the data is compressed, directly lowering your bill.
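The effect of compression on stored bytes is easy to see with Python's standard `gzip` module. The sample CSV below is synthetic and highly repetitive, so the ratio it produces is only illustrative; real data will compress differently.

```python
import gzip

# Build a small synthetic CSV in memory (hypothetical data).
header = "order_id,region,amount\n"
body = "".join(f"{i},EU,100.0\n" for i in range(1000))
csv_bytes = (header + body).encode("utf-8")

# Compress it the way a .csv.gz object would be stored.
compressed = gzip.compress(csv_bytes)

ratio = len(compressed) / len(csv_bytes)
print(f"raw={len(csv_bytes)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```

Fewer stored bytes means fewer bytes for a query to scan, which is where the saving comes from.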
What is the "Lake House" architecture?
The Lake House architecture combines the data lake and the data warehouse, so data moves seamlessly between the two.