AWS Data Engineer - Intermediate Quiz

This quiz covers performance optimization in S3 and Redshift, streaming architectures, and handling schema changes.


# How do you optimize S3 performance for high request rates (thousands of PUT/GET per second)?

# What is the "Small File Problem" in Hadoop/Spark/Athena?

# When should you choose Amazon EMR over AWS Glue?

# What is "Partitioning" in the context of Athena/S3?

# Which distribution style in Redshift optimizes for joins between two large tables?

# How do you handle schema evolution (e.g., adding a new column) in a Parquet-based data lake?

# What is the main difference between Amazon QuickSight and Tableau on EC2?

# How do you securely connect QuickSight to a private RDS instance?

# Which service is used to orchestrate complex data workflows involving dependencies (e.g., Lambda -> Glue -> Redshift)?

# What is the role of the "Sort Key" in Redshift?

# If you need to query logs in S3 but only care about records containing "ERROR", how can you avoid scanning the whole file?

# What is the key difference between "Stream Processing" and "Batch Processing"?

# How can you ensure PII data is not stored in your clean data lake?

# Which Redshift feature allows you to manage concurrent query execution queues?

# What is the benefit of "Columnar Storage" (like Parquet) over row-based storage (like CSV)?

# How do you monitor the "lag" of a Kinesis Data Streams consumer?

# Which service would you use to catalog metadata from an on-premises JDBC database?

# What is a common use case for DynamoDB in a data engineering pipeline?

# How does Kinesis Data Firehose handle data transformation before loading to S3?

# What is the purpose of "Lifecycle Policies" in S3 for a Data Lake?
