https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
AWS Glue is a serverless, fully managed ETL (extract, transform, and load) service. It is designed for data transformation tasks such as converting .csv files to Apache Parquet format. Using AWS Glue requires minimal development effort because it includes prebuilt transformations and reads from and writes to Amazon S3 directly, with no cluster to provision or manage.
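To illustrate how little code the conversion needs, here is a minimal sketch of a Glue PySpark job script. It assumes the job is started with two hypothetical job parameters, --source_path and --target_path, pointing at S3 prefixes, and that the .csv files have a header row; neither detail comes from the original question. The function name convert_csv_to_parquet is also illustrative.

```python
import sys


def convert_csv_to_parquet(glue_context, source_path, target_path):
    """Read every CSV file under source_path and write it back as Parquet."""
    # Load the CSV files into a DynamicFrame (header row assumed).
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [source_path]},
        format="csv",
        format_options={"withHeader": True},
    )
    # Write the same records to the target prefix in Parquet format.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
    )


def main():
    # These modules are only available inside the AWS Glue runtime,
    # so they are imported here rather than at module level.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # source_path and target_path are hypothetical job parameter names.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)
    convert_csv_to_parquet(glue_context, args["source_path"], args["target_path"])
    job.commit()


# In an actual Glue job script, call main() as the last line.
```

The conversion logic takes the Glue context as an argument so that it can be exercised with a stand-in object outside the Glue runtime. Scheduling the job daily would be handled by a Glue trigger rather than by additional code.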
Option A: While Amazon EMR with Apache Spark offers extensive flexibility, it requires setting up and managing a cluster, writing custom Spark code, and handling resource scaling, which increases development effort compared to AWS Glue.
Option C: AWS Batch requires creating job definitions, specifying execution environments, and potentially writing custom scripts for the transformation process, which involves more setup compared to AWS Glue.
Option D: AWS Lambda could handle the transformation but is better suited for smaller-scale processing or real-time transformations. Handling hundreds of files daily with Lambda would require more complex orchestration and is not the most efficient solution for this scale of batch processing.
AWS Documentation References:
AWS Glue Overview
Transforming Data Using AWS Glue