
Pass the Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) exam: practice questions and answers from CertsForce

Question # 1:

A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files.

The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.

The company needs to improve the performance of the second pipeline.

Which solution will meet this requirement MOST cost-effectively?

Options:

A.

Use a larger worker type.


B.

Increase the number of workers in the AWS Glue ETL jobs.


C.

Use the AWS Glue DynamicFrame grouping option.


D.

Enable AWS Glue auto scaling.
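
For context on option C: the DynamicFrame grouping option tells AWS Glue to batch many small S3 objects into each Spark task instead of reading one tiny file per task, which is the usual remedy for jobs that scan tens of thousands of small files. A minimal sketch follows, assuming hypothetical bucket paths and a JSON input format.

# Minimal sketch (assumed bucket path and file format): reading many small S3 files
# into one AWS Glue DynamicFrame with file grouping enabled.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

historic_sales = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-sales-bucket/historic/"],  # hypothetical path
        "recurse": True,
        "groupFiles": "inPartition",   # enable grouping of small input files
        "groupSize": "134217728",      # target roughly 128 MB per group (bytes, as a string)
    },
    format="json",  # assumed file format
)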


Question # 2:

A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time.

Which solution will run the Glue jobs in the MOST cost-effective way?

Options:

A.

Choose the FLEX execution class in the Glue job properties.


B.

Use the Spot Instance type in Glue job properties.


C.

Choose the STANDARD execution class in the Glue job properties.


D.

Choose the latest version in the GlueVersion field in the Glue job properties.
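
For context on option A: the FLEX execution class runs a Glue job on spare capacity at a lower price, trading a guaranteed start time for cost. A minimal sketch follows, assuming a hypothetical job name.

# Minimal sketch (assumed job name): starting an existing AWS Glue job with the FLEX
# execution class, intended for jobs that do not need to start or finish at a set time.
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="daily-sales-etl",   # hypothetical job name
    ExecutionClass="FLEX",       # STANDARD is the default; FLEX uses spare capacity
)
print(response["JobRunId"])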


Question # 3:

A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account. A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow.

Which log type should the data engineer use to diagnose the cause of the failure?

Options:

A.

YourEnvironmentName-WebServer


B.

YourEnvironmentName-Scheduler


C.

YourEnvironmentName-DAGProcessing


D.

YourEnvironmentName-Task
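
For context on option D: Amazon MWAA publishes task-level Apache Airflow logs to a CloudWatch Logs group whose name ends in -Task, and task logs are where individual workflow step failures surface. A minimal sketch of pulling recent errors follows, assuming the environment's log group name.

# Minimal sketch (assumed log group name): pulling recent Apache Airflow task log
# messages for a failed MWAA workflow from CloudWatch Logs.
import boto3

logs = boto3.client("logs")

response = logs.filter_log_events(
    logGroupName="airflow-YourEnvironmentName-Task",  # assumed MWAA naming pattern
    filterPattern="ERROR",                            # narrow to error messages
    limit=50,
)
for event in response["events"]:
    print(event["message"])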


Question # 4:

A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.

The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs.

Which solutions will meet these requirements? (Select TWO.)

Options:

A.

Use Amazon Redshift ML to generate inventory recommendations.


B.

Use SQL to invoke a remote SageMaker endpoint for prediction.


C.

Use Amazon Redshift ML to schedule regular data exports for offline model training.


D.

Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.


E.

Use Amazon Redshift as a file storage system to archive old inventory management reports.
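
For context on options A and B: Redshift ML can register an existing SageMaker real-time endpoint as a SQL function, so queries invoke the remote endpoint directly from the cluster. A minimal sketch follows, submitted through the Redshift Data API and assuming hypothetical cluster, endpoint, and IAM role names.

# Minimal sketch (assumed cluster, database, endpoint, and role names): registering a
# SageMaker real-time endpoint as a SQL function with Redshift ML, then calling it from SQL.
import boto3

redshift_data = boto3.client("redshift-data")

create_model_sql = """
CREATE MODEL inventory_recommendations
FUNCTION predict_reorder_qty(INT, INT, DECIMAL(10,2))
RETURNS DECIMAL(10,2)
SAGEMAKER 'inventory-endpoint'                 -- hypothetical SageMaker endpoint name
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole';
"""

redshift_data.execute_statement(
    ClusterIdentifier="inventory-cluster",     # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql=create_model_sql,
)

# Later, call the registered function from SQL, for example:
# SELECT product_id, predict_reorder_qty(product_id, stock_on_hand, daily_sales) FROM inventory;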


Question # 5:

A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate a data catalog into a persistent storage solution.

The company currently stores the data catalog in an on-premises Apache Hive metastore on the Hadoop clusters. The company requires a serverless solution to migrate the data catalog.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Use AWS Database Migration Service (AWS DMS) to migrate the Hive metastore into Amazon S3. Configure AWS Glue Data Catalog to scan Amazon S3 to produce the data catalog.


B.

Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog.


C.

Configure an external Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use Amazon Aurora MySQL to store the company's data catalog.


D.

Configure a new Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use the new metastore as the company's data catalog.
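
For context on option B: a common pattern is to point Hive on Amazon EMR at the AWS Glue Data Catalog, so the catalog persists outside the cluster. A minimal sketch of the relevant cluster configuration follows, assuming hypothetical cluster details.

# Minimal sketch (assumed cluster details): launching an EMR cluster whose Hive uses the
# AWS Glue Data Catalog as its metastore, so the catalog outlives the cluster.
import boto3

emr = boto3.client("emr")

emr.run_job_flow(
    Name="hadoop-migration-cluster",           # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",                 # assumed release
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Configurations=[
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)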


Question # 6:

A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to generate analytics.

Which solution will give the company the ability to use Spark to access Athena?

Options:

A.

Athena query settings


B.

Athena workgroup


C.

Athena data source


D.

Athena query editor
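
For context on option B: Apache Spark in Athena runs inside a Spark-enabled workgroup. A minimal sketch of creating such a workgroup follows, assuming hypothetical names, an IAM role ARN, and the engine version label.

# Minimal sketch (assumed names, role ARN, and engine label): creating an Apache
# Spark-enabled Athena workgroup for notebook and Spark sessions.
import boto3

athena = boto3.client("athena")

athena.create_work_group(
    Name="spark-analytics",                                                        # hypothetical workgroup name
    Configuration={
        "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},    # assumed label
        "ExecutionRole": "arn:aws:iam::123456789012:role/AthenaSparkRole",         # hypothetical role
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"}, # hypothetical bucket
    },
    Description="Workgroup for Spark-based analytics in Athena",
)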


Question # 7:

A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames.

The job is failing with the following error: No space left on device.

Which solution will resolve the error?

Options:

A.

Use the AWS Glue Spark shuffle manager.


B.

Deploy an Amazon Elastic Block Store (Amazon EBS) volume for the job to use.


C.

Convert the sort-merge join in the job to be a broadcast join.


D.

Convert the DataFrames to DynamicFrames, and perform a DynamicFrame join in the job.
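
For context on option A: the AWS Glue Spark shuffle manager can write shuffle files to Amazon S3 instead of local worker disk, which is the usual fix when a large sort-merge join exhausts disk with "No space left on device". A minimal sketch of the job arguments follows, assuming hypothetical job and bucket names.

# Minimal sketch (assumed job and bucket names): starting the Glue job with Spark shuffle
# files written to Amazon S3 rather than local worker disk.
import boto3

glue = boto3.client("glue")

glue.start_job_run(
    JobName="large-join-etl",                       # hypothetical job name
    Arguments={
        "--write-shuffle-files-to-s3": "true",      # enable S3-backed shuffle storage
        "--conf": "spark.shuffle.glue.s3ShuffleBucket=s3://example-shuffle-bucket/prefix/",  # assumed bucket
    },
)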


Question # 8:

A manufacturing company collects sensor data from its factory floor to monitor and enhance operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon S3 bucket.

The company needs to display a real-time view of operational efficiency on a large screen in the manufacturing facility.

Which solution will meet these requirements with the LOWEST latency?

Options:

A.

Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to process the sensor data. Use a connector for Apache Flink to write data to an Amazon Timestream database. Use the Timestream database as a source to create a Grafana dashboard.


B.

Configure the S3 bucket to send a notification to an AWS Lambda function when any new object is created. Use the Lambda function to publish the data to Amazon Aurora. Use Aurora as a source to create an Amazon QuickSight dashboard.


C.

Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to process the sensor data. Create a new Data Firehose delivery stream to publish data directly to an Amazon Timestream database. Use the Timestream database as a source to create an Amazon QuickSight dashboard.


D.

Use AWS Glue bookmarks to read sensor data from the S3 bucket in real time. Publish the data to an Amazon Timestream database. Use the Timestream database as a source to create a Grafana dashboard.
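
For context on options A and C: both write processed results into Amazon Timestream for a dashboard to read. A minimal sketch of the record shape a sink would write through the Timestream Write API follows, assuming hypothetical database, table, and measure names.

# Minimal sketch (assumed database, table, and measure names): writing one processed
# efficiency metric to Amazon Timestream, the time-series store the dashboard reads from.
import time
import boto3

timestream = boto3.client("timestream-write")

timestream.write_records(
    DatabaseName="factory_metrics",            # hypothetical database
    TableName="operational_efficiency",        # hypothetical table
    Records=[
        {
            "Dimensions": [{"Name": "line_id", "Value": "line-7"}],
            "MeasureName": "efficiency_pct",
            "MeasureValue": "93.4",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # current time in milliseconds
        }
    ],
)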


Question # 9:

An ecommerce company stores sales data in an AWS Glue table named sales_data. The company stores the sales_data table in an Amazon S3 Standard bucket. The table contains columns named order_id, customer_id, product_id, order_date, shipping_date, and order_amount.

The company wants to improve query performance by partitioning the sales_data table by order_date. The company needs to add the partition to the existing sales_data table in AWS Glue.

Which solution will meet these requirements?

Options:

A.

Update the AWS Glue table’s schema to include the new partition.


B.

Edit the AWS Glue table’s metadata file directly in Amazon S3.


C.

Use the AWS Glue Data Catalog API to add the new partition to the table.


D.

Manually modify the S3 bucket to use the new partition.
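
For context on option C: the Glue Data Catalog API can add a partition to an existing table directly. A minimal sketch follows, assuming hypothetical database and S3 location names; it copies the table's storage descriptor so the new partition inherits the table's format settings.

# Minimal sketch (assumed database name and S3 location): adding a single order_date
# partition to the existing sales_data table through the Glue Data Catalog API.
import boto3

glue = boto3.client("glue")

# Reuse the table's storage descriptor so the partition inherits its format settings.
table = glue.get_table(DatabaseName="ecommerce_db", Name="sales_data")  # assumed database name
storage_descriptor = dict(table["Table"]["StorageDescriptor"])
storage_descriptor["Location"] = "s3://example-sales-bucket/sales_data/order_date=2025-01-15/"

glue.create_partition(
    DatabaseName="ecommerce_db",
    TableName="sales_data",
    PartitionInput={
        "Values": ["2025-01-15"],              # partition value for order_date
        "StorageDescriptor": storage_descriptor,
    },
)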


Question # 10:

A data engineer is designing a new data lake architecture for a company. The data engineer plans to use Apache Iceberg tables and AWS Glue Data Catalog to achieve fast query performance and enhanced metadata handling. The data engineer needs to query historical data for trend analysis and optimize storage costs for a large volume of event data.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Store Iceberg table data files in Amazon S3 Intelligent-Tiering.


B.

Define partitioning schemes based on event type and event date.


C.

Use AWS Glue Data Catalog to automatically optimize Iceberg storage.


D.

Run a custom AWS Glue job to compact Iceberg table data files.
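
For context on option A: Iceberg data files are ordinary S3 objects, so a bucket lifecycle rule can move them into S3 Intelligent-Tiering with no further code. A minimal sketch follows, assuming hypothetical bucket and prefix names.

# Minimal sketch (assumed bucket and prefix names): a lifecycle rule that moves Iceberg
# table data files under a given prefix into S3 Intelligent-Tiering shortly after creation.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-datalake-bucket",                            # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "iceberg-data-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "warehouse/events/data/"},  # assumed Iceberg data prefix
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)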

