
Pass the Amazon Web Services AWS Certified Data Engineer Data-Engineer-Associate Questions and answers with CertsForce

Viewing page 5 out of 9 pages
Viewing questions 41-50
Question # 41:

A company stores a large dataset in an Amazon S3 bucket. A data engineer frequently runs complex queries on the dataset by using Amazon Athena. The data engineer needs to optimize query performance and optimize costs for queries that are run multiple times with the same parameters.

Which solution will meet these requirements?

Options:

A.

Convert the dataset to JSON format before running Athena queries.


B.

Use Amazon EMR to pre-process the data before running Athena queries.


C.

Configure query result reuse settings in the Athena workgroup.


D.

Use Amazon Redshift Spectrum to query the data in Amazon S3.


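For context on the query result reuse setting that option C refers to, here is a minimal boto3 sketch of running an Athena query with result reuse enabled; the database name, workgroup, and S3 output location are placeholder assumptions, not values from the question.

```python
import boto3

# Minimal sketch: run an Athena query with result reuse enabled, so repeated
# runs of the same query within the reuse window return cached results instead
# of rescanning the data in Amazon S3. Names and locations are placeholders.
athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT region, SUM(sales) FROM transactions GROUP BY region",
    QueryExecutionContext={"Database": "analytics_db"},
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    # Reuse results from an identical query that ran within the last 60 minutes.
    ResultReuseConfiguration={
        "ResultReuseByAgeConfiguration": {"Enabled": True, "MaxAgeInMinutes": 60}
    },
)

print(response["QueryExecutionId"])
```
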
Question # 42:

A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes.

Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)

Options:

A.

Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.


B.

Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state that periodically checks whether the Athena query has finished by using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.


C.

Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.


D.

Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to determine whether the current Athena query has finished running successfully. Configure the Python shell script to invoke the next query when the current query has finished running.


E.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the Athena queries in AWS Batch.


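As background for the start-then-poll pattern that options C and D describe, here is a minimal Python sketch of the kind of script a Glue Python shell job might run; the query strings, database, and output location are placeholder assumptions.

```python
import time
import boto3

# Minimal sketch of the polling pattern in options C and D: start an Athena
# query, wait for it to finish, then start the next one. Placeholders only.
athena = boto3.client("athena")

QUERIES = [
    "SELECT COUNT(*) FROM daily_events",
    "SELECT event_type, COUNT(*) FROM daily_events GROUP BY event_type",
]

def run_query_and_wait(query: str) -> None:
    execution_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=execution_id)[
            "QueryExecution"
        ]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(300)  # check every 5 minutes, as described in option D

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {execution_id} ended in state {state}")

for q in QUERIES:
    run_query_and_wait(q)
```
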
Question # 43:

A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded.

A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB.

How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?

Options:

A.

Use a second Lambda function to invoke the first Lambda function based on Amazon CloudWatch events.


B.

Use the Amazon Redshift Data API to publish an event to Amazon EventBridge. Configure an EventBridge rule to invoke the Lambda function.


C.

Use the Amazon Redshift Data API to publish a message to an Amazon Simple Queue Service (Amazon SQS) queue. Configure the SQS queue to invoke the Lambda function.


D.

Use a second Lambda function to invoke the first Lambda function based on AWS CloudTrail events.


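To illustrate the event-driven pattern in option B, here is a minimal sketch of a Lambda handler that records a load status in DynamoDB when an EventBridge rule invokes it. The table name and the event detail fields are assumptions; the exact structure of the status-change event that the Redshift Data API sends to EventBridge should be verified before relying on it.

```python
import boto3

# Minimal sketch: an EventBridge rule invokes this Lambda function when a
# Redshift Data API statement changes state, and the handler writes the load
# status to DynamoDB. Table name and event field names are assumptions.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("redshift_load_status")

def lambda_handler(event, context):
    detail = event.get("detail", {})
    table.put_item(
        Item={
            "statement_name": detail.get("statementName", "unknown"),
            "statement_id": detail.get("statementId", "unknown"),
            "state": detail.get("state", "unknown"),
            "event_time": event.get("time", ""),
        }
    )
    return {"status": "recorded"}
```
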
Question # 44:

A data engineer is optimizing query performance in Amazon Athena notebooks that use Apache Spark to analyze large datasets that are stored in Amazon S3. The data is partitioned. An AWS Glue crawler updates the partitions.

The data engineer wants to minimize the amount of data that is scanned to improve efficiency of Athena queries.

Which solution will meet these requirements?

Options:

A.

Apply partition filters in the queries.


B.

Increase the frequency of AWS Glue crawler invocations to update the data catalog more often.


C.

Organize the data that is in Amazon S3 by using a nested directory structure.


D.

Configure Spark to use in-memory caching for frequently accessed data.


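For reference, here is a minimal PySpark sketch of the partition filtering that option A describes, as it might look in an Athena Spark notebook; the database, table, and partition column names (year, month, day) are assumptions for illustration.

```python
# Minimal sketch: filtering on the partition columns lets Spark prune
# partitions so only the matching S3 prefixes are scanned. All names here
# are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.sql(
    """
    SELECT user_id, event_type, event_time
    FROM analytics_db.clickstream_events
    WHERE year = '2024' AND month = '06' AND day = '15'   -- partition filter
    """
)
events.show(10)
```
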
Question # 45:

A media company wants to build a real-time analytics pipeline to process customer activity events across the company's website and mobile app. The company wants to build a solution to ingest millions of events with minimal latency. The solution must be scalable and durable so that no data is lost.

Which solution will meet these requirements in the MOST cost-effective way?

Options:

A.

Set up an Amazon Kinesis Data Streams pipeline to ingest data, process the data by using AWS Lambda functions, and store the results in Amazon Redshift for analytics.


B.

Schedule an AWS Glue job to fetch user interaction logs every 10 minutes from Amazon S3. Configure the AWS Glue job to transform and store the data in Amazon Redshift for analytics.


C.

Configure Amazon S3 Event Notifications to invoke an AWS Lambda function to process every new interaction log file. Store the result in Amazon Redshift for analytics.


D.

Deploy an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. Use self-managed consumers to process and distribute data in real time. Integrate with Amazon Redshift for enhanced analytics.


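As a small illustration of the ingestion side of option A, here is a minimal boto3 sketch of a producer writing an activity event to a Kinesis data stream; the stream name and event payload are placeholder assumptions.

```python
import json
import boto3

# Minimal sketch: a producer writes customer activity events to an Amazon
# Kinesis data stream, where downstream consumers (for example Lambda
# functions) can process them in near real time. Placeholders only.
kinesis = boto3.client("kinesis")

event = {"user_id": "u-123", "action": "page_view", "page": "/pricing"}

kinesis.put_record(
    StreamName="customer-activity-stream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],  # spreads events across shards
)
```
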
Question # 46:

A company wants to migrate a data warehouse from Teradata to Amazon Redshift.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.

Use AWS Database Migration Service (AWS DMS) Schema Conversion to migrate the schema. Use AWS DMS to migrate the data.


B.

Use the AWS Schema Conversion Tool (AWS SCT) to migrate the schema. Use AWS Database Migration Service (AWS DMS) to migrate the data.


C.

Use AWS Database Migration Service (AWS DMS) to migrate the data. Use automatic schema conversion.


D.

Manually export the schema definition from Teradata. Apply the schema to the Amazon Redshift database. Use AWS Database Migration Service (AWS DMS) to migrate the data.


Question # 47:

A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to generate analytics.

Which solution will give the company the ability to use Spark to access Athena?

Options:

A.

Athena query settings


B.

Athena workgroup


C.

Athena data source


D.

Athena query editor


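For background on Spark-enabled Athena workgroups, which option B refers to, here is a minimal boto3 sketch of creating one; the engine version string, IAM role ARN, and S3 bucket are assumptions and should be confirmed (for example with list_engine_versions) rather than taken as exact values.

```python
import boto3

# Minimal sketch: create an Athena workgroup that uses the Apache Spark engine.
# The engine version string, role ARN, and bucket are placeholder assumptions.
athena = boto3.client("athena")

athena.create_work_group(
    Name="spark-analytics",
    Configuration={
        "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
        "ExecutionRole": "arn:aws:iam::123456789012:role/AthenaSparkExecutionRole",
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
    },
    Description="Workgroup for Spark notebooks",
)
```
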
Question # 48:

A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift.

The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs.

Which service will meet these requirements?

Options:

A.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA)


B.

AWS Step Functions


C.

AWS Glue


D.

Amazon EventBridge


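To show what Python-based orchestration with retries looks like in the Airflow model that option A names, here is a minimal DAG sketch of the kind that could run on Amazon MWAA; the task functions are empty placeholders standing in for the EMR Spark step, the Salesforce API call, and the Redshift load.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Minimal Airflow DAG sketch: Python-defined tasks with automatic retries.
# Task bodies are placeholders for the work described in the question.

def run_spark_on_emr():
    ...  # e.g., submit an EMR step with boto3

def call_salesforce_api():
    ...  # e.g., pull data from the Salesforce REST API

def load_into_redshift():
    ...  # e.g., issue a COPY command via the Redshift Data API

default_args = {
    "retries": 3,                        # retry failed tasks automatically
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    emr_task = PythonOperator(task_id="run_spark_on_emr", python_callable=run_spark_on_emr)
    sf_task = PythonOperator(task_id="call_salesforce_api", python_callable=call_salesforce_api)
    load_task = PythonOperator(task_id="load_into_redshift", python_callable=load_into_redshift)

    emr_task >> sf_task >> load_task
```
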
Question # 49:

A company currently stores all of its data in Amazon S3 by using the S3 Standard storage class.

A data engineer examined data access patterns to identify trends. During the first 6 months, most data files are accessed several times each day. Between 6 months and 2 years, most data files are accessed once or twice each month. After 2 years, data files are accessed only once or twice each year.

The data engineer needs to use an S3 Lifecycle policy to develop new data storage rules. The new storage solution must continue to provide high availability.

Which solution will meet these requirements in the MOST cost-effective way?

Options:

A.

Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.


B.

Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.


C.

Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.


D.

Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.


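For reference, here is a minimal boto3 sketch of an S3 Lifecycle rule using the transition pattern described in option C, with 6 months approximated as 180 days and 2 years as 730 days; the bucket name is a placeholder assumption.

```python
import boto3

# Minimal sketch: apply a lifecycle rule that transitions objects to
# S3 Standard-IA after ~6 months and to S3 Glacier Deep Archive after ~2 years.
# The bucket name is a placeholder.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tiered-storage-rule",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"},
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```
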
Question # 50:

A company runs a data pipeline that uses AWS Step Functions to orchestrate AWS Lambda functions and AWS Glue jobs. The Lambda functions and AWS Glue jobs require access to multiple Amazon RDS databases. The Lambda functions and AWS Glue jobs already have access to the VPC that hosts the RDS databases.

Which solution will meet these requirements in the MOST secure way?

Options:

A.

Use the root user of the company’s AWS account to create long-term access keys for the RDS databases. Include the access keys programmatically in the Lambda functions and AWS Glue jobs. Generate new keys every 90 days.


B.

Create an IAM role that has permissions to access the RDS databases. Create a second IAM role for the Lambda functions and AWS Glue jobs that has permissions to assume the IAM role that has access permissions for the RDS databases.


C.

Create an IAM user that can assume IAM roles that have permissions and credentials to access the RDS databases. Assign the IAM user to each of the Lambda functions and AWS Glue jobs.


D.

Create Java Database Connectivity (JDBC) connections between the Lambda functions and AWS Glue jobs and the RDS databases. In the connection string, include the necessary credentials.


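To make the role-chaining idea in option B concrete, here is a minimal boto3 sketch in which code running under the Lambda or AWS Glue execution role assumes a second IAM role that holds the database access permissions and then uses the temporary credentials; the role ARN and secret name are placeholder assumptions, and retrieving database credentials from Secrets Manager is one illustrative use of the assumed role, not something stated in the question.

```python
import boto3

# Minimal sketch: assume a second IAM role that has the RDS access permissions
# and use the temporary credentials it returns. ARN and secret name are
# placeholders.
sts = boto3.client("sts")

creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/RdsAccessRole",
    RoleSessionName="etl-db-access",
)["Credentials"]

# Use the temporary credentials for calls that need the assumed role's
# permissions, for example reading the database password from Secrets Manager.
secrets = boto3.client(
    "secretsmanager",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
db_secret = secrets.get_secret_value(SecretId="rds/analytics/credentials")
```
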