
Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) practice questions and answers from CertsForce

Viewing page 3 of 9 (questions 21-30)
Question # 21:

An ecommerce company collects daily customer transaction logs in CSV format and stores the logs in Amazon S3. The company uses Amazon Athena to scan a subset of attributes from the logs on the same day the company receives each log.

Query times are increasing because of increasing transaction volume. The company wants to improve query performance.

Which solution will meet these requirements with the SHORTEST query times?

Options:

A.

Convert the CSV logs into multiple ORC files for better parallelism in Athena. Partition by date in Amazon S3. Use columnar pushdown filters.


B.

Convert the CSV logs to JSON. Partition by date in Amazon S3. Use Athena with dynamic filtering to reduce data scans.


C.

Convert the CSV logs to Avro. Partition by date in Amazon S3. Use Athena with projection-based partitioning.


D.

Convert the CSV logs to a single Apache Parquet file for each day. Partition the data by date in Amazon S3. Use Athena with predicate pushdown filters.
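
For illustration, converting CSV logs to partitioned Parquet (the approach in option D) is commonly done with an Athena CREATE TABLE AS SELECT (CTAS) statement. A minimal sketch using boto3; the database, table, column, and bucket names are all hypothetical:

import boto3

athena = boto3.client("athena")

# CTAS: rewrite the CSV table as Parquet, partitioned by date.
# All identifiers and S3 locations below are hypothetical examples.
ctas = """
CREATE TABLE logs.transactions_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-bucket/transactions-parquet/',
    partitioned_by = ARRAY['transaction_date']
) AS
SELECT customer_id, amount, transaction_date
FROM logs.transactions_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "logs"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)

Queries that filter on transaction_date then scan only the matching partitions, and predicate pushdown against the Parquet column statistics further reduces the bytes Athena reads.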


Question # 22:

During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.

A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials.

Which combination of steps should the data engineer take to meet these requirements? (Choose two.)

Options:

A.

Store the credentials in the AWS Glue job parameters.


B.

Store the credentials in a configuration file that is in an Amazon S3 bucket.


C.

Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job.


D.

Store the credentials in AWS Secrets Manager.


E.

Grant the AWS Glue job's IAM role access to the stored credentials.
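
For illustration, once the credentials live in AWS Secrets Manager and the job's IAM role is allowed to read them (secretsmanager:GetSecretValue), the Glue job can fetch them at run time instead of hard coding them. A minimal sketch; the secret name and JSON keys are hypothetical:

import json
import boto3

# Hypothetical secret name; the secret stores a JSON username/password pair.
secrets = boto3.client("secretsmanager")
response = secrets.get_secret_value(SecretId="redshift/etl-user")
creds = json.loads(response["SecretString"])

# Use the retrieved values when building the Redshift connection,
# rather than embedding literals in the job script.
username = creds["username"]
password = creds["password"]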


Question # 23:

A data engineer is using an AWS Glue ETL job to remove outdated customer records from a table that contains customer account information. The data engineer is using the following SQL command:

MERGE INTO accounts t USING monthly_accounts_update s
ON t.customer = s.customer
WHEN MATCHED THEN DELETE

What will happen when the data engineer runs the SQL command?

Options:

A.

All customer records that exist in both the customer accounts table and the monthly_accounts_update table will be deleted from the accounts table.


B.

Only customer records that are present in both tables will be retained in the customer accounts table.


C.

The monthly_accounts_update table will be deleted.


D.

No records will be deleted because the command syntax is not valid in AWS Glue.
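
To make the MERGE semantics concrete, the same delete-on-match logic can be expressed in plain pandas. This is only an illustration of the statement's effect on hypothetical data, not how AWS Glue executes it:

import pandas as pd

# Hypothetical sample data.
accounts = pd.DataFrame({"customer": ["a", "b", "c"], "balance": [10, 20, 30]})
monthly_accounts_update = pd.DataFrame({"customer": ["b", "c"]})

# WHEN MATCHED THEN DELETE: drop accounts rows whose customer key also
# appears in the update table; rows without a match are left untouched.
result = accounts[~accounts["customer"].isin(monthly_accounts_update["customer"])]
print(result)  # only customer "a" remains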


Question # 24:

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.


B.

Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.


C.

Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.


D.

Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.
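
For illustration, a suffix-filtered s3:ObjectCreated:* notification that invokes a Lambda function directly can be configured as follows. The bucket name and function ARN are hypothetical, and the function's resource policy must separately allow S3 to invoke it:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-upload-bucket",  # hypothetical bucket
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet",
                "Events": ["s3:ObjectCreated:*"],
                # Fire only for object keys that end in .csv.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    },
)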


Question # 25:

A data engineer needs to deploy a complex pipeline. The stages of the pipeline must run scripts, but only fully managed and serverless services can be used.

Which solution will meet these requirements?

Options:

A.

Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.


B.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to build and schedule the pipeline.


C.

Deploy the scripts to Amazon EC2 instances. Use Amazon EventBridge to run the scripts on a schedule.


D.

Use AWS Glue DataBrew and Amazon EventBridge to run the pipeline on a schedule.
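
For illustration, on Amazon MWAA each stage of such a pipeline becomes an Apache Airflow task. A minimal sketch of a scheduled two-stage DAG; the stage functions are hypothetical placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # placeholder for a real stage script

def transform():
    ...  # placeholder for a real stage script

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # run the stages in order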


Question # 26:

A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a physical data model.

The data engineer encounters a de-normalized table that is growing in size. The table does not have a suitable column to use as the distribution key.

Which distribution style should the data engineer use to meet these requirements with the LEAST maintenance overhead?

Options:

A.

ALL distribution


B.

EVEN distribution


C.

AUTO distribution


D.

KEY distribution
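
For illustration, the distribution style is declared in the table DDL. With DISTSTYLE AUTO, Amazon Redshift picks an initial style (ALL for small tables) and changes it automatically as the table grows, which is what keeps maintenance overhead low. A sketch using the Amazon Redshift Data API; the cluster, database, and table definition are hypothetical:

import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",  # hypothetical identifiers
    Database="dev",
    DbUser="admin",
    Sql="""
        CREATE TABLE sales_denormalized (
            sale_id BIGINT,
            customer_name VARCHAR(100),
            amount DECIMAL(10, 2)
        )
        DISTSTYLE AUTO;
    """,
)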


Question # 27:

A data engineer needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Write a custom Python application. Host the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.


B.

Write a PySpark ETL script. Host the script on an Amazon EMR cluster.


C.

Write an AWS Glue PySpark job. Use Apache Spark to transform the data.


D.

Write an AWS Glue Python shell job. Use pandas to transform the data.
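
For illustration, a Glue Python shell job can handle objects of this size in memory with pandas alone. A minimal sketch; the bucket names, keys, and the status column are hypothetical, and the Parquet write assumes pyarrow is available:

import io
import boto3
import pandas as pd

s3 = boto3.client("s3")

# Read one daily CSV object (hypothetical bucket and key).
obj = s3.get_object(Bucket="example-input-bucket", Key="daily/2024-01-01.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Example transformation step (hypothetical column and filter).
df = df[df["status"] == "COMPLETED"]

# Write the transformed data back to S3 as Parquet.
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)
s3.put_object(Bucket="example-output-bucket", Key="daily/2024-01-01.parquet", Body=buffer.getvalue())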


Question # 28:

A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated.

A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data.

Which solution will meet this requirement?

Options:

A.

Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.


B.

Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.


C.

Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.


D.

Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.
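
For illustration, whether an EBS volume outlives its instance is governed by the volume's DeleteOnTermination flag in the block device mapping. A sketch that launches an instance from a hypothetical AMI and attaches a data volume that persists after termination:

import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sdf",
            # Keep the data volume when the instance is terminated.
            "Ebs": {"VolumeSize": 100, "DeleteOnTermination": False},
        }
    ],
)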


Question # 29:

A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.

A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.

Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.


B.

Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.


C.

Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.


D.

Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
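
For illustration, the underlying computation (concatenate first and last names, then count distinct values) is straightforward; a sketch in pandas with hypothetical column names, assuming the .xls file has been downloaded locally and an Excel engine such as xlrd is installed:

import pandas as pd

df = pd.read_excel("customers.xls")  # hypothetical local copy of the S3 file

# Concatenate first and last names, then count distinct full names.
full_names = df["first_name"].str.strip() + " " + df["last_name"].str.strip()
print(full_names.nunique())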


Question # 30:

A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket.

The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the data, and filter the results by a numeric value of a column.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Use AWS Glue Python jobs to read and transform the CSV files.


B.

Use an AWS Glue custom crawler to read and transform the CSV files.


C.

Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.


D.

Use AWS Glue DataBrew recipes to read and transform the CSV files.
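
For illustration, the required transformations map to a short sequence of steps. Here they are sketched in plain pandas with hypothetical column names and filter values; AWS Glue DataBrew expresses the same kinds of steps as no-code recipe actions:

import pandas as pd

df = pd.read_csv("input.csv")  # hypothetical local copy of an S3 object

df = df.drop(index=0).reset_index(drop=True)           # ignore the second row of the file
df = df.rename(columns={"cust_nm": "customer_name"})   # rename a column
df = df.drop(columns=["internal_code", "notes"])       # remove specific columns
df["region_code"] = df["customer_name"].str[:2]        # derive a new column (illustrative)
df = df[df["amount"] > 100]                            # filter by a numeric column

df.to_csv("output.csv", index=False)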

