
Pass the Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) questions and answers with CertsForce

Question # 21:

A company uses Amazon Redshift as its data warehouse. Data encoding is applied to the existing tables of the data warehouse. A data engineer discovers that the compression encoding applied to some of the tables is not the best fit for the data.

The data engineer needs to improve the data encoding for the tables that have sub-optimal encoding.

Which solution will meet this requirement?

Options:

A.

Run the ANALYZE command against the identified tables. Manually update the compression encoding of columns based on the output of the command.


B.

Run the ANALYZE COMPRESSION command against the identified tables. Manually update the compression encoding of columns based on the output of the command.


C.

Run the VACUUM REINDEX command against the identified tables.


D.

Run the VACUUM RECLUSTER command against the identified tables.
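
For background on option B: the ANALYZE COMPRESSION command samples a table's rows and reports a recommended encoding for each column, which the data engineer then applies manually. A minimal sketch, assuming a hypothetical table named events with a column named payload:

    -- Hypothetical table name; reports the recommended encoding per column
    ANALYZE COMPRESSION events;

    -- Apply a recommendation from the report; Amazon Redshift can change
    -- an existing column's encoding in place
    ALTER TABLE events ALTER COLUMN payload ENCODE zstd;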


Question # 22:

A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.

The company runs a new Oracle database as a source system in its data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables into the data lake every day.

Which solution will meet this requirement with the LEAST effort?

Options:

A.

Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.


B.

Create an AWS Glue connection to the Oracle database. Create an AWS Glue job that uses job bookmarks to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.


C.

Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.


D.

Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.


Question # 23:

An airline company is collecting metrics about flight activities for analytics. The company is conducting a proof of concept (POC) test to show how analytics can provide insights that the company can use to increase on-time departures.

The POC test uses objects in Amazon S3 that contain the metrics in .csv format. The POC test uses Amazon Athena to query the data. The data is partitioned in the S3 bucket by date.

As the amount of data increases, the company wants to optimize the storage solution to improve query performance.

Which combination of solutions will meet these requirements? (Choose two.)

Options:

A.

Add a randomized string to the beginning of the keys in Amazon S3 to get more throughput across partitions.


B.

Use an S3 bucket that is in the same account that uses Athena to query the data.


C.

Use an S3 bucket that is in the same AWS Region where the company runs Athena queries.


D.

Preprocess the .csv data to JSON format by fetching only the document keys that the query requires.


E.

Preprocess the .csv data to Apache Parquet format by fetching only the data blocks that are needed for predicates.
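
For context on option E: one low-effort way to rewrite partitioned .csv data as Apache Parquet is an Athena CTAS statement. A minimal sketch, with hypothetical table, column, and bucket names:

    -- Rewrite the .csv table as Parquet, keeping the by-date partitioning
    -- (in a CTAS, the partition column must be selected last)
    CREATE TABLE flight_metrics_parquet
    WITH (
        format = 'PARQUET',
        external_location = 's3://example-bucket/metrics-parquet/',
        partitioned_by = ARRAY['flight_date']
    ) AS
    SELECT carrier, departure_delay_minutes, flight_date
    FROM flight_metrics_csv;

Because Parquet is columnar, Athena can skip the data blocks that a query's predicates do not need instead of scanning entire .csv objects.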


Question # 24:

A company uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.

An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

Options:

A.

Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.


B.

Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.


C.

Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.


D.

Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
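
For context on option A: the staging-table pattern loads each batch next to the target table and then merges it inside one transaction, so a rerun replaces rows instead of duplicating them. A minimal sketch, assuming a hypothetical target table named orders with key column order_id and a hypothetical IAM role:

    BEGIN;

    -- Land the batch in a staging table instead of the target table
    -- (table, bucket, and role names are placeholders)
    CREATE TEMP TABLE orders_stage (LIKE orders);
    COPY orders_stage
    FROM 's3://example-bucket/processed/orders/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/ExampleRedshiftRole'
    CSV;

    -- Delete the rows that the batch replaces, then insert the batch
    DELETE FROM orders
    USING orders_stage
    WHERE orders.order_id = orders_stage.order_id;
    INSERT INTO orders SELECT * FROM orders_stage;

    END;

Because the delete and insert commit together, running the job any number of times leaves the same rows in the target table.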


Question # 25:

A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform.

The company wants to minimize the effort and time required to incorporate third-party datasets.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use API calls to access and integrate third-party datasets from AWS Data Exchange.


B.

Use API calls to access and integrate third-party datasets from AWS


C.

Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.


D.

Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).


Question # 26:

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.

Which solution will meet this requirement?

Options:

A.

Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.


B.

Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.


C.

Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.


D.

Turn on concurrency scaling for the daily usage quota for the Redshift cluster.


Question # 27:

A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket.

The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.


B.

Use Amazon Redshift Serverless to automatically process the analytics workload.


C.

Use the AWS CLI to automatically process the analytics workload.


D.

Use AWS CloudFormation templates to automatically process the analytics workload.
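
For context: the monthly backup step in this scenario is a plain UNLOAD statement, and the same SQL runs unchanged against a Redshift Serverless workgroup. A minimal sketch with hypothetical table, bucket, and role names:

    -- Export the month's results to Amazon S3 (names are placeholders)
    UNLOAD ('SELECT * FROM monthly_results')
    TO 's3://example-bucket/backups/monthly_results_'
    IAM_ROLE 'arn:aws:iam::111122223333:role/ExampleRedshiftRole'
    FORMAT AS PARQUET;

With Redshift Serverless, the company pays only while queries run, so there is no cluster to create, pause, or delete each month.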


Question # 28:

Two developers are working on separate application releases. The developers have created feature branches named Branch A and Branch B by using a GitHub repository's master branch as the source.

The developer for Branch A deployed code to the production system. The code for Branch B will merge into the master branch in the following week's scheduled application release.

Which command should the developer for Branch B run before the developer raises a pull request to the master branch?

Options:

A.

git diff branchB master
git commit -m


B.

git pull master


C.

git rebase master


D.

git fetch -b master


Question # 29:

A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline.

Which AWS service or feature will meet these requirements MOST cost-effectively?

Options:

A.

AWS Step Functions


B.

AWS Glue workflows


C.

AWS Glue Studio


D.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA)


Question # 30:

A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. After a merger, a query that joins two large sales tables has become slow. Table S1 has 10 billion records, and Table S2 has 900 million records.

A data engineer must improve the performance of the query.

Which solution will meet this requirement?

Options:

A.

Use the KEY distribution style for both sales tables. Select a low cardinality column to use for the join.


B.

Use the KEY distribution style for both sales tables. Select a high cardinality column to use for the join.


C.

Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.


D.

Use the Amazon Redshift query optimizer to review and select optimizations to implement.


E.

Use Amazon Redshift Advisor to review and select optimizations to implement.
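
For context on the KEY distribution options: distributing both tables on the join column co-locates matching rows on the same node slices, which avoids redistributing billions of rows at query time, and a high-cardinality column keeps that distribution even. A minimal sketch, assuming the tables join on a hypothetical customer_id column:

    -- Distribute both tables on the join column so matching rows
    -- land on the same slices (customer_id is a placeholder name)
    ALTER TABLE s1 ALTER DISTSTYLE KEY DISTKEY customer_id;
    ALTER TABLE s2 ALTER DISTSTYLE KEY DISTKEY customer_id;

A low-cardinality key would skew most rows onto a few slices, so the join column should have many distinct values.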

