Pass the Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) Questions and Answers with CertsForce

Viewing page 4 out of 6 pages
Viewing questions 31-40
Question # 31:

A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with the LEAST operational overhead? (Select TWO.)

Options:

A.

Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.


B.

Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.


C.

Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.


D.

Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.


E.

Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.


Expert Solution
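
For context on option A, the event-driven crawler can be configured through the AWS Glue API. The sketch below is a minimal boto3 example under assumed names (the crawler, role, bucket, and queue ARN are all hypothetical); the key pieces are the EventQueueArn on the S3 target and the CRAWL_EVENT_MODE recrawl policy, which restrict each run to the objects reported by the S3 events.

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="s3-incremental-crawler",  # hypothetical crawler name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    DatabaseName="data_catalog_db",  # hypothetical catalog database
    Targets={
        "S3Targets": [
            {
                "Path": "s3://example-data-bucket/",  # hypothetical bucket
                # The crawler consumes S3 change events from this queue
                # instead of re-listing the entire bucket on every run.
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",
            }
        ]
    },
    # Crawl only the objects named in the events, i.e. incremental updates.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
)
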
Question # 32:

A data engineer needs to create an empty copy of an existing table in Amazon Athena to perform data processing tasks. The existing table in Athena contains 1,000 rows.

Which query will meet this requirement?

Options:

A.

CREATE TABLE new_table LIKE old_table;


B.

CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;


C.

CREATE TABLE new_table AS SELECT * FROM old_table;


D.

CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=1;


Expert Solution
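
The WITH NO DATA clause in option B creates a table with the schema of the SELECT but zero rows, which is exactly an empty copy; Athena does not support the LIKE syntax of option A. As a hedged sketch, the statement can be submitted through the Athena API (the database and output location below are hypothetical):

import boto3

athena = boto3.client("athena")

athena.start_query_execution(
    QueryString=(
        "CREATE TABLE new_table AS "
        "SELECT * FROM old_table WITH NO DATA"
    ),
    QueryExecutionContext={"Database": "example_db"},  # hypothetical database
    ResultConfiguration={
        # Hypothetical S3 location for Athena query results.
        "OutputLocation": "s3://example-athena-results/"
    },
)
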
Question # 33:

A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.

Which solution will meet these requirements?

Options:

A.

Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.


B.

Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.


C.

Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2.


D.

Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.


Expert Solution
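
Option C maps to a single API call: Transfer Family security policies pin the minimum TLS version, so attaching a policy that starts at TLS 1.2 satisfies the mandate. A minimal boto3 sketch, assuming a hypothetical server ID (check the AWS documentation for the current security policy names):

import boto3

transfer = boto3.client("transfer")

transfer.update_server(
    ServerId="s-1234567890abcdef0",  # hypothetical server ID
    # This managed policy allows only TLS 1.2 and above for TLS transfers.
    SecurityPolicyName="TransferSecurityPolicy-2020-06",
)
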
Question # 34:

A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage.

A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region.

Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)

Options:

A.

Use data filters for each Region to register the S3 paths as data locations.


B.

Register the S3 path as an AWS Lake Formation location.


C.

Modify the IAM roles of the HR departments to add a data filter for each department's Region.


D.

Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.


E.

Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.


Expert Solution
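
Options B and D work together: the S3 path is registered with AWS Lake Formation, and a row-level data filter per Region is granted to each HR department's IAM role. A hedged boto3 sketch under assumed names (the bucket, account ID, database, table, and region column are all hypothetical):

import boto3

lf = boto3.client("lakeformation")

# Option B: register the data lake location with Lake Formation.
lf.register_resource(
    ResourceArn="arn:aws:s3:::example-hr-data-lake",  # hypothetical bucket
    UseServiceLinkedRole=True,
)

# Option D: one row-level filter per Region; this one would be granted to
# the us-east-1 HR department's IAM role.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": "123456789012",  # hypothetical account ID
        "DatabaseName": "hr_db",
        "TableName": "employee_records",
        "Name": "us_east_1_only",
        "RowFilter": {"FilterExpression": "region = 'us-east-1'"},
        "ColumnWildcard": {},  # expose all columns; only rows are filtered
    }
)
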
Question # 35:

A transportation company wants to track vehicle movements by capturing geolocation records. The records are 10 bytes in size. The company receives up to 10,000 records every second. Data transmission delays of a few minutes are acceptable because of unreliable network conditions.

The transportation company wants to use Amazon Kinesis Data Streams to ingest the geolocation data. The company needs a reliable mechanism to send data to Kinesis Data Streams. The company needs to maximize the throughput efficiency of the Kinesis shards.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

A.

Kinesis Agent


B.

Kinesis Producer Library (KPL)


C.

Amazon Data Firehose


D.

Kinesis SDK


Expert Solution
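
The KPL (option B) is a Java library, so there is no direct Python equivalent to show; as a rough illustration of why producer-side batching matters for 10-byte records, the boto3 sketch below sends records in PutRecords batches of up to 500. The KPL goes further: it aggregates many small user records into a single Kinesis record and buffers them (RecordMaxBufferedTime), which maximizes shard throughput and is acceptable here because delays of a few minutes are tolerated. The stream name and payloads are hypothetical.

import boto3

kinesis = boto3.client("kinesis")

def send_batch(payloads):
    """Send up to 500 small geolocation payloads (bytes) in one API call."""
    kinesis.put_records(
        StreamName="vehicle-geolocation",  # hypothetical stream name
        Records=[
            {"Data": p, "PartitionKey": str(i)}  # naive partition keys
            for i, p in enumerate(payloads)
        ],
    )

# Even batched this way, each 10-byte payload still occupies one Kinesis
# record; KPL aggregation packs many such payloads into one record per write.
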
Question # 36:

A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access.

Which solution will meet these requirements with the LEAST effort?

Options:

A.

Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.


B.

Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.


C.

Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.


D.

Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.


Expert Solution
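
Option C in code form is just a header on the S3 write; the access restriction lives in IAM and the KMS key policy, which control who can call kms:Decrypt. A minimal sketch with a hypothetical bucket, object key, and KMS key ARN:

import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-call-logs",  # hypothetical bucket
    Key="logs/2024/call-001.json",  # hypothetical object key
    Body=b'{"caller": "555-0100"}',
    # SSE-KMS: S3 encrypts the object with the specified customer managed key.
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
)

Only principals that the key policy and IAM allow to decrypt with this key can read the objects back, which is how access is limited to specific employees.
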
Question # 37:

A data engineer needs to build an enterprise data catalog based on the company's Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data in the catalog.

Which solution will meet these requirements with the LEAST effort?

Options:

A.

Use an AWS Glue crawler to scan the S3 buckets and RDS databases and build a data catalog. Use data stewards to inspect the data and update the data catalog with the data format.


B.

Use an AWS Glue crawler to build a data catalog. Use AWS Glue crawler classifiers to recognize the format of data and store the format in the catalog.


C.

Use Amazon Macie to build a data catalog and to identify sensitive data elements. Collect the data format information from Macie.


D.

Use scripts to scan data elements and to assign data classifications based on the format of the data.


Expert Solution
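
A hedged sketch of option B: a single crawler can cover both the S3 buckets and the RDS databases (through a Glue connection), and Glue's built-in classifiers record the storage format (for example JSON, CSV, or Parquet) in the catalog automatically; custom classifiers are needed only for nonstandard formats. All names below are hypothetical.

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="enterprise-catalog-crawler",  # hypothetical crawler name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    DatabaseName="enterprise_catalog",
    # Optional custom classifier; built-in classifiers run regardless.
    Classifiers=["custom-log-classifier"],  # hypothetical classifier
    Targets={
        "S3Targets": [{"Path": "s3://example-enterprise-data/"}],
        "JdbcTargets": [
            {
                "ConnectionName": "rds-connection",  # hypothetical Glue connection
                "Path": "example_db/%",  # hypothetical database/schema path
            }
        ],
    },
)
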
Question # 38:

A company is building an inventory management system and an inventory reordering system to automatically reorder products. Both systems use Amazon Kinesis Data Streams. The inventory management system uses the Amazon Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the Amazon Kinesis Client Library (KCL) to consume data from the stream. The company configures the stream to scale up and down as needed.

Before the company deploys the systems to production, the company discovers that the inventory reordering system received duplicated data.

Which factors could have caused the reordering system to receive duplicated data? (Select TWO.)

Options:

A.

The producer experienced network-related timeouts.


B.

The stream's value for the IteratorAgeMilliseconds metric was too high.


C.

There was a change in the number of shards, record processors, or both.


D.

The AggregationEnabled configuration property was set to true.


E.

The max_records configuration property was set to a number that was too high.


Expert Solution
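
Kinesis delivers at least once: producer retries after network timeouts (option A) and changes in shard count or record processors (option C) can each replay records to the consumer. The standard defense is to make the reordering logic idempotent. Below is a generic Python sketch of that idea, not actual KCL code; the unique_id field and the reorder function are hypothetical, and in production the seen-set would live in a durable store such as DynamoDB rather than in memory.

processed_ids = set()  # in production: a durable store with conditional writes

def handle_record(record):
    """Process each record at most once, keyed on a unique ID in the payload."""
    record_id = record["unique_id"]  # hypothetical deduplication key
    if record_id in processed_ids:
        return  # duplicate delivery: skip to avoid a double reorder
    processed_ids.add(record_id)
    reorder_product(record)

def reorder_product(record):
    # Hypothetical business logic that places the reorder.
    print(f"Reordering product from record {record['unique_id']}")
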
Question # 39:

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.

Which solution will meet this requirement?

Options:

A.

Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.


B.

Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.


C.

Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.


D.

Turn on concurrency scaling for the daily usage quota for the Redshift cluster.


Expert Solution
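
Option B in practice means editing the cluster's WLM configuration: concurrency scaling is a per-queue setting (concurrency_scaling set to auto) inside the wlm_json_configuration parameter of the cluster's parameter group. A hedged boto3 sketch with a hypothetical parameter group and queue definition:

import json

import boto3

redshift = boto3.client("redshift")

wlm_config = [
    {
        "user_group": [],
        "query_group": [],
        "query_concurrency": 5,
        # Routes eligible queued queries to concurrency-scaling clusters.
        "concurrency_scaling": "auto",
    }
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-wlm-params",  # hypothetical parameter group
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
            "ApplyType": "dynamic",  # WLM JSON changes apply without a reboot
        }
    ],
)
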
Question # 40:

A company has a data lake in Amazon S3. The company uses AWS Glue to catalog data and AWS Glue Studio to implement data extract, transform, and load (ETL) pipelines.

The company needs to ensure that data quality issues are checked every time the pipelines run. A data engineer must enhance the existing pipelines to evaluate data quality rules based on predefined thresholds.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.

Add a new transform that is defined by a SQL query to each Glue ETL job. Use the SQL query to implement a ruleset that includes the data quality rules that need to be evaluated.


B.

Add a new Evaluate Data Quality transform to each Glue ETL job. Use Data Quality Definition Language (DQDL) to implement a ruleset that includes the data quality rules that need to be evaluated.


C.

Add a new custom transform to each Glue ETL job. Use the PyDeequ library to implement a ruleset that includes the data quality rules that need to be evaluated.


D.

Add a new custom transform to each Glue ETL job. Use the Great Expectations library to implement a ruleset that includes the data quality rules that need to be evaluated.


Expert Solution
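
Option B corresponds to the Evaluate Data Quality transform that AWS Glue Studio adds to a job; the generated script uses the awsgluedq module and a Data Quality Definition Language (DQDL) ruleset string. The sketch below is illustrative rather than definitive: the catalog table, column names, and thresholds are hypothetical, and the exact generated code varies with the Glue version.

from awsglue.context import GlueContext
from pyspark.context import SparkContext
from awsgluedq.transforms import EvaluateDataQuality

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical source table already registered in the Glue Data Catalog.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# DQDL ruleset with predefined thresholds (all values are examples).
ruleset = """
Rules = [
    RowCount > 1000,
    IsComplete "order_id",
    ColumnValues "price" > 0
]
"""

results = EvaluateDataQuality.apply(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_checks",
        "enableDataQualityResultsPublishing": True,
    },
)
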