
Pass the Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) questions and answers with CertsForce

Viewing page 7 out of 9 pages
Viewing questions 61-70
Question # 61:

A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. After a merger, a query that joins two large sales tables has become slow. Table S1 has 10 billion records, and Table S2 has 900 million records.

The company must improve the query performance.

Options:

A.

Use the KEY distribution style for both sales tables. Select a low-cardinality column to use for the join.


B.

Use the KEY distribution style for both sales tables. Select a high-cardinality column to use for the join.


C.

Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.


D.

Use the Amazon Redshift query optimizer to review and select optimizations to implement.


E.

Use Amazon Redshift Advisor to review and select optimizations to implement.


Expert Solution
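As background for the distribution-style options above, here is a minimal sketch (the table and column names are assumptions, not taken from the question) of declaring KEY distribution on a shared high-cardinality join column, which places matching rows of both tables on the same slice so the join runs without cross-node redistribution:

```sql
-- Minimal sketch with assumed names (s1_sales, s2_sales, order_id).
-- KEY distribution on the shared high-cardinality join column
-- co-locates matching rows, avoiding network shuffles during the join.
CREATE TABLE s1_sales (
    order_id  BIGINT,
    sale_date DATE,
    amount    DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (order_id);

CREATE TABLE s2_sales (
    order_id BIGINT,
    region   VARCHAR(32)
)
DISTSTYLE KEY
DISTKEY (order_id);
```

A low-cardinality distribution key would hash most rows to a few slices, recreating the skew the KEY style is meant to avoid.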
Question # 62:

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution.

A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.

The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.

Which solution will meet these requirements?

Options:

A.

Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.


B.

Change the distribution key to the table column that has the largest dimension.


C.

Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.


D.

Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.


Expert Solution
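For context on changing a distribution key in place, Amazon Redshift supports altering the DISTKEY of an existing table; a sketch with assumed names:

```sql
-- Hypothetical sketch (sales and customer_id are assumed names).
-- Redistributing on a high-cardinality column spreads rows, and the
-- CPU work that follows them, across all five compute nodes.
ALTER TABLE sales ALTER DISTKEY customer_id;
```

Redshift redistributes the table's rows in the background after this statement, so no data reload is required.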
Question # 63:

A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Establish WebSocket connections to Amazon Redshift.


B.

Use the Amazon Redshift Data API.


C.

Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.


D.

Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.


Expert Solution
Question # 64:

A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.

Which solution will meet these requirements?

Options:

A.

Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.


B.

Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.


C.

Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2.


D.

Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.


Expert Solution
Question # 65:

A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.

The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use the AWS Glue API to manually update the Data Catalog.


B.

Run an MSCK REPAIR TABLE command in Athena.


C.

Schedule an AWS Glue crawler to periodically update the Data Catalog.


D.

Run a REFRESH TABLE command in Athena.


Expert Solution
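For reference on option B's command: after new state partitions are written under the existing date partitions in Amazon S3, the following Athena statement scans the table's S3 location and registers any Hive-style partitions missing from the Data Catalog (the table name is an assumption):

```sql
-- Assumed table name; requires Hive-style partition paths such as
-- s3://bucket/table/date=2024-01-01/state=WA/.
MSCK REPAIR TABLE sales_data;
```

The command must be rerun manually whenever partitions are added, which is why a scheduled AWS Glue crawler carries less operational overhead for ongoing synchronization.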
Question # 66:

A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models.

The data engineer receives an access denied error when the data engineer tries to prepare the data by using SageMaker Studio.

Which change should the engineer make to gain access to SageMaker Studio?

Options:

A.

Add the AWSGlueServiceRole managed policy to the data engineer's IAM user.


B.

Add a policy to the data engineer's IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.


C.

Add the AmazonSageMakerFullAccess managed policy to the data engineer's IAM user.


D.

Add a policy to the data engineer's IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.


Expert Solution
Question # 67:

A retail company stores customer data in an Amazon S3 bucket. Some of the customer data contains personally identifiable information (PII) about customers. The company must not share PII data with business partners.

A data engineer must determine whether a dataset contains PII before making objects in the dataset available to business partners.

Which solution will meet this requirement with the LEAST manual intervention?

Options:

A.

Configure the S3 bucket and S3 objects to allow access to Amazon Macie. Use automated sensitive data discovery in Macie.


B.

Configure AWS CloudTrail to monitor S3 PUT operations. Inspect the CloudTrail trails to identify operations that save PII.


C.

Create an AWS Lambda function to identify PII in S3 objects. Schedule the function to run periodically.


D.

Create a table in AWS Glue Data Catalog. Write custom SQL queries to identify PII in the table. Use Amazon Athena to run the queries.


Expert Solution
Question # 68:

A company uses Amazon S3 to store data and Amazon QuickSight to create visualizations.

The company has an S3 bucket in an AWS account named Hub-Account. The S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The company's QuickSight instance is in a separate account named BI-Account.

The company updates the S3 bucket policy to grant access to the QuickSight service role. The company wants to enable cross-account access to allow QuickSight to interact with the S3 bucket.

Which combination of steps will meet this requirement? (Select TWO.)

Options:

A.

Use the existing AWS KMS key to encrypt connections from QuickSight to the S3 bucket.


B.

Add the S3 bucket as a resource that the QuickSight service role can access.


C.

Use AWS Resource Access Manager (AWS RAM) to share the S3 bucket with the BI-Account account.


D.

Add an IAM policy to the QuickSight service role to give QuickSight access to the KMS key that encrypts the S3 bucket.


E.

Add the KMS key as a resource that the QuickSight service role can access.


Expert Solution
Question # 69:

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

Options:

A.

Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.


B.

Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.


C.

Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.


D.

Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.


Expert Solution
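For reference, once the small JSON files have been grouped and converted to Parquet, loading the consolidated output into Redshift might look like the following sketch (the bucket, table, and IAM role ARN are all placeholders, not from the question):

```sql
-- All names and the IAM role ARN below are placeholder assumptions.
COPY test_results
FROM 's3://example-bucket/parquet-output/'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftLoadRole'
FORMAT AS PARQUET;
```

COPY loads files in parallel across slices, so it benefits from the larger, fewer files produced by the grouping step.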
Question # 70:

A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3-based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data.

The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

AWS Glue workflows


B.

AWS Step Functions tasks


C.

AWS Lambda functions


D.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows


Expert Solution