
Amazon Web Services AWS Certified Data Engineer - Associate (Data-Engineer-Associate) practice questions and answers from CertsForce

Page 7 of 8 - Questions 61-70
Question # 61:

Files from multiple data sources arrive in an Amazon S3 bucket on a regular basis. A data engineer wants to ingest new files into Amazon Redshift in near real time when the new files arrive in the S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use the query editor v2 to schedule a COPY command to load new files into Amazon Redshift.


B.

Use the zero-ETL integration between Amazon Aurora and Amazon Redshift to load new files into Amazon Redshift.


C.

Use AWS Glue job bookmarks to extract, transform, and load (ETL) new files into Amazon Redshift.


D.

Use S3 Event Notifications to invoke an AWS Lambda function that loads new files into Amazon Redshift.


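For context on the event-driven pattern described in option D, here is a minimal Python sketch of a Lambda handler that reacts to an S3 Event Notification and issues a COPY through the Redshift Data API. The cluster identifier, database, user, table, and IAM role ARN are placeholder assumptions, not values from the question.

import boto3

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Each record in the S3 Event Notification describes one newly created object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY staging.sales FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "  # placeholder role ARN
            "FORMAT AS CSV"
        )
        # ExecuteStatement runs the COPY asynchronously against the cluster.
        redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",  # placeholder cluster
            Database="dev",
            DbUser="loader",
            Sql=copy_sql,
        )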
Question # 62:

A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII.

To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset.

Which solution will meet the requirements with the LEAST operational overhead?

Options:

A.

Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.


B.

Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.


C.

Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.


D.

Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.


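To illustrate the mechanism behind option B, the following is a rough Python sketch of an S3 Object Lambda handler that redacts fields on the fly during GET requests. It assumes the objects are JSON arrays of records and uses a hypothetical list of PII field names.

import json
import urllib.request
import boto3

s3 = boto3.client("s3")
PII_FIELDS = {"email", "phone", "ssn"}  # hypothetical PII field names

def lambda_handler(event, context):
    ctx = event["getObjectContext"]
    # Fetch the original object through the presigned URL that S3 Object Lambda supplies.
    with urllib.request.urlopen(ctx["inputS3Url"]) as response:
        records = json.loads(response.read())
    # Redact PII fields before the object is returned to the calling application.
    for record in records:
        for field in PII_FIELDS:
            if field in record:
                record[field] = "REDACTED"
    # Stream the transformed object back to the requester.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=json.dumps(records),
    )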
Question # 63:

A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded.

The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.

Which command will reclaim the MOST database storage space?

(The four command choices for this question appear as images in the original and are not reproduced here.)

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


Question # 64:

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.

The data engineer's original query is as follows:

SELECT product_name, sum(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name

How should the data engineer modify the Athena query to meet these requirements?

Options:

A.

Replace sum(sales_amount) with count(*) for the aggregation.


B.

Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.


C.

Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.


D.

Remove the GROUP BY clause.


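For reference, the query in this question can also be submitted programmatically through the Athena API. A minimal boto3 sketch follows; the database name and results bucket are placeholder assumptions.

import boto3

athena = boto3.client("athena")

query = """
SELECT product_name, sum(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name
"""

# StartQueryExecution returns a QueryExecutionId that can be polled for status and results.
response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sales_db"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # placeholder bucket
)
print(response["QueryExecutionId"])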
Question # 65:

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

Options:

A.

Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.


B.

Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.


C.

Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.


D.

Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.


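As an illustration of the mechanics behind option C, the sketch below creates a CloudWatch Logs metric filter over a log group that receives CloudTrail S3 data events, counting access-denied calls so an alarm can be attached. The log group name, filter pattern, and metric names are assumptions for the sketch.

import boto3

logs = boto3.client("logs")

# Count S3 access-denied events that CloudTrail delivers to this log group.
logs.put_metric_filter(
    logGroupName="cloudtrail-s3-data-events",  # placeholder log group
    filterName="s3-access-denied",
    filterPattern='{ ($.eventSource = "s3.amazonaws.com") && ($.errorCode = "AccessDenied") }',
    metricTransformations=[
        {
            "metricName": "S3AccessDenied",
            "metricNamespace": "DataAccessPolicy",  # placeholder namespace
            "metricValue": "1",
        }
    ],
)

A CloudWatch alarm on the resulting metric can then trigger a notification; the username is available in the matched CloudTrail event's userIdentity field.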
Question # 66:

A company uses a variety of AWS and third-party data stores. The company wants to consolidate all the data into a central data warehouse to perform analytics. Users need fast response times for analytics queries.

The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run queries during a few hours each day with unpredictable spikes.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use Amazon Redshift Serverless to load all the data into Amazon Redshift managed storage (RMS).


B.

Use Amazon Athena to load all the data into Amazon S3 in Apache Parquet format.


C.

Use Amazon Redshift provisioned clusters to load all the data into Amazon Redshift managed storage (RMS).


D.

Use Amazon Aurora PostgreSQL to load all the data into Aurora.


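Option A refers to Amazon Redshift Serverless, which needs only a namespace and a workgroup and scales compute with query demand. A rough boto3 sketch of that one-time setup, with placeholder names, credentials, and capacity:

import boto3

rss = boto3.client("redshift-serverless")

# A namespace holds databases and users; a workgroup provides the auto-scaling compute.
rss.create_namespace(
    namespaceName="analytics",             # placeholder namespace
    adminUsername="admin",
    adminUserPassword="ExamplePassw0rd!",  # placeholder credential
    dbName="dw",
)
rss.create_workgroup(
    workgroupName="analytics-wg",          # placeholder workgroup
    namespaceName="analytics",
    baseCapacity=32,                       # base capacity in RPUs
)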
Question # 67:

A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions.

The data engineer requires a less manual way to update the Lambda functions.

Which solution will meet this requirement?

Options:

A.

Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.


B.

Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.


C.

Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.


D.

Assign the same alias to each Lambda function. Call each Lambda function by specifying the function's alias.


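Option B centralizes the shared scripts in a Lambda layer, so a change only needs to be published once and then attached to the functions. A hedged boto3 sketch with placeholder layer, bucket, and function names:

import boto3

lambda_client = boto3.client("lambda")

# Publish the shared formatting scripts as a new layer version.
layer = lambda_client.publish_layer_version(
    LayerName="data-formatting",  # placeholder layer name
    Content={"S3Bucket": "example-artifacts", "S3Key": "layers/formatting.zip"},  # placeholder package
    CompatibleRuntimes=["python3.12"],
)

# Point each function that uses the scripts at the new layer version.
for function_name in ["ingest-orders", "ingest-returns"]:  # placeholder function names
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer["LayerVersionArn"]],
    )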
Question # 68:

A company needs to implement a new inventory management system that provides near real-time updates and visibility across all AWS Regions. The new solution must provide centralized access control over data access and permissions. The company has a separate inventory management team assigned to each Region. Each inventory management team needs to update inventory levels.

A data engineer must implement Amazon Redshift data sharing with write capabilities. The solution must follow the principle of least privilege.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Configure a single Redshift datashare from the company's headquarters that provides read-only access for all Regions. Configure a separate AWS Glue ETL job to update data for each Region.


B.

Configure three Regional Redshift datashares that provide full write access. Allow full self-managed access controls.


C.

Configure a single Redshift datashare from the company's headquarters that has selective write permissions for inventory. Set up Regional namespace controls.


D.

Configure separate Redshift datashares for multiple table types that provide full write access. Distribute the datashares across all Regional clusters. Allow self-managed Regional schema permissions.


Question # 69:

A company uses AWS Glue ETL pipelines to process data. The company uses Amazon Athena to analyze data in an Amazon S3 bucket.

To better understand shipping timelines, the company decides to collect and store shipping dates and delivery dates in addition to order data. The company adds a data quality check to ensure that the shipping date is later than the order date and that the delivery date is later than the shipping date. Orders that fail the quality check must be stored in a second Amazon S3 bucket.

Which solution will meet these requirements in the MOST cost-effective way?

Options:

A.

Use AWS Glue DataBrew DATEDIFF functions to create two additional columns. Validate the new columns. Write failed records to a second S3 bucket.


B.

Use Amazon Athena to query the three date columns and compare the values. Export failed records to a second S3 bucket.


C.

Use AWS Glue Data Quality to create a custom rule that validates the three date columns. Route records that fail the rule to a second S3 bucket.


D.

Use an AWS Glue crawler to populate the AWS Glue Data Catalog. Use the three date columns to create a filter.


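Option C relies on AWS Glue Data Quality, where rules are written in DQDL. The sketch below registers a ruleset with custom SQL checks on the three date columns against a Data Catalog table; the database, table, column, and ruleset names are assumptions.

import boto3

glue = boto3.client("glue")

# DQDL ruleset: "primary" refers to the table the ruleset is evaluated against.
ruleset = """
Rules = [
    CustomSql "SELECT COUNT(*) FROM primary WHERE ship_date <= order_date" = 0,
    CustomSql "SELECT COUNT(*) FROM primary WHERE delivery_date <= ship_date" = 0
]
"""

glue.create_data_quality_ruleset(
    Name="order-date-checks",  # placeholder ruleset name
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "orders_db", "TableName": "orders"},  # placeholder table
)

Routing the rows that fail these checks to a second S3 bucket would typically be handled in the Glue job after the ruleset is evaluated.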
Question # 70:

A data engineer must implement Amazon Redshift Serverless as a data warehouse for a company. The data engineer needs to integrate multiple Amazon Aurora MySQL databases into Amazon Redshift. The solution must maintain near real-time latency and minimize infrastructure management as much as possible.

Which solution will meet these requirements?

Options:

A.

Use AWS Database Migration Service (AWS DMS) Serverless to ingest data into Amazon Redshift.


B.

Create a Python module for an AWS Glue job to standardize the data ingestion from Aurora MySQL into Amazon Redshift.


C.

Create an AWS Lambda function to ingest data into Amazon Redshift.


D.

Set up a zero-ETL integration between the Aurora MySQL databases and Amazon Redshift Serverless.


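Option D's zero-ETL integration is configured once per source cluster, after which AWS manages the replication into Redshift. A rough boto3 sketch; all ARNs and names are placeholders.

import boto3

rds = boto3.client("rds")

# Link an Aurora MySQL cluster to a Redshift Serverless namespace with a zero-ETL integration.
rds.create_integration(
    IntegrationName="orders-to-redshift",  # placeholder name
    SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:orders-aurora",  # placeholder Aurora cluster ARN
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/11111111-2222-3333-4444-555555555555",  # placeholder namespace ARN
)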