Pass the Amazon Web Services AWS Certified Data Engineer Data-Engineer-Associate Questions and answers with CertsForce

Viewing page 8 out of 9 pages
Viewing questions 71-80 out of questions
Questions # 71:

A data engineer is implementing model governance for machine learning (ML) workflows on AWS. The data engineer needs a solution that can track the complete lifecycle of the ML models, including data preparation, model training, and deployment stages. The solution must ensure reproducibility and audit compliance.

Which solution will meet these requirements?

Options:

A. Use Amazon SageMaker Debugger to capture metrics. Create associations between datasets and training jobs by monitoring training jobs.

B. Use Amazon SageMaker ML Lineage Tracking to create associations between artifacts, training jobs, and datasets by recording metadata.

C. Use Amazon SageMaker Model Monitor to create associations between artifacts and training jobs by tracking model performance.

D. Use Amazon SageMaker Experiments to create associations between datasets and artifacts by tracking hyperparameters and metrics.


Questions # 72:

A global ecommerce company processes customer transactions, inventory updates, and user activity logs across multiple AWS services. The company needs a scalable, fully managed, and event-driven orchestration solution to coordinate complex extract, transform, and load (ETL) workflows. The solution must use AWS Glue and Amazon EMR to process data. The data will be stored in Amazon Redshift and Amazon S3. The solution must support dependency management, automated retries, and data pipeline monitoring.

Which solution will meet these requirements?

Options:

A. Use AWS Step Functions to define an express workflow that invokes the data transformation and loading tasks across Amazon EMR and AWS Glue.

B. Create AWS Lambda functions for each step of the workflow. Configure Amazon EventBridge to invoke AWS Glue jobs. Configure the Lambda functions to process and move data through the pipeline.

C. Use Apache Airflow on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create Directed Acyclic Graphs (DAGs) to manage ETL workflows.

D. Create an AWS Lambda function that runs each step of the workflow. Create an Amazon EventBridge scheduled rule to invoke the function every day.
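The dependency management that a DAG provides (option C) can be illustrated without Airflow itself. The following is a minimal sketch, assuming hypothetical task names that do not come from the question, which uses Python's standard-library graphlib to derive a run order from declared dependencies. A real Amazon MWAA deployment would express the same graph with Airflow operators instead.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks mapped to the tasks they depend on.
# These names are illustrative and are not part of the question.
dependencies = {
    "glue_clean_transactions": set(),
    "emr_aggregate_activity": {"glue_clean_transactions"},
    "load_to_s3": {"emr_aggregate_activity"},
    "load_to_redshift": {"emr_aggregate_activity"},
}


def run_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

The two load tasks have no dependency on each other, so either may come first; only the upstream ordering is guaranteed.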


Questions # 73:

A company needs to implement a workflow to process transactions. Each transaction goes through multiple levels of validation. Each validation level depends on the preceding validation level.

The workflow must either process or reject each transaction within 24 hours. The workflow must run for less than 24 hours total.

Which solution will meet these requirements with the LEAST operational cost?

Options:

A. Create a standard workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

B. Create an express workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

C. Use AWS Lambda functions to implement the workflow. Use Amazon EventBridge to invoke the validation steps.

D. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to implement the workflow.
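The Wait for Callback pattern named in options A and B corresponds to the `.waitForTaskToken` suffix on a task's resource in Amazon States Language. The following is a minimal sketch, assuming hypothetical validation level names and a hypothetical Lambda function naming scheme, that builds a chained state machine definition as a Python dictionary. When weighing the options, note that Express Workflows do not support the callback pattern and are limited to five minutes of execution, whereas Standard Workflows can run for up to one year.

```python
import json

# Hypothetical validation levels; each one depends on the level before it.
LEVELS = ["FormatCheck", "FraudCheck", "ComplianceCheck"]


def build_definition(levels, timeout_seconds=24 * 60 * 60):
    """Chain one callback task per validation level, with a 24-hour overall timeout."""
    states = {}
    for i, level in enumerate(levels):
        state = {
            "Type": "Task",
            # The .waitForTaskToken suffix pauses the state until a caller
            # returns the token through SendTaskSuccess or SendTaskFailure.
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": f"validate-{level}",  # hypothetical function name
                "Payload": {
                    "transaction.$": "$",
                    "taskToken.$": "$$.Task.Token",
                },
            },
        }
        if i + 1 < len(levels):
            state["Next"] = levels[i + 1]
        else:
            state["End"] = True
        states[level] = state
    return {"StartAt": levels[0], "TimeoutSeconds": timeout_seconds, "States": states}


definition = build_definition(LEVELS)
definition_json = json.dumps(definition, indent=2)
```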


Questions # 74:

A company implements a data mesh that has a central governance account. The company needs to catalog all data in the governance account. The governance account uses AWS Lake Formation to centrally share data and grant access permissions.

The company has created a new data product that includes a group of Amazon Redshift Serverless tables. A data engineer needs to share the data product with a marketing team. The marketing team must have access to only a subset of columns. The data engineer needs to share the same data product with a compliance team. The compliance team must have access to a different subset of columns than the marketing team needs access to.

Which combination of steps should the data engineer take to meet these requirements? (Select TWO.)

Options:

A. Create views of the tables that need to be shared. Include only the required columns.

B. Create an Amazon Redshift datashare that includes the tables that need to be shared.

C. Create an Amazon Redshift managed VPC endpoint in the marketing team's account. Grant the marketing team access to the views.

D. Share the Amazon Redshift datashare to the Lake Formation catalog in the governance account.

E. Share the Amazon Redshift datashare to the Amazon Redshift Serverless workgroup in the marketing team's account.


Questions # 75:

A data engineer is troubleshooting an AWS Glue workflow that occasionally fails. The engineer determines that the failures are a result of data quality issues. A business reporting team needs to receive an email notification any time the workflow fails in the future.

Which solution will meet this requirement?

Options:

A. Create an Amazon Simple Notification Service (Amazon SNS) FIFO topic. Subscribe the team's email account to the SNS topic. Create an AWS Lambda function that initiates when the AWS Glue job state changes to FAILED. Set the SNS topic as the target.

B. Create an Amazon Simple Notification Service (Amazon SNS) standard topic. Subscribe the team's email account to the SNS topic. Create an Amazon EventBridge rule that triggers when the AWS Glue job state changes to FAILED. Set the SNS topic as the target.

C. Create an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Subscribe the team's email account to the SQS queue. Create an AWS Config rule that triggers when the AWS Glue job state changes to FAILED. Set the SQS queue as the target.

D. Create an Amazon Simple Queue Service (Amazon SQS) standard queue. Subscribe the team's email account to the SQS queue. Create an Amazon EventBridge rule that triggers when the AWS Glue job state changes to FAILED. Set the SQS queue as the target.
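Several options rely on an Amazon EventBridge rule that reacts when an AWS Glue job state changes to FAILED. The following is a minimal sketch of the event pattern such a rule could use, together with a deliberately simplified local matcher for trying it out. The matcher and the sample event (including the job name) are assumptions for illustration; EventBridge's real matching supports many more operators.

```python
import json

# Event pattern for a rule that matches failed AWS Glue job runs.
GLUE_FAILED_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED"]},
}


def matches(pattern, event):
    """Simplified matcher: lists mean 'any of these values', dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical sample event containing only the fields the pattern inspects.
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Job State Change",
    "detail": {"jobName": "example-job", "state": "FAILED"},
}

pattern_json = json.dumps(GLUE_FAILED_PATTERN)
```

The rule's target then determines how the notification is delivered. SNS topics accept email subscriptions, while SQS queues have no subscription mechanism for email addresses.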


Questions # 76:

A financial company wants to implement a data mesh. The data mesh must support centralized data governance, data analysis, and data access control. The company has decided to use AWS Glue for data catalogs and extract, transform, and load (ETL) operations.

Which combination of AWS services will implement a data mesh? (Choose two.)

Options:

A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.

B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.

C. Use AWS Glue DataBrew for centralized data governance and access control.

D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.

E. Use AWS Lake Formation for centralized data governance and access control.


Questions # 77:

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.

The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database.

Which AWS service should the company use to meet these requirements?

Options:

A. AWS Lambda

B. AWS Database Migration Service (AWS DMS)

C. AWS Direct Connect

D. AWS DataSync


Questions # 78:

A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file.

Which solution will meet these requirements MOST cost-effectively?

Options:

A. Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.

B. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to ingest the data into the data lake in JSON format.

C. Use an AWS Glue PySpark job to ingest the source data into the data lake in Apache Avro format.

D. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to write the data into the data lake in Apache Parquet format.
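Because Athena bills by the amount of data scanned, the storage layout matters when queries touch only one or two of the 15 columns. The following is a toy sketch, assuming a hypothetical 1,000-row dataset with made-up cell values, that compares the bytes a reader must scan in a row-oriented text file with the bytes scanned when each column is stored contiguously. It ignores compression, encoding, and file metadata, so it only illustrates the direction of the effect, not real Parquet sizes.

```python
import csv
import io

# Hypothetical 15-column dataset with 1,000 rows of made-up values.
COLUMNS = [f"col{i}" for i in range(15)]
ROWS = [[f"r{r}c{c}" for c in range(15)] for r in range(1000)]


def row_layout_bytes(rows, columns):
    """Bytes scanned in a row-oriented CSV file: every row is read in full."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def column_layout_bytes(rows, columns, wanted):
    """Bytes scanned when columns are stored separately and only `wanted` are read."""
    total = 0
    for name in wanted:
        idx = columns.index(name)
        total += sum(len(row[idx].encode("utf-8")) for row in rows)
    return total


csv_scan = row_layout_bytes(ROWS, COLUMNS)
columnar_scan = column_layout_bytes(ROWS, COLUMNS, ["col3", "col7"])
```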


Questions # 79:

A data engineer is using AWS Glue to build an extract, transform, and load (ETL) pipeline that processes streaming data from sensors. The pipeline sends the data to an Amazon S3 bucket in near real-time. The data engineer also needs to perform transformations and join the incoming data with metadata that is stored in an Amazon RDS for PostgreSQL database. The data engineer must write the results back to a second S3 bucket in Apache Parquet format.

Which solution will meet these requirements?

Options:

A. Use an AWS Glue streaming job and AWS Glue Studio to perform the transformations and to write the data in Parquet format.

B. Use AWS Glue jobs and AWS Glue Data Catalog to catalog the data from Amazon S3 and Amazon RDS. Configure the jobs to perform the transformations and joins and to write the output in Parquet format.

C. Use an AWS Glue interactive session to process the streaming data and to join the data with the RDS database.

D. Use an AWS Glue Python shell job to run a Python script that processes the data in batches. Keep track of processed files by using AWS Glue bookmarks.


Questions # 80:

A company uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.

An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

Options:

A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.

B. Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.

C. Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.

D. Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
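The staging-table approach described in option A can be sketched locally. The example below is an illustration only: it uses SQLite from Python's standard library as a stand-in for Amazon Redshift, the table and column names are hypothetical, and it merges with a delete-then-insert step rather than an in-place UPDATE, which is a common variant of the same idea.

```python
import sqlite3


def load_with_staging(conn, rows):
    """Load rows through a staging table so that reruns do not duplicate target rows."""
    cur = conn.cursor()
    # Hypothetical target and staging tables keyed on `id`.
    cur.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS sales_staging (id INTEGER, amount REAL)")
    cur.execute("DELETE FROM sales_staging")
    cur.executemany("INSERT INTO sales_staging VALUES (?, ?)", rows)
    # Merge: remove target rows that the staging rows replace, then insert them.
    cur.execute("DELETE FROM sales WHERE id IN (SELECT id FROM sales_staging)")
    cur.execute("INSERT INTO sales SELECT id, amount FROM sales_staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
batch = [(1, 10.0), (2, 20.0)]
load_with_staging(conn, batch)
load_with_staging(conn, batch)  # rerun of the same batch adds no duplicates
load_with_staging(conn, [(2, 25.0), (3, 30.0)])  # one updated row, one new row
result = conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
```

In an AWS Glue job, SQL of this kind is commonly supplied through the `preactions` and `postactions` connection options of the Redshift writer, so the merge runs around the load into the staging table.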

