AWS Certified Data Engineer - Associate (Data-Engineer-Associate) Questions and Answers

Viewing page 2 out of 6 pages
Viewing questions 11-20
Question # 11:

A company stores employee data in Amazon Redshift. A table named Employee uses columns named Region ID, Department ID, and Role ID as a compound sort key. Which queries will benefit MOST from the table's compound sort key? (Select TWO.)

Options:

A.

Select * from Employee where Region ID='North America';


B.

Select * from Employee where Region ID='North America' and Department ID=20;


C.

Select * from Employee where Department ID=20 and Region ID='North America';


D.

Select * from Employee where Role ID=50;


E.

Select * from Employee where Region ID='North America' and Role ID=50;
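
For context only (not part of the exam item): a compound sort key in Amazon Redshift helps most when a query filters on the leading column(s) of the key in declared order. Below is a minimal sketch run through the Redshift Data API; the table, columns, database, and workgroup names are placeholders, not taken from the question.

```python
import boto3

# Sketch only: identifiers below (table, columns, workgroup, database) are placeholders.
client = boto3.client("redshift-data")

DDL = """
CREATE TABLE employee (
    region_id      VARCHAR(32),
    department_id  INT,
    role_id        INT,
    employee_name  VARCHAR(128)
)
COMPOUND SORTKEY (region_id, department_id, role_id);
"""

# A predicate on the leading sort key column(s) lets Redshift skip blocks via
# zone maps; a predicate on only a trailing column (e.g. role_id) cannot.
LEADING_PREFIX_QUERY = """
SELECT * FROM employee
WHERE region_id = 'North America' AND department_id = 20;
"""

for sql in (DDL, LEADING_PREFIX_QUERY):
    client.execute_statement(
        WorkgroupName="my-workgroup",  # hypothetical Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
```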


Question # 12:

A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes.

A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day. The data engineer must ingest only the changed data into the data lake.

Which solution will capture the changed data MOST cost-effectively?

Options:

A.

Create an AWS Lambda function to identify the changes between the previous data and the current data. Configure the Lambda function to ingest the changes into the data lake.


B.

Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.


C.

Use an open source data lake format to merge the data source with the S3 data lake to insert the new data and update the existing data.


D.

Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.
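
For context only: option C describes merging the daily snapshot into an open table format on Amazon S3. Below is a minimal PySpark sketch, assuming a hypothetical Apache Iceberg table lake.db.orders keyed on order_id, placeholder S3 paths, and the Iceberg Spark runtime on the classpath.

```python
from pyspark.sql import SparkSession

# Sketch only: catalog, table, key column, and S3 paths are placeholders.
spark = (
    SparkSession.builder
    .appName("daily-cdc-merge")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# Read the daily full snapshot that the data source delivers as JSON.
snapshot = spark.read.json("s3://example-bucket/snapshots/2024-01-01/")
snapshot.createOrReplaceTempView("snapshot")

# MERGE rewrites only the data files that contain inserted or updated rows,
# so unchanged multi-terabyte files are left in place.
spark.sql("""
    MERGE INTO lake.db.orders AS target
    USING snapshot AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```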


Question # 13:

A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.

The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs.

Which solutions will meet these requirements? (Select TWO.)

Options:

A.

Use Amazon Redshift ML to generate inventory recommendations.


B.

Use SQL to invoke a remote SageMaker endpoint for prediction.


C.

Use Amazon Redshift ML to schedule regular data exports for offline model training.


D.

Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.


E.

Use Amazon Redshift as a file storage system to archive old inventory management reports.
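
For context only: options A and B both rely on Redshift ML, which can register an existing SageMaker real-time endpoint as a SQL function and invoke it from queries. Below is a minimal sketch run through the Redshift Data API; the model name, endpoint name, columns, cluster, and IAM role are placeholders.

```python
import boto3

# Sketch only: all identifiers are placeholders.
rsd = boto3.client("redshift-data")

CREATE_MODEL = """
CREATE MODEL inventory_recommendations
FROM 'inventory-endpoint'                  -- existing SageMaker real-time endpoint
FUNCTION recommend_stock(INT, DECIMAL(10,2))
RETURNS DECIMAL(10,2)
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole';
"""

# Invoke the remote endpoint from plain SQL for real-time recommendations.
PREDICT = """
SELECT item_id, recommend_stock(item_id, current_stock) AS suggested_order
FROM inventory;
"""

for sql in (CREATE_MODEL, PREDICT):
    rsd.execute_statement(
        ClusterIdentifier="inventory-cluster",  # hypothetical provisioned cluster
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )
```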


Question # 14:

A company uses a variety of AWS and third-party data stores. The company wants to consolidate all the data into a central data warehouse to perform analytics. Users need fast response times for analytics queries.

The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run queries during a few hours each day with unpredictable spikes.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use Amazon Redshift Serverless to load all the data into Amazon Redshift managed storage (RMS).


B.

Use Amazon Athena to load all the data into Amazon S3 in Apache Parquet format.


C.

Use Amazon Redshift provisioned clusters to load all the data into Amazon Redshift managed storage (RMS).


D.

Use Amazon Aurora PostgreSQL to load all the data into Aurora.
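
For context only: option A's Amazon Redshift Serverless removes cluster sizing and scaling from the operator, and compute scales automatically during the few hours a day when queries spike. Below is a minimal provisioning sketch; the namespace, workgroup, and base capacity values are placeholders.

```python
import boto3

# Sketch only: names and capacity are placeholders.
serverless = boto3.client("redshift-serverless")

serverless.create_namespace(namespaceName="analytics")

# The workgroup scales compute (RPUs) up and down automatically;
# there is no cluster to resize or pause during idle hours.
serverless.create_workgroup(
    workgroupName="analytics-wg",
    namespaceName="analytics",
    baseCapacity=32,
)
```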


Question # 15:

A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs provide the website's functionality. A data engineer needs to write a Python script that can be occasionally invoked through API Gateway. The code must return results to API Gateway.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS) cluster.


B.

Create an AWS Lambda Python function with provisioned concurrency.


C.

Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic Kubernetes Service (Amazon EKS).


D.

Create an AWS Lambda function. Ensure that the function is warm by scheduling an Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock events.
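
For context only: with API Gateway's Lambda proxy integration, a Python handler returns a dict containing a status code and a JSON string body. A minimal sketch (the field names read from the event are placeholders):

```python
import json

def lambda_handler(event, context):
    # Read an optional query string parameter passed through API Gateway.
    name = (event.get("queryStringParameters") or {}).get("name", "world")

    # Proxy integration expects statusCode, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```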


Question # 16:

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.

The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.

Which solution will meet these requirements?

Options:

A.

Use a provisioned Amazon EMR cluster to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.


B.

Load all the data files in parallel into Amazon Aurora. Run an AWS Glue job to load the data into Amazon Redshift.


C.

Use an AWS Glue job to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.


D.

Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.
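
For context only: option D replaces many per-location COPY commands with one manifest that lists every file location, so a single COPY loads the files in parallel across slices. Below is a minimal sketch; the bucket, table, cluster, and role names are placeholders.

```python
import json
import boto3

# Sketch only: all identifiers are placeholders.
s3 = boto3.client("s3")
rsd = boto3.client("redshift-data")

manifest = {
    "entries": [
        {"url": "s3://example-lake/source-a/part-0000.csv", "mandatory": True},
        {"url": "s3://example-lake/source-b/part-0000.csv", "mandatory": True},
        {"url": "s3://example-lake/source-c/part-0000.csv", "mandatory": True},
    ]
}
s3.put_object(
    Bucket="example-lake",
    Key="manifests/all-sources.manifest",
    Body=json.dumps(manifest),
)

# One COPY ingests every listed file in parallel.
rsd.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="""
        COPY sales
        FROM 's3://example-lake/manifests/all-sources.manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        MANIFEST;
    """,
)
```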


Question # 17:

A security company stores IoT data that is in JSON format in an Amazon S3 bucket. The data structure can change when the company upgrades the IoT devices. The company wants to create a data catalog that includes the IoT data. The company's analytics department will use the data catalog to index the data.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create a new AWS Glue workflow to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.


B.

Create an Amazon Redshift provisioned cluster. Create an Amazon Redshift Spectrum database for the analytics department to explore the data that is in Amazon S3. Create Redshift stored procedures to load the data into Amazon Redshift.


C.

Create an Amazon Athena workgroup. Explore the data that is in Amazon S3 by using Apache Spark through Athena. Provide the Athena workgroup schema and tables to the analytics department.


D.

Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create AWS Lambda user defined functions (UDFs) by using the Amazon Redshift Data API. Create an AWS Step Functions job to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.
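
For context only: an AWS Glue Data Catalog is commonly populated by a Glue crawler, which re-infers the schema when the JSON structure changes after device upgrades. A crawler is not named in any option; the sketch below is an illustration with placeholder names.

```python
import boto3

# Sketch only: crawler, role, database, and bucket names are placeholders.
glue = boto3.client("glue")

glue.create_crawler(
    Name="iot-json-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="iot_catalog",
    Targets={"S3Targets": [{"Path": "s3://example-iot-bucket/telemetry/"}]},
    # Update the catalog in place when a device upgrade changes the JSON schema.
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)
glue.start_crawler(Name="iot-json-crawler")
```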


Question # 18:

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.

The company wants to optimize costs for data storage and retrieval.

Which solution will meet this requirement?

Options:

A.

Transition the data to Amazon S3 Glacier Flexible Retrieval.


B.

Transition the data from Amazon S3 to an Amazon Aurora cluster.


C.

Configure AWS Glue ETL to transform the incoming data to Apache Parquet format.


D.

Configure AWS Glue ETL to use Amazon EMR to process incoming data in parallel.
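
For context only: option C has the existing Glue ETL job write Apache Parquet instead of CSV; the columnar, compressed layout shrinks storage and the bytes scanned by the daily analysis. Below is a minimal PySpark sketch for a Glue job, with placeholder S3 paths and partition column.

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Sketch only: paths and the partition column are placeholders.
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

orders = spark.read.option("header", "true").csv("s3://example-raw/orders/")

(
    orders.write
    .mode("append")
    .partitionBy("order_date")   # partition pruning further limits data scanned
    .parquet("s3://example-curated/orders/")
)
```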


Question # 19:

A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance.

The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet.

Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)

Options:

A.

Turn on the public access setting for the DB instance.


B.

Update the security group of the DB instance to allow only Lambda function invocations on the database port.


C.

Configure the Lambda function to run in the same subnet that the DB instance uses.


D.

Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.


E.

Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.
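
For context only: options C and D amount to running the Lambda function in the DB instance's VPC subnets and allowing traffic on the database port through a shared, self-referencing security group. Below is a minimal sketch; the security group ID, subnet ID, function name, and port (3306 for MySQL) are placeholders.

```python
import boto3

# Sketch only: all identifiers are placeholders.
SG_ID = "sg-0123456789abcdef0"
SUBNET_IDS = ["subnet-0aaa1111bbb2222cc"]

# Self-referencing inbound rule: members of the group may reach the DB port.
ec2 = boto3.client("ec2")
ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "UserIdGroupPairs": [{"GroupId": SG_ID}],
    }],
)

# Run the function in the DB instance's subnets with the same security group.
lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="transactional-writer",
    VpcConfig={"SubnetIds": SUBNET_IDS, "SecurityGroupIds": [SG_ID]},
)
```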


Question # 20:

A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables. Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Use Amazon EMR and Apache Ranger.


B.

Use a Hive metastore on an EMR cluster.


C.

Use the AWS Glue Data Catalog.


D.

Use a metastore on an Amazon RDS for MySQL DB instance.
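
For context only: option C's AWS Glue Data Catalog can serve as the shared Hive-compatible metastore for both Amazon EMR and Amazon Athena. Below is a minimal sketch of a Spark session on EMR pointed at the Glue Data Catalog; the application name and the queried database are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch only: application name and the database queried are placeholders.
spark = (
    SparkSession.builder
    .appName("central-catalog")
    # Use the Glue Data Catalog as the Hive metastore on EMR.
    .config(
        "hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

# Tables defined here are also visible to Athena through the same catalog.
spark.sql("SHOW TABLES IN default").show()
```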

