Pass the Amazon Web Services AWS Certified Machine Learning - Specialty (MLS-C01) questions and answers with CertsForce

Question #31:

A company has raw user and transaction data stored in Amazon S3, a MySQL database, and Amazon Redshift. A Data Scientist needs to perform an analysis by joining the three datasets from Amazon S3, MySQL, and Amazon Redshift, and then calculating the average of a few selected columns from the joined data.

Which AWS service should the Data Scientist use?

Options:

A. Amazon Athena
B. Amazon Redshift Spectrum
C. AWS Glue
D. Amazon QuickSight


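For background on the Athena option (A): Athena federated query can join data in S3 with external sources such as MySQL and Redshift in a single SQL statement. A minimal sketch, assuming data-source connectors have already been registered under the illustrative catalog names `mysql_db` and `redshift_db`, with a placeholder results bucket:

```python
# Sketch of joining the three sources through an Athena federated query.
# Catalog, schema, table, and bucket names below are illustrative only.
import boto3

athena = boto3.client("athena")

query = """
SELECT u.user_id, AVG(t.amount) AS avg_transaction_amount
FROM raw_users u                                   -- Glue/Athena table over S3
JOIN mysql_db.app.transactions t ON t.user_id = u.user_id
JOIN redshift_db.analytics.user_summary s ON s.user_id = u.user_id
GROUP BY u.user_id
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```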
Question #32:

A trucking company is collecting live image data from its fleet of trucks across the globe. The data is growing rapidly, and approximately 100 GB of new data is generated every day. The company wants to explore machine learning use cases while ensuring the data is accessible only to specific IAM users.

Which storage option provides the most processing flexibility and will allow access control with IAM?

Options:

A. Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users.
B. Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies.
C. Set up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies.
D. Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users.


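As context for the access-control aspect: an S3 bucket policy can pin access to named IAM users. A rough boto3 sketch, with placeholder account ID, user names, and bucket name; the Deny-with-NotPrincipal pattern shown here is one common way to express "only these users", not the only one:

```python
# Sketch: deny all S3 actions on the bucket to everyone except two named
# IAM users. ARNs and bucket name are placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOnlyNamedUsers",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::123456789012:user/ml-engineer-1",
                    "arn:aws:iam::123456789012:user/ml-engineer-2",
                ]
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::truck-image-lake",
                "arn:aws:s3:::truck-image-lake/*",
            ],
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="truck-image-lake", Policy=json.dumps(policy)
)
```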
Question #33:

A Machine Learning Specialist needs to create a data repository to hold a large amount of time-based training data for a new model. In the source system, new files are added every hour. Throughout a single 24-hour period, the volume of hourly updates will change significantly. The Specialist always wants to train on the last 24 hours of the data.

Which type of data repository is the MOST cost-effective solution?

Options:

A. An Amazon EBS-backed Amazon EC2 instance with hourly directories
B. An Amazon RDS database with hourly table partitions
C. An Amazon S3 data lake with hourly object prefixes
D. An Amazon EMR cluster with hourly Hive partitions on Amazon EBS volumes


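To illustrate the hourly-prefix idea from option C: keys laid out as `data/YYYY/MM/DD/HH/` make "the last 24 hours" a simple loop over prefixes. A sketch assuming that layout and a placeholder bucket name:

```python
# Sketch: enumerate the objects under the last 24 hourly prefixes.
# Bucket name and key layout are assumptions for illustration.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
bucket = "training-data-lake"

now = datetime.now(timezone.utc)
for hours_back in range(24):
    ts = now - timedelta(hours=hours_back)
    prefix = ts.strftime("data/%Y/%m/%d/%H/")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
```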
Question #34:

A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist discovered that the model recommends certain stopwords such as "a," "an," and "the" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist must also ensure that the tag recommendations of the generated model do not include the stopwords.

What should the data scientist do to meet these requirements?

Options:

A. Use the Amazon Comprehend entity recognition API operations. Remove the detected words from the blog post data. Replace the blog post data source in the S3 bucket.
B. Run the SageMaker built-in principal component analysis (PCA) algorithm with the blog post data from the S3 bucket as the data source. Replace the blog post data in the S3 bucket with the results of the training job.
C. Use the SageMaker built-in Object Detection algorithm instead of the NTM algorithm for the training job to process the blog post data.
D. Remove the stop words from the blog post data by using the CountVectorizer function in the scikit-learn library. Replace the blog post data in the S3 bucket with the results of the vectorizer.


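For the scikit-learn option: CountVectorizer drops English stopwords via `stop_words="english"` while keeping rare words, since the default `min_df` of 1 retains terms that appear in only a single document. A small sketch with invented posts:

```python
# Sketch of option D: stop words vanish from the vocabulary, rare words stay.
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "An overview of the SageMaker Neural Topic Model",
    "The rare word floccinaucinihilipilification appears in a post",
]

vectorizer = CountVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(posts)

# "a", "an", "the", "of", "in" are gone; the rare word survives.
print(sorted(vectorizer.get_feature_names_out()))
```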
Question #35:

During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates. What is the MOST likely cause of this issue?

Options:

A. The class distribution in the dataset is imbalanced
B. Dataset shuffling is disabled
C. The batch size is too big
D. The learning rate is very high


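As a worked illustration of how a very high learning rate (option D) can make training oscillate: on the toy objective f(w) = w², each gradient step multiplies w by (1 − 2·lr), so a large enough learning rate flips the sign of w on every update instead of shrinking it toward the minimum.

```python
# Toy gradient descent on f(w) = w**2; the gradient is 2w.
def descend(lr, steps=8, w=1.0):
    history = [round(w, 4)]
    for _ in range(steps):
        w -= lr * 2 * w  # each update multiplies w by (1 - 2*lr)
        history.append(round(w, 4))
    return history

print("lr=0.1:", descend(0.1))  # monotone decay toward the minimum
print("lr=0.9:", descend(0.9))  # sign flips every step: oscillation
```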
Question #36:

A company is running a machine learning prediction service that generates 100 TB of predictions every day. A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions and forward a read-only version to the Business team.

Which solution requires the LEAST coding effort?

Options:

A. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Give the Business team read-only access to S3.
B. Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team.
C. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team.
D. Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.


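For context, whichever option is chosen, the underlying precision-recall points come from sweeping a decision threshold over prediction scores. A toy scikit-learn sketch with invented labels and scores:

```python
# Sketch: computing the precision-recall arrays that any of the dashboards
# above would visualize. Labels and scores are invented sample data.
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f} recall={r:.2f}")
```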
Question #37:

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.

How should the Machine Learning Specialist transform the dataset to minimize query runtime?

Options:

A. Convert the records to Apache Parquet format
B. Convert the records to JSON format
C. Convert the records to GZIP CSV format
D. Convert the records to XML format


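To see why a columnar format helps here: with Apache Parquet, Athena reads only the 5 to 10 columns a query references instead of scanning every full 1.5 MB record. A minimal conversion sketch, assuming pandas with pyarrow and s3fs installed and placeholder S3 paths:

```python
# Sketch of option A: rewrite the plaintext CSV records as Parquet.
# Source and destination paths are placeholders.
import pandas as pd

df = pd.read_csv("s3://example-raw-bucket/records.csv")
df.to_parquet("s3://example-curated-bucket/records.parquet", engine="pyarrow")
```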
Question #38:

A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results.

Which modeling approach will deliver the MOST accurate prediction of product quality?

Options:

A. Amazon SageMaker DeepAR forecasting algorithm
B. Amazon SageMaker XGBoost algorithm
C. Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm
D. A convolutional neural network (CNN) and ResNet


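For background on the gradient-boosting option: the task is three-class classification over tabular sensor metrics. A sketch using the open-source xgboost package with synthetic stand-in data; the SageMaker built-in algorithm exposes the same multi:softmax objective through hyperparameters:

```python
# Sketch of option B: multi-class XGBoost over tabular quality metrics.
# The feature matrix and labels are synthetic placeholders.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(500, 20))     # stand-in for the sensor quality metrics
y = rng.integers(0, 3, size=500)   # 0 = good, 1 = replacement market, 2 = scrap

model = xgb.XGBClassifier(objective="multi:softmax")
model.fit(X, y)
print(model.predict(X[:5]))        # predicted quality class per item
```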
Question #39:

A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset:

* Must be accessible from a VPC only.

* Must not traverse the public internet.

How can these requirements be satisfied?

Options:

A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.
B. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.
C. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.
D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.


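To illustrate the VPC-endpoint pattern: a bucket policy can deny every request that does not arrive through a given S3 VPC endpoint, using the aws:SourceVpce condition key. A sketch with placeholder bucket name and endpoint ID:

```python
# Sketch: restrict the PII bucket to requests arriving via one S3 VPC
# endpoint. Bucket name and endpoint ID are placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::pii-training-data",
                "arn:aws:s3:::pii-training-data/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-0abc123"}},
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="pii-training-data", Policy=json.dumps(policy)
)
```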
Question #40:

A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket protected with server-side encryption using AWS KMS.

How should the ML Specialist define the Amazon SageMaker notebook instance so it can read the same dataset from Amazon S3?

Options:

A. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon SageMaker notebook instance.
B. Configure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the KMS key policy to the notebook's KMS role.
C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the KMS key policy to that role.
D. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance.


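As context for the key-policy wording in options B and C: reading SSE-KMS objects requires both S3 read permission on the notebook's IAM role and a kms:Decrypt grant to that role in the key policy. A boto3 sketch with placeholder ARNs and key ID; note that put_key_policy replaces the entire policy, so the account-root statement is kept:

```python
# Sketch: grant the notebook's execution role decrypt access in the KMS
# key policy. Account ID, role name, and key ID are placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {  # keep the account root in control of the key
            "Sid": "EnableRootAccess",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {  # grant decrypt to the notebook's IAM role
            "Sid": "AllowNotebookRole",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
            },
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
    ],
}

boto3.client("kms").put_key_policy(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
    PolicyName="default",
    Policy=json.dumps(policy),
)
```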