Amazon Web Services MLS-C01 Exam Questions Free Practice Test

Viewing page 3 out of 10 pages

Viewing questions 21-30 out of questions

Questions # 21:

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.

Which prior probability distribution should the ML Specialist use for this variable?

Options:

A. Poisson distribution

B. Uniform distribution

C. Normal distribution

D. Binomial distribution
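
For reference, the sketch below compares the means of two of the discrete candidates under stated assumptions: a Poisson distribution with rate 3, and a discrete uniform distribution over waits of 0 to 10 minutes. The parameter choices are assumptions made for illustration, and the comparison does not by itself settle which option the exam intends.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Assumed parameters, taken loosely from the question text.
MEAN_WAIT = 3.0      # stated mean wait in minutes
CYCLE_MINUTES = 10   # buses cycle every 10 minutes

# Mean of a Poisson(3) variable, summed numerically over a long enough range.
poisson_mean = sum(k * poisson_pmf(k, MEAN_WAIT) for k in range(60))

# Mean of a discrete uniform variable on 0..10 minutes.
uniform_support = range(CYCLE_MINUTES + 1)
uniform_mean = sum(uniform_support) / len(uniform_support)

print(f"Poisson(3) mean: {poisson_mean:.3f}")
print(f"Uniform(0..10) mean: {uniform_mean:.3f}")
```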

Questions # 22:

A data scientist has developed a machine learning translation model for English to Japanese by using Amazon SageMaker's built-in seq2seq algorithm with 500,000 aligned sentence pairs. While testing with sample sentences, the data scientist finds that the translation quality is reasonable for an example as short as five words. However, the quality becomes unacceptable if the sentence is 100 words long.

Which action will resolve the problem?

Options:

A. Change preprocessing to use n-grams.

B. Add more nodes to the recurrent neural network (RNN) than the largest sentence's word count.

C. Adjust hyperparameters related to the attention mechanism.

D. Choose a different weight initialization type.

Questions # 23:

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, the large number of features slows down the training speed significantly, and that there are some overfitting issues.

The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.

Which feature engineering technique should the Data Scientist use to meet the objectives?

Options:

A. Run self-correlation on all features and remove highly correlated features

B. Normalize all numerical values to be between 0 and 1

C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features

D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset
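
To make the PCA option concrete, here is a minimal sketch, assuming a tiny made-up dataset of three highly correlated features. It uses plain power iteration on the covariance matrix as a stand-in for a library PCA routine; the data values and function names are hypothetical and chosen only for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """First principal component and its eigenvalue via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical data: the three columns move almost in lockstep.
rows = [[1.0, 2.1, 0.9], [2.0, 3.9, 2.1], [3.0, 6.2, 2.9],
        [4.0, 7.8, 4.2], [5.0, 10.1, 4.8]]

centered = center(rows)
cov = covariance_matrix(centered)
component, eigenvalue = top_component(cov)

# Share of total variance captured by the single new feature.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

# Each original 3-feature row is replaced by one projected value.
projected = [sum(r[j] * component[j] for j in range(len(component)))
             for r in centered]

print(f"explained variance ratio: {explained_ratio:.4f}")
```

In this toy case one derived feature replaces three original ones while keeping most of the variance, which is the trade-off the question describes.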

Questions # 24:

A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.

Which solution will meet these requirements?

Options:

A. Define security groups to allow all HTTP inbound and outbound traffic. Assign the security groups to the SageMaker notebook instance.

B. Configure the SageMaker notebook instance to have access to the VPC. Grant permission in the AWS Key Management Service (AWS KMS) key policy to the notebook's VPC.

C. Assign an IAM role that provides S3 read access for the dataset to the SageMaker notebook. Grant permission in the KMS key policy to the IAM role.

D. Assign the same KMS key that encrypts the data in Amazon S3 to the SageMaker notebook instance.

Questions # 25:

A company is building a predictive maintenance model based on machine learning (ML). The data is stored in a fully private Amazon S3 bucket that is encrypted at rest with AWS Key Management Service (AWS KMS) CMKs. An ML specialist must run data preprocessing by using an Amazon SageMaker Processing job that is triggered from code in an Amazon SageMaker notebook. The job should read data from Amazon S3, process it, and upload it back to the same S3 bucket. The preprocessing code is stored in a container image in Amazon Elastic Container Registry (Amazon ECR). The ML specialist needs to grant permissions to ensure a smooth data preprocessing workflow.

Which set of actions should the ML specialist take to meet these requirements?

Options:

A. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs, S3 read and write access to the relevant S3 bucket, and appropriate KMS and ECR permissions. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job from the notebook.

B. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job with an IAM role that has read and write permissions to the relevant S3 bucket, and appropriate KMS and ECR permissions.

C. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs and to access Amazon ECR. Attach the role to the SageMaker notebook instance. Set up both an S3 endpoint and a KMS endpoint in the default VPC. Create Amazon SageMaker Processing jobs from the notebook.

D. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Set up an S3 endpoint in the default VPC. Create Amazon SageMaker Processing jobs with the access key and secret key of the IAM user with appropriate KMS and ECR permissions.

Answer: B

Explanation

The correct solution for granting permissions for data preprocessing is to use the following steps:

Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. This role allows the ML specialist to run Processing jobs from the notebook code [1].

Create an Amazon SageMaker Processing job with an IAM role that has read and write permissions to the relevant S3 bucket, and appropriate KMS and ECR permissions. This role allows the Processing job to access the data in the encrypted S3 bucket, decrypt it with the KMS CMK, and pull the container image from ECR [2][3].

The other options are incorrect because they either miss some permissions or use unnecessary steps. For example:

Option A uses a single IAM role for both the notebook instance and the Processing job. This role may have more permissions than necessary for the notebook instance, which violates the principle of least privilege [4].

Option C sets up both an S3 endpoint and a KMS endpoint in the default VPC. These endpoints are not required for the Processing job to access the data in the encrypted S3 bucket. They are only needed if the Processing job is configured to run inside a VPC without internet access, which is not specified in the question.

Option D uses the access key and secret key of the IAM user with appropriate KMS and ECR permissions. This is not a secure way to pass credentials to the Processing job. It also requires the ML specialist to manage the IAM user and the keys.

1: Create an Amazon SageMaker Notebook Instance - Amazon SageMaker

2: Create a Processing Job - Amazon SageMaker

3: Use AWS KMS–Managed Encryption Keys - Amazon Simple Storage Service

4: IAM Best Practices - AWS Identity and Access Management

Network Isolation - Amazon SageMaker

Understanding and Getting Your Security Credentials - AWS General Reference
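
The two-role split described in the explanation can be sketched as a pair of IAM policy documents. This is an illustrative sketch only: the account ID, Region, bucket, key, repository, and role names are hypothetical, the action lists are not exhaustive (logging permissions are omitted, for example), and the iam:PassRole statement is an addition not mentioned in the explanation, included because the notebook role has to be allowed to hand the second role to SageMaker.

```python
import json

# Hypothetical identifiers, used only for illustration.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
BUCKET_ARN = "arn:aws:s3:::example-maintenance-data"
KMS_KEY_ARN = f"arn:aws:kms:{REGION}:{ACCOUNT_ID}:key/EXAMPLE-KEY-ID"
ECR_REPO_ARN = f"arn:aws:ecr:{REGION}:{ACCOUNT_ID}:repository/example-preprocessing"
PROCESSING_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleProcessingJobRole"

# Policy for the role attached to the notebook instance:
# it can launch Processing jobs and pass the job role, nothing more.
notebook_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["sagemaker:CreateProcessingJob",
                    "sagemaker:DescribeProcessingJob"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["iam:PassRole"],
         "Resource": PROCESSING_ROLE_ARN,
         "Condition": {"StringEquals": {
             "iam:PassedToService": "sagemaker.amazonaws.com"}}},
    ],
}

# Policy for the role the Processing job itself assumes:
# S3 read/write, KMS decrypt/encrypt, and ECR image pull.
processing_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": f"{BUCKET_ARN}/*"},
        {"Effect": "Allow",
         "Action": ["s3:ListBucket"],
         "Resource": BUCKET_ARN},
        {"Effect": "Allow",
         "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
         "Resource": KMS_KEY_ARN},
        {"Effect": "Allow",
         "Action": ["ecr:GetAuthorizationToken"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage"],
         "Resource": ECR_REPO_ARN},
    ],
}

def actions_in(policy):
    """Flatten every action named in a policy document into one set."""
    return {a for s in policy["Statement"] for a in s["Action"]}

print(json.dumps(notebook_role_policy, indent=2))
print(json.dumps(processing_role_policy, indent=2))
```

Keeping the data-access permissions out of the notebook role is what distinguishes this arrangement from the single-role approach in Option A.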

Questions # 26:

A company needs to deploy a chatbot to answer common questions from customers. The chatbot must base its answers on company documentation.

Which solution will meet these requirements with the LEAST development effort?

Options:

A. Index company documents by using Amazon Kendra. Integrate the chatbot with Amazon Kendra by using the Amazon Kendra Query API operation to answer customer questions.

B. Train a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents. Deploy the model as a real-time Amazon SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.

C. Train an Amazon SageMaker BlazingText model based on past customer questions and company documents. Deploy the model as a real-time SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.

D. Index company documents by using Amazon OpenSearch Service. Integrate the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation to answer customer questions.

Answer: A

Explanation

Option A meets the requirements with the least development effort because it uses Amazon Kendra, which is a highly accurate and easy-to-use intelligent search service powered by machine learning. Amazon Kendra can index company documents from various sources and formats, such as PDF, HTML, Word, and more. Amazon Kendra can also integrate with chatbots by using the Amazon Kendra Query API operation, which can understand natural language questions and provide relevant answers from the indexed documents. Amazon Kendra can also provide additional information, such as document excerpts, links, and FAQs, to enhance the chatbot experience [1].

The other options are not suitable because:

Option B: Training a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents, deploying the model as a real-time Amazon SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. The company will have to write the code for the BiDAF network, which is a complex deep learning model for question answering. The company will also have to manage the SageMaker endpoint, the model artifact, and the inference logic [2].

Option C: Training an Amazon SageMaker BlazingText model based on past customer questions and company documents, deploying the model as a real-time SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. The company will have to prepare training data and train the BlazingText model, which is a fast and scalable text classification and word embedding algorithm rather than a question answering model. The company will also have to manage the SageMaker endpoint, the model artifact, and the inference logic [3].

Option D: Indexing company documents by using Amazon OpenSearch Service and integrating the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation will not meet the requirements effectively. Amazon OpenSearch Service is a fully managed service that provides fast and scalable search and analytics capabilities. However, it is not designed for natural language question answering, and it may not provide accurate or relevant answers for the chatbot. Moreover, the k-NN Query API operation is used to find the most similar documents or vectors based on a distance function, not to find the best answers based on a natural language query [4].

1: Amazon Kendra

2: Bidirectional Attention Flow for Machine Comprehension

3: Amazon SageMaker BlazingText

4: Amazon OpenSearch Service
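
As a rough illustration of how little glue code the Kendra approach needs, the sketch below wraps the Query operation in one helper. It assumes a client object that exposes `query(IndexId=..., QueryText=...)` in the way the boto3 Kendra client does; `FakeKendraClient`, the index ID, and the sample response are hypothetical stand-ins so the example runs without AWS access.

```python
from typing import Any, Optional

def best_answer(kendra_client: Any, index_id: str, question: str) -> Optional[str]:
    """Return the excerpt of the most answer-like result, or None if empty."""
    response = kendra_client.query(IndexId=index_id, QueryText=question)
    items = response.get("ResultItems", [])
    # Prefer extracted answers, then FAQ matches, then plain document hits.
    for wanted_type in ("ANSWER", "QUESTION_ANSWER", "DOCUMENT"):
        for item in items:
            if item.get("Type") == wanted_type:
                text = item.get("DocumentExcerpt", {}).get("Text")
                if text:
                    return text
    return None

class FakeKendraClient:
    """Hypothetical offline stand-in for a real Kendra client."""

    def query(self, IndexId: str, QueryText: str) -> dict:
        return {
            "ResultItems": [
                {"Type": "DOCUMENT",
                 "DocumentExcerpt": {"Text": "See the returns policy page."}},
                {"Type": "ANSWER",
                 "DocumentExcerpt": {"Text": "Returns are accepted within 30 days."}},
            ]
        }

reply = best_answer(FakeKendraClient(), "example-index-id",
                    "How long do I have to return an item?")
print(reply)
```

In a real deployment the fake client would be replaced by a Kendra client created with boto3, and the chatbot would pass each customer question to `best_answer`.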

Questions # 27:

A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages.

The machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model.

How can the ML specialist meet these requirements with the LEAST operational overhead?

Options:

A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartile. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.

B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.

C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.

D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
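
Since one option relies on the first and third quartiles, the following sketch shows what those quartiles do on a small, made-up list of temperature readings. It contrasts dropping everything outside the quartiles with the common 1.5 x IQR fence; the readings and the choice of the 1.5 multiplier are assumptions for illustration and are not taken from Data Wrangler.

```python
import statistics

# Hypothetical temperature readings with two obvious spikes (95.0 and 44.0).
readings = [70.1, 70.4, 69.8, 70.0, 70.3, 69.9,
            70.2, 95.0, 70.1, 69.7, 70.5, 44.0]

# First and third quartiles (the middle cut point is the median).
q1, _median, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1

# Conventional fences: 1.5 interquartile ranges beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in readings if x < lower_fence or x > upper_fence]
kept = [x for x in readings if lower_fence <= x <= upper_fence]

# For comparison: everything that falls outside the quartiles themselves.
outside_quartiles = [x for x in readings if x < q1 or x > q3]

print(f"flagged by 1.5 x IQR fences: {sorted(outliers)}")
print(f"values outside Q1..Q3: {len(outside_quartiles)} of {len(readings)}")
```

On this data the fences flag only the two spikes, while cutting at the quartiles themselves would discard about half of the readings.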

Questions # 28:

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each pair of features and finds that their absolute values range from 0.1 to 0.95.

Which model describes the underlying data in this situation?

Options:

A. A naive Bayesian model, since the features are all conditionally independent.

B. A full Bayesian network, since the features are all conditionally independent.

C. A naive Bayesian model, since some of the features are statistically dependent.

D. A full Bayesian network, since some of the features are statistically dependent.
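
The Pearson coefficient mentioned in the question can be computed as in the minimal sketch below. The three feature columns are invented for illustration. A coefficient near 1 in absolute value signals strong linear dependence between two features; a coefficient near 0 only rules out a linear relationship and does not prove independence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical feature columns.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.1, 3.9, 6.2, 7.8, 10.1]   # tracks feature_a closely
feature_c = [3.0, 1.0, 4.0, 1.0, 5.0]    # only loosely related

r_ab = pearson(feature_a, feature_b)
r_ac = pearson(feature_a, feature_c)

print(f"|r(a, b)| = {abs(r_ab):.3f}")
print(f"|r(a, c)| = {abs(r_ac):.3f}")
```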

Questions # 29:

A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users.

What should the Specialist do to meet this objective?

Options:

A. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR.

B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.

C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR.

D. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR.
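
Recommending products from "the users' similarity to other users" can be illustrated with a tiny user-based collaborative filtering sketch. It is plain Python rather than Apache Spark ML, and the users, products, ratings, and the cosine-similarity weighting are all assumptions made for the example.

```python
import math

# Hypothetical user -> {product: rating} data.
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "monitor": 5},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carol": {"novel": 5, "cookbook": 4, "mouse": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, all_ratings, top_n=1):
    """Rank unseen products by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in all_ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

suggestions = recommend("alice", ratings)
print(suggestions)
```

Here the suggestion for alice comes from bob, whose ratings resemble hers, rather than from any description of the products themselves.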

Questions # 30:

A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance.

What type of machine learning model should be used?

Options:

A. Classification month-to-month using supervised learning of the 200 categories based on claim contents.

B. Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month.

C. Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.

D. Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories.
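
Whichever model is chosen, predicting "how many claims to expect in each category from month to month" starts from a monthly count per category. The sketch below shows that aggregation step only, under the assumption that each record is a (record ID, category, outcome date) tuple; the records and category names are invented.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (record_id, outcome_category, date_of_final_outcome).
records = [
    ("r1", "water_damage", date(2023, 1, 14)),
    ("r2", "water_damage", date(2023, 1, 30)),
    ("r3", "theft", date(2023, 1, 5)),
    ("r4", "water_damage", date(2023, 2, 9)),
    ("r5", "theft", date(2023, 2, 21)),
    ("r6", "theft", date(2023, 2, 27)),
]

def monthly_counts(rows):
    """Count claims per (category, year, month)."""
    return Counter((category, d.year, d.month) for _id, category, d in rows)

def series_for(category, counts):
    """Chronological (year, month, count) list for one category.

    Months with no claims are simply absent from the result.
    """
    return sorted((year, month, n)
                  for (cat, year, month), n in counts.items()
                  if cat == category)

counts = monthly_counts(records)
theft_series = series_for("theft", counts)
water_series = series_for("water_damage", counts)

print(theft_series)
print(water_series)
```

Each resulting series is a per-category time series of counts, which is the shape of input a time-series method would consume.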
