Pass the Amazon Web Services AWS Certified Specialty MLS-C01 Questions and answers with CertsForce

Viewing page 3 out of 10 pages
Viewing questions 21-30 out of questions
Questions # 21:

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.

Which prior probability distribution should the ML Specialist use for this variable?

Options:

A.

Poisson distribution ,


B.

Uniform distribution


C.

Normal distribution


D.

Binomial distribution


Expert Solution
Questions # 22:

A data scientist has developed a machine learning translation model for English to Japanese by using Amazon SageMaker's built-in seq2seq algorithm with 500,000 aligned sentence pairs. While testing with sample sentences, the data scientist finds that the translation quality is reasonable for an example as short as five words. However, the quality becomes unacceptable if the sentence is 100 words long.

Which action will resolve the problem?

Options:

A.

Change preprocessing to use n-grams.


B.

Add more nodes to the recurrent neural network (RNN) than the largest sentence's word count.


C.

Adjust hyperparameters related to the attention mechanism.


D.

Choose a different weight initialization type.


Expert Solution
Questions # 23:

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant

will default on a credit card payment. The company has collected data from a large number of sources with

thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are

highly correlated, the large number of features slows down the training speed significantly, and that there are

some overfitting issues.

The Data Scientist on this project would like to speed up the model training time without losing a lot of

information from the original dataset.

Which feature engineering technique should the Data Scientist use to meet the objectives?

Options:

A.

Run self-correlation on all features and remove highly correlated features


B.

Normalize all numerical values to be between 0 and 1


C.

Use an autoencoder or principal component analysis (PCA) to replace original features with new features


D.

Cluster raw data using k-means and use sample data from each cluster to build a new dataset


Expert Solution
Questions # 24:

A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.

Which solution will meet these requirements?

Options:

A.

Define security groups to allow all HTTP inbound and outbound traffic. Assign the security groups to the SageMaker notebook instance.


B.

Configure the SageMaker notebook instance to have access to the VPC. Grant permission in the AWS Key Management Service (AWS KMS) key policy to the notebook's VPC.


C.

Assign an IAM role that provides S3 read access for the dataset to the SageMaker notebook. Grant permission in the KMS key policy to the 1AM role.


D.

Assign the same KMS key that encrypts the data in Amazon S3 to the SageMaker notebook instance.


Expert Solution
Questions # 25:

A company is building a predictive maintenance model based on machine learning (ML). The data is stored in a fully private Amazon S3 bucket that is encrypted at rest with AWS Key Management Service (AWS KMS) CMKs. An ML specialist must run data preprocessing by using an Amazon SageMaker Processing job that is triggered from code in an Amazon SageMaker notebook. The job should read data from Amazon S3, process it, and upload it back to the same S3 bucket. The preprocessing code is stored in a container image in Amazon Elastic Container Registry (Amazon ECR). The ML specialist needs to grant permissions to ensure a smooth data preprocessing workflow.

Which set of actions should the ML specialist take to meet these requirements?

Options:

A.

Create an IAM role that has permissions to create Amazon SageMaker Processing jobs, S3 read and write access to the relevant S3 bucket, and appropriate KMS and ECR permissions. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job from the notebook.


B.

Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job with an IAM role that has read and write permissions to the relevant S3 bucket, and appropriate KMS and ECR permissions.


C.

Create an IAM role that has permissions to create Amazon SageMaker Processing jobs and to access Amazon ECR. Attach the role to the SageMaker notebook instance. Set up both an S3 endpoint and a KMS endpoint in the default VPC. Create Amazon SageMaker Processing jobs from the notebook.


D.

Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Set up an S3 endpoint in the default VPC. Create Amazon SageMaker Processing jobs with the access key and secret key of the IAM user with appropriate KMS and ECR permissions.


Expert Solution
Questions # 26:

A company needs to deploy a chatbot to answer common questions from customers. The chatbot must base its answers on company documentation.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Index company documents by using Amazon Kendra. Integrate the chatbot with Amazon Kendra by using the Amazon Kendra Query API operation to answer customer questions.


B.

Train a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents. Deploy the model as a real-time Amazon SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.


C.

Train an Amazon SageMaker BlazingText model based on past customer questions and company documents. Deploy the model as a real-time SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.


D.

Index company documents by using Amazon OpenSearch Service. Integrate the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation to answer customer questions.


Expert Solution
Questions # 27:

A company uses sensors on devices such as motor engines and factory machines to measure parameters, temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce services outages.

The Machine learning (ML) specialist needs to gather the sensors data to train a model to predict device malfunctions The ML spoctafst must ensure that the data does not contain outliers before training the ..el.

What can the ML specialist meet these requirements with the LEAST operational overhead?

Options:

A.

Load the data into an Amazon SagcMaker Studio notebook. Calculate the first and third quartile Use a SageMaker Data Wrangler data (low to remove only values that are outside of those quartiles.


B.

Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset Use a Data Wrangler data flow to remove outliers based on the bias report.


C.

Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.


D.

Use Amazon Lookout for Equipment to find and remove outliers from the dataset.


Expert Solution
Questions # 28:

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each feature and finds that their absolute values range between 0.1 to 0.95.

Which model describes the underlying data in this situation?

Options:

A.

A naive Bayesian model, since the features are all conditionally independent.


B.

A full Bayesian network, since the features are all conditionally independent.


C.

A naive Bayesian model, since some of the features are statistically dependent.


D.

A full Bayesian network, since some of the features are statistically dependent.


Expert Solution
Questions # 29:

A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users.

What should the Specialist do to meet this objective?

Options:

A.

Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR.


B.

Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.


C.

Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR.


D.

Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR.


Expert Solution
Questions # 30:

A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance.

What type of machine learning model should be used?

Options:

A.

Classification month-to-month using supervised learning of the 200 categories based on claim contents.


B.

Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month.


C.

Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.


D.

Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories.


Expert Solution
Viewing page 3 out of 10 pages
Viewing questions 21-30 out of questions