Pass the Amazon Web Services AWS Certified Specialty MLS-C01 Questions and answers with CertsForce

Viewing page 9 out of 10 pages
Viewing questions 81-90 out of questions
Questions # 81:

A gaming company has launched an online game where people can start playing for free but they need to pay if they choose to use certain features The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year The company has gathered a labeled dataset from 1 million users

The training dataset consists of 1.000 positive samples (from users who ended up paying within 1 year) and 999.000 negative samples (from users who did not use any paid features) Each data sample consists of 200 features including user age, device, location, and play patterns

Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set However, the prediction results on a test dataset were not satisfactory.

Which of the following approaches should the Data Science team take to mitigate this issue? (Select TWO.)

Options:

A.

Add more deep trees to the random forest to enable the model to learn more features.


B.

indicate a copy of the samples in the test database in the training dataset


C.

Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.


D.

Change the cost function so that false negatives have a higher impact on the cost value than false positives


E.

Change the cost function so that false positives have a higher impact on the cost value than false negatives


Expert Solution
Questions # 82:

A company plans to build a custom natural language processing (NLP) model to classify and prioritize user feedback. The company hosts the data and all machine learning (ML) infrastructure in the AWS Cloud. The ML team works from the company's office, which has an IPsec VPN connection to one VPC in the AWS Cloud.

The company has set both the enableDnsHostnames attribute and the enableDnsSupport attribute of the VPC to true. The company's DNS resolvers point to the VPC DNS. The company does not allow the ML team to access Amazon SageMaker notebooks through connections that use the public internet. The connection must stay within a private network and within the AWS internal network.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Create a VPC interface endpoint for the SageMaker notebook in the VPC. Access the notebook through a VPN connection and the VPC endpoint.


B.

Create a bastion host by using Amazon EC2 in a public subnet within the VPC. Log in to the bastion host through a VPN connection. Access the SageMaker notebook from the bastion host.


C.

Create a bastion host by using Amazon EC2 in a private subnet within the VPC with a NAT gateway. Log in to the bastion host through a VPN connection. Access the SageMaker notebook from the bastion host.


D.

Create a NAT gateway in the VPC. Access the SageMaker notebook HTTPS endpoint through a VPN connection and the NAT gateway.


Expert Solution
Questions # 83:

Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming event. Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

Options:

A.

Pre-split the data before uploading to Amazon S3


B.

Have Amazon ML split the data randomly.


C.

Have Amazon ML split the data sequentially.


D.

Perform custom cross-validation on the data


Expert Solution
Questions # 84:

A data scientist receives a new dataset in .csv format and stores the dataset in Amazon S3. The data scientist will use this dataset to train a machine learning (ML) model.

The data scientist first needs to identify any potential data quality issues in the dataset. The data scientist must identify values that are missing or values that are not valid. The data scientist must also identify the number of outliers in the dataset.

Which solution will meet these requirements with the LEAST operational effort?)

Options:

A.

Create an AWS Glue job to transform the data from .csv format to Apache Parquet format. Use an AWS Glue crawler and Amazon Athena with appropriate SQL queries to retrieve the required information.


B.

Leave the dataset in .csv format. Use an AWS Glue crawler and Amazon Athena with appropriate SQL queries to retrieve the required information.


C.

Create an AWS Glue job to transform the data from .csv format to Apache Parquet format. Import the data into Amazon SageMaker Data Wrangler. Use the Data Quality and Insights Report to retrieve the required information.


D.

Leave the dataset in .csv format. Import the data into Amazon SageMaker Data Wrangler. Use the Data Quality and Insights Report to retrieve the required information.


Expert Solution
Questions # 85:

A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged.

Which implementation will meet these requirements?

Options:

A.

Use encryption keys that are stored in AWS Cloud HSM to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.


B.

Use SageMaker built-in transient keys to encrypt the ML data volumes. Enable default encryption for new Amazon Elastic Block Store (Amazon EBS) volumes.


C.

Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.


D.

Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the ML storage volumes, and to encrypt the model artifacts and data in Amazon S3.


Expert Solution
Questions # 86:

A Machine Learning Specialist has built a model using Amazon SageMaker built-in algorithms and is not getting expected accurate results The Specialist wants to use hyperparameter optimization to increase the model's accuracy

Which method is the MOST repeatable and requires the LEAST amount of effort to achieve this?

Options:

A.

Launch multiple training jobs in parallel with different hyperparameters


B.

Create an AWS Step Functions workflow that monitors the accuracy in Amazon CloudWatch Logs and relaunches the training job with a defined list of hyperparameters


C.

Create a hyperparameter tuning job and set the accuracy as an objective metric.


D.

Create a random walk in the parameter space to iterate through a range of values that should be used for each individual hyperparameter


Expert Solution
Questions # 87:

A company is setting up a mechanism for data scientists and engineers from different departments to access an Amazon SageMaker Studio domain. Each department has a unique SageMaker Studio domain.

The company wants to build a central proxy application that data scientists and engineers can log in to by using their corporate credentials. The proxy application will authenticate users by using the company's existing Identity provider (IdP). The application will then route users to the appropriate SageMaker Studio domain.

The company plans to maintain a table in Amazon DynamoDB that contains SageMaker domains for each department.

How should the company meet these requirements?

Options:

A.

Use the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table. Pass the presigned URL to the proxy application.


B.

Use the SageMaker CreateHuman TaskUi API to generate a UI URL. Pass the URL to the proxy application.


C.

Use the Amazon SageMaker ListHumanTaskUis API to list all UI URLs. Pass the appropriate URL to the DynamoDB table so that the proxy application can use the URL.


D.

Use the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL. Pass the presigned URL to the proxy application.


Expert Solution
Questions # 88:

A machine learning specialist is preparing data for training on Amazon SageMaker. The specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.

What should the specialist do to optimize the data for training on SageMaker?

Options:

A.

Use the SageMaker batch transform feature to transform the training data into a DataFrame.


B.

Use AWS Glue to compress the data into the Apache Parquet format.


C.

Transform the dataset into the RecordIO protobuf format.


D.

Use the SageMaker hyperparameter optimization feature to automatically optimize the data.


Expert Solution
Questions # 89:

While reviewing the histogram for residuals on regression evaluation data a Machine Learning Specialist notices that the residuals do not form a zero-centered bell shape as shown What does this mean?

Question # 89

Options:

A.

The model might have prediction errors over a range of target values.


B.

The dataset cannot be accurately represented using the regression model


C.

There are too many variables in the model


D.

The model is predicting its target values perfectly.


Expert Solution
Questions # 90:

A retail company uses a machine learning (ML) model for daily sales forecasting. The company’s brand manager reports that the model has provided inaccurate results for the past 3 weeks.

At the end of each day, an AWS Glue job consolidates the input data that is used for the forecasting with the actual daily sales data and the predictions of the model. The AWS Glue job stores the data in Amazon S3. The company’s ML team is using an Amazon SageMaker Studio notebook to gain an understanding about the source of the model's inaccuracies.

What should the ML team do on the SageMaker Studio notebook to visualize the model's degradation MOST accurately?

Options:

A.

Create a histogram of the daily sales over the last 3 weeks. In addition, create a histogram of the daily sales from before that period.


B.

Create a histogram of the model errors over the last 3 weeks. In addition, create a histogram of the model errors from before that period.


C.

Create a line chart with the weekly mean absolute error (MAE) of the model.


D.

Create a scatter plot of daily sales versus model error for the last 3 weeks. In addition, create a scatter plot of daily sales versus model error from before that period.


Expert Solution
Viewing page 9 out of 10 pages
Viewing questions 81-90 out of questions