Google Professional-Machine-Learning-Engineer Exam Questions Free Practice Test

Viewing page 6 out of 9 pages

Viewing questions 51-60 out of questions

Questions # 51:

You are developing a model to identify traffic signs in images extracted from videos taken from the dashboard of a vehicle. You have a dataset of 100,000 images that were cropped to show one out of ten different traffic signs. The images have been labeled accordingly for model training and are stored in a Cloud Storage bucket. You need to be able to tune the model during each training run. How should you train the model?

Options:

Train a model for object detection by using Vertex AI AutoML.

Train a model for image classification by using Vertex AI AutoML.

Develop the model training code for object detection and train a model by using Vertex AI custom training.

Develop the model training code for image classification and train a model by using Vertex AI custom training.

Expert Solution

Answer

Explanation

Image classification is a task where the model assigns a label to an image based on its content, such as “stop sign” or “speed limit” [1]. Object detection is a task where the model locates and identifies multiple objects in an image and draws bounding boxes around them [2]. Since your dataset consists of images that were cropped to show one out of ten different traffic signs, you are dealing with an image classification problem, not an object detection problem. Therefore, you need to train a model for image classification, not object detection.

Vertex AI AutoML is a service that allows you to train and deploy high-quality ML models with minimal effort and machine learning expertise [3]. You can use Vertex AI AutoML to train a model for image classification by uploading your images and labels to a Vertex AI dataset, and then launching an AutoML training job [4]. However, Vertex AI AutoML does not allow you to tune the model during each training run, as it automatically selects the best model architecture and hyperparameters for your data [4].

Vertex AI custom training is a service that allows you to train and deploy your own custom ML models using your own code and frameworks [5]. You can use Vertex AI custom training to train a model for image classification by writing your own model training code, such as using TensorFlow or PyTorch, and then creating and running a custom training job. Vertex AI custom training allows you to tune the model during each training run, as you can specify the model architecture and hyperparameters in your code, and use Vertex AI Hyperparameter Tuning to optimize them.

Therefore, the best option for your scenario is to develop the model training code for image classification and train a model by using Vertex AI custom training.
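
As a minimal sketch of how custom training code can expose tunable values: Vertex AI hyperparameter tuning passes each trial's hyperparameter values to the training program as command-line arguments, so a training script typically declares them as flags. The flag names and defaults below (learning_rate, batch_size, num_classes) are assumptions for illustration and do not come from the question.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse hyperparameters supplied to the training script as command-line flags."""
    parser = argparse.ArgumentParser(description="Traffic sign classifier training")
    # Hypothetical flag names; they would need to match the parameter names
    # declared in the tuning job configuration.
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=32)
    # Ten traffic sign classes, as described in the question.
    parser.add_argument("--num_classes", type=int, default=10)
    return parser.parse_args(argv)


# Simulate the arguments one tuning trial might pass to the script.
args = parse_hyperparameters(["--learning_rate", "0.01", "--batch_size", "64"])
print(f"lr={args.learning_rate} batch_size={args.batch_size} classes={args.num_classes}")
```

In a real training script the parsed values would feed the model and optimizer construction, and the script would be called with sys.argv[1:] instead of a hard-coded list.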

Questions # 52:

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

Options:

Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.

Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.

Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.

Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Expert Solution

Answer

Explanation

Developing ML models with AI Platform for image segmentation on CT scans requires a lot of computation and experimentation, as image segmentation is a complex and challenging task that involves assigning a label to each pixel in an image. Image segmentation can be used for various medical applications, such as tumor detection, organ segmentation, or lesion localization [1].

To minimize the computation costs and manual intervention while having version control for the code, one should use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository. Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build can import source code from Cloud Source Repositories, Cloud Storage, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives [2].

Cloud Build allows you to set up automated triggers that start a build when changes are pushed to a source code repository. You can configure triggers to filter the changes based on the branch, tag, or file path [3].

Cloud Source Repositories is a service that provides fully managed private Git repositories on Google Cloud Platform. Cloud Source Repositories allows you to store, manage, and track your code using the Git version control system. You can also use Cloud Source Repositories to connect to other Google Cloud services, such as Cloud Build, Cloud Functions, or Cloud Run [4].

To use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository, follow these steps:

Create a Cloud Source Repository for your code, and push your code to the repository. You can use the Cloud SDK, Cloud Console, or Cloud Source Repositories API to create and manage your repository [5].

Create a Cloud Build trigger for your repository, and specify the build configuration and the trigger settings. You can use the Cloud SDK, Cloud Console, or Cloud Build API to create and manage your trigger.

Specify the steps of the build in a YAML or JSON file, such as installing the dependencies, running the tests, building the container image, and submitting the training job to AI Platform. You can also use the Cloud Build predefined or custom build steps to simplify your build configuration.

Push your new code to the repository, and the trigger will start the build automatically. You can monitor the status and logs of the build using the Cloud SDK, Cloud Console, or Cloud Build API.
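
The build steps described above can be sketched as a JSON build configuration generated from Python. This is only an illustration under stated assumptions: the image path, job name prefix, and region are hypothetical, and the exact gcloud flags needed depend on how the training container is set up. $PROJECT_ID and $SHORT_SHA are Cloud Build default substitutions that are filled in for triggered builds.

```python
import json

# Hypothetical image path; $PROJECT_ID and $SHORT_SHA are substituted by Cloud Build.
IMAGE = "gcr.io/$PROJECT_ID/ct-segmentation-trainer:$SHORT_SHA"

build_config = {
    "steps": [
        {
            # Build the training container from the commit that fired the trigger.
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", IMAGE, "."],
        },
        {
            # Push the image so the training service can pull it.
            "name": "gcr.io/cloud-builders/docker",
            "args": ["push", IMAGE],
        },
        {
            # Submit the retraining job (job name and region are assumptions).
            "name": "gcr.io/cloud-builders/gcloud",
            "args": [
                "ai-platform", "jobs", "submit", "training",
                "ct_seg_$SHORT_SHA",
                "--region", "us-central1",
                "--master-image-uri", IMAGE,
            ],
        },
    ]
}

# Render the configuration; the output could be saved as cloudbuild.json in the repository.
rendered = json.dumps(build_config, indent=2)
print(rendered)
```

Tagging the image and the job name with the commit hash keeps each benchmark run traceable to the exact code version that produced it.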

The other options are not as easy or feasible. Using Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job is not ideal, as Cloud Functions has limitations on the memory, CPU, and execution time, and does not provide a user interface for managing and tracking your builds. Using the gcloud command-line tool to submit training jobs on AI Platform when you update your code is not optimal, as it requires manual intervention and does not leverage the benefits of Cloud Build and its integration with Cloud Source Repositories. Creating an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor is not relevant, as Cloud Composer is mainly designed for orchestrating complex workflows across multiple systems, and does not provide a version control system for your code.

References:

1: Image segmentation

2: Cloud Build overview

3: Creating and managing build triggers

4: Cloud Source Repositories overview

5: Quickstart: Create a repository

Quickstart: Create a build trigger

Configuring builds

Viewing build results

Questions # 53:

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?

Options:

Use Vertex AI Training to submit training jobs using any framework.

Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.

Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Expert Solution

Answer

Explanation

The best option for using a managed service to submit training jobs with different frameworks is to use Vertex AI Training. Vertex AI Training is a fully managed service that allows you to train custom models on Google Cloud using any framework, such as TensorFlow, PyTorch, scikit-learn, XGBoost, etc. You can also use custom containers to run your own libraries and dependencies. Vertex AI Training handles the infrastructure provisioning, scaling, and monitoring for you, so you can focus on your model development and optimization. Vertex AI Training also integrates with other Vertex AI services, such as Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Prediction. The other options are not as suitable for using a managed service to submit training jobs with different frameworks, because:

Configuring Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob would require more infrastructure maintenance, as Kubeflow is not a fully managed service, and you would have to provision and manage your own Kubernetes cluster. This would also incur more costs, as you would have to pay for the cluster resources, regardless of the training job usage. TFJob is also mainly designed for TensorFlow models, and might not support other frameworks as well as Vertex AI Training.

Creating a library of VM images on Compute Engine, and publishing these images on a centralized repository would require more development time and effort, as you would have to create and maintain different VM images for different frameworks and libraries. You would also have to manually configure and launch the VMs for each training job, and handle the scaling and monitoring yourself. This would not leverage the benefits of a managed service, such as Vertex AI Training.

Setting up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure would require more configuration and administration, as Slurm is not a native Google Cloud service, and you would have to install and manage it on your own VMs or clusters. Slurm is also a general-purpose workload manager, and might not have the same level of integration and optimization for ML frameworks and libraries as Vertex AI Training.

References:

Vertex AI Training | Google Cloud

Kubeflow on Google Cloud | Google Cloud

TFJob for training TensorFlow models with Kubernetes | Kubeflow

Compute Engine | Google Cloud

Slurm Workload Manager
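
To illustrate the custom container point from the explanation, here is a rough sketch of how one job description can cover any framework: the framework lives inside the container image, so the job specification itself does not change. The dictionary layout follows the worker pool fields used by Vertex AI custom jobs (machine_spec, replica_count, container_spec); the image URIs, machine type, and arguments are hypothetical.

```python
def worker_pool_spec(image_uri, machine_type="n1-standard-4", replica_count=1, args=None):
    """Describe one worker pool that runs an arbitrary training container."""
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri, "args": list(args or [])},
    }


# Hypothetical image URIs, one per framework used by the team.
TRAINER_IMAGES = {
    "pytorch": "us-docker.pkg.dev/my-project/trainers/pytorch:latest",
    "scikit-learn": "us-docker.pkg.dev/my-project/trainers/sklearn:latest",
    "theano": "us-docker.pkg.dev/my-project/trainers/theano:latest",
}

# The same helper produces a job spec for every framework.
job_specs = {
    framework: [worker_pool_spec(uri, args=["--epochs", "10"])]
    for framework, uri in TRAINER_IMAGES.items()
}

for framework, specs in job_specs.items():
    print(framework, specs[0]["container_spec"]["image_uri"])
```

With the google-cloud-aiplatform SDK, a list like this would typically be passed as the worker_pool_specs argument when creating a custom job; that call is left out here so the sketch runs without cloud credentials.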

Questions # 54:

You recently deployed a model to a Vertex AI endpoint and set up online serving in Vertex AI Feature Store. You have configured a daily batch ingestion job to update your featurestore. During the batch ingestion jobs, you discover that CPU utilization is high in your featurestore's online serving nodes and that feature retrieval latency is high. You need to improve online serving performance during the daily batch ingestion. What should you do?

Options:

Schedule an increase in the number of online serving nodes in your featurestore prior to the batch ingestion jobs.

Enable autoscaling of the online serving nodes in your featurestore.

Enable autoscaling for the prediction nodes of your DeployedModel in the Vertex AI endpoint.

Increase the worker counts in the importFeatureValues request of your batch ingestion job.

Expert Solution

Questions # 55:

You are building an ML model to predict trends in the stock market based on a wide range of factors. While exploring the data, you notice that some features have a large range. You want to ensure that the features with the largest magnitude don’t overfit the model. What should you do?

Options:

Standardize the data by transforming it with a logarithmic function.

Apply a principal component analysis (PCA) to minimize the effect of any particular feature.

Use a binning strategy to replace the magnitude of each feature with the appropriate bin number.

Normalize the data by scaling it to have values between 0 and 1.

Expert Solution

Answer

Explanation

The best option to ensure that the features with the largest magnitude don’t overfit the model is to normalize the data by scaling it to have values between 0 and 1. This is also known as min-max scaling, a form of feature scaling. It puts all features on a comparable range, so that features with large raw magnitudes do not dominate the loss or the gradient updates, and it can improve the numerical stability and convergence of training. Because the transformation is linear, it preserves the shape of each feature's distribution; it does not by itself remove skewness or outliers. Min-max scaling subtracts the minimum value from each value and divides by the range (maximum minus minimum); when the minimum is 0, this reduces to dividing by the maximum value. In Python, it is available as the sklearn.preprocessing.MinMaxScaler class.
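
A minimal worked example of min-max scaling in plain Python, assuming each feature is a simple list of numbers. The feature names and values are made up for illustration.

```python
def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical features with very different magnitudes.
prices = [10.0, 20.0, 30.0]
volumes = [1_000_000.0, 3_000_000.0, 5_000_000.0]

print(min_max_scale(prices))   # [0.0, 0.5, 1.0]
print(min_max_scale(volumes))  # [0.0, 0.5, 1.0]
```

After scaling, both features span the same range, so neither dominates purely because of its units. In a real pipeline, the minimum and maximum should be computed on the training data only and reused for the test and serving data.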

The other options are not optimal for the following reasons:

A. Standardizing the data by transforming it with a logarithmic function is not a good option, as it can distort the distribution and relationship of the data, and introduce bias and errors. Moreover, the logarithmic function is not defined for negative or zero values, which can limit its applicability and cause problems for the model.

B. Applying a principal component analysis (PCA) to minimize the effect of any particular feature is not a good option, as it can reduce the interpretability and explainability of the data and the model. PCA is a dimensionality reduction technique that transforms the data into a new set of orthogonal features that capture the most variance in the data. However, these new features are not directly related to the original features, and can lose some information and meaning in the process. Moreover, PCA can be computationally expensive and complex, and may not be necessary for the problem at hand.

C. Using a binning strategy to replace the magnitude of each feature with the appropriate bin number is not a good option, as it can lose the granularity and precision of the data, and introduce noise and outliers. Binning is a discretization technique that groups the continuous values of a feature into a finite number of bins or categories. However, this can reduce the variability and diversity of the data, and create artificial boundaries and gaps that may not reflect the true nature of the data. Moreover, binning can be arbitrary and subjective, and depend on the choice of the bin size and number.

References:

Professional ML Engineer Exam Guide

Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate

Google Cloud launches machine learning engineer certification

Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization

sklearn.preprocessing.MinMaxScaler documentation

Principal Component Analysis Explained Visually

Binning Data in Python

Questions # 56:

One of your models is trained using data provided by a third-party data broker. The data broker does not reliably notify you of formatting changes in the data. You want to make your model training pipeline more robust to issues like this. What should you do?

Options:

Use TensorFlow Data Validation to detect and flag schema anomalies.

Use TensorFlow Transform to create a preprocessing component that will normalize data to the expected distribution, and replace values that don’t match the schema with 0.

Use tf.math to analyze the data, compute summary statistics, and flag statistical anomalies.

Use custom TensorFlow functions at the start of your model training to detect and flag known formatting errors.

Expert Solution

Questions # 57:

You are an ML engineer on an agricultural research team working on a crop disease detection tool to detect leaf rust spots in images of crops to determine the presence of a disease. These spots, which can vary in shape and size, are correlated to the severity of the disease. You want to develop a solution that predicts the presence and severity of the disease with high accuracy. What should you do?

Options:

Create an object detection model that can localize the rust spots.

Develop an image segmentation ML model to locate the boundaries of the rust spots.

Develop a template matching algorithm using traditional computer vision libraries.

Develop an image classification ML model to predict the presence of the disease.

Expert Solution

Answer

Explanation

The best option for developing a solution that predicts the presence and severity of the disease with high accuracy is to develop an image segmentation ML model to locate the boundaries of the rust spots. Image segmentation is a technique that partitions an image into multiple regions, each corresponding to a different object or semantic category. Image segmentation can be used to detect and localize the rust spots in the images of crops, and measure their shape and size. This information can then be used to determine the presence and severity of the disease, as the rust spots are correlated to the disease symptoms. Image segmentation can also handle the variability of the rust spots, as it does not rely on predefined templates or thresholds. Image segmentation can be implemented using deep learning models, such as U-Net, Mask R-CNN, or DeepLab, which can learn from large-scale datasets and achieve high accuracy and robustness. The other options are not as suitable for developing a solution that predicts the presence and severity of the disease with high accuracy, because:

Creating an object detection model that can localize the rust spots would only provide the bounding boxes of the rust spots, not their exact boundaries. This would result in less precise measurements of the shape and size of the rust spots, and might affect the accuracy of the disease prediction. Object detection models are also more complex and computationally expensive than image segmentation models, as they have to perform both classification and localization tasks.

Developing a template matching algorithm using traditional computer vision libraries would require manually designing and selecting the templates for the rust spots, which might not capture the diversity and variability of the rust spots. Template matching algorithms are also sensitive to noise, occlusion, rotation, and scale changes, and might fail to detect the rust spots in different scenarios. Template matching algorithms are also less accurate and robust than deep learning models, as they do not learn from data.

Developing an image classification ML model to predict the presence of the disease would only provide a binary or categorical output, not the location or severity of the disease. Image classification models are also less informative and interpretable than image segmentation models, as they do not provide any spatial information or visual explanation for the prediction. Image classification models might also suffer from class imbalance or mislabeling issues, as the presence of the disease might not be consistent or clear across the images.

References:

Image Segmentation | Computer Vision | Google Developers

Crop diseases and pests detection based on deep learning: a review | Plant Methods | Full Text

Using Deep Learning for Image-Based Plant Disease Detection

Computer Vision, IoT and Data Fusion for Crop Disease Detection Using …

On Using Artificial Intelligence and the Internet of Things for Crop …

Crop Disease Detection Using Machine Learning and Computer Vision
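
To make the link between a segmentation output and severity concrete, here is a small sketch that turns a per-pixel mask into a presence flag and a severity score. The label encoding (0 = background, 1 = healthy leaf, 2 = rust) and the definition of severity as the rust share of the leaf area are assumptions for illustration; the question only says that the spots correlate with severity.

```python
BACKGROUND, HEALTHY, RUST = 0, 1, 2  # hypothetical label encoding


def rust_severity(mask):
    """Return the fraction of leaf pixels (healthy + rust) that are labeled rust."""
    leaf_pixels = sum(1 for row in mask for p in row if p in (HEALTHY, RUST))
    rust_pixels = sum(1 for row in mask for p in row if p == RUST)
    return rust_pixels / leaf_pixels if leaf_pixels else 0.0


# A tiny hand-written mask standing in for a model prediction.
mask = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 1, 1],
    [0, 1, 1, 0],
]

severity = rust_severity(mask)
disease_present = severity > 0.0
print(disease_present, severity)  # True 0.25
```

A bounding box around the same spots would also include healthy pixels, which is why pixel-level boundaries give a more precise area measurement.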

Questions # 58:

Your company manages an ecommerce platform and has a large dataset of customer reviews. Each review has a positive, negative, or neutral label. You need to quickly prototype a sentiment analysis model that accurately predicts the sentiment labels of new customer reviews while minimizing time and cost. What should you do?

Options:

Train a sentiment analysis model by using a BERT-based model, and fine-tune the model by using domain-specific customer reviews.

Use the Natural Language API for real-time sentiment analysis.

Use AutoML to train a multi-class classification model that predicts sentiment labels based on the training data.

Use the Vertex AI Text embeddings API to vectorize the text, and train a regression model by using AutoML to predict sentiment scores.

Expert Solution

Questions # 59:

You work for a large retailer and you need to build a model to predict customer churn. The company has a dataset of historical customer data, including customer demographics, purchase history, and website activity. You need to create the model in BigQuery ML and thoroughly evaluate its performance. What should you do?

Options:

Create a linear regression model in BigQuery ML and register the model in Vertex AI Model Registry. Evaluate the model performance in Vertex AI.

Create a logistic regression model in BigQuery ML and register the model in Vertex AI Model Registry. Evaluate the model performance in Vertex AI.

Create a linear regression model in BigQuery ML. Use the ML.EVALUATE function to evaluate the model performance.

Create a logistic regression model in BigQuery ML. Use the ML.CONFUSION_MATRIX function to evaluate the model performance.

Expert Solution

Answer

Explanation

Customer churn is a binary classification problem, where the target variable is whether a customer has churned or not. Therefore, a logistic regression model is more suitable than a linear regression model, which is used for regression problems. A logistic regression model can output the probability of a customer churning, which can be used to rank the customers by their churn risk and take appropriate actions [1].

BigQuery ML is a service that allows you to create and execute machine learning models in BigQuery using standard SQL queries [2]. You can use BigQuery ML to create a logistic regression model for customer churn prediction by using the CREATE MODEL statement and specifying the LOGISTIC_REG model type [3]. You can use the historical customer data as the input table for the model, and specify the features and the label columns [3].
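
As a sketch of the CREATE MODEL statement described above, the helper below assembles the SQL as a Python string. The dataset, model, table, and label column names are hypothetical; only the statement shape and the LOGISTIC_REG model type come from the explanation.

```python
def create_churn_model_sql(model, table, label_col):
    """Build a BigQuery ML statement that trains a logistic regression model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


# Hypothetical resource names for illustration only.
sql = create_churn_model_sql(
    model="retail.churn_model",
    table="retail.customer_history",
    label_col="churned",
)
print(sql)
```

The resulting string could be run in the BigQuery console or submitted through a BigQuery client. Selecting explicit feature columns instead of SELECT * is usually preferable, so that identifiers or leaked columns are not used as features.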

Vertex AI Model Registry is a central repository where you can manage the lifecycle of your ML models [4]. You can import models from various sources, such as BigQuery ML, AutoML, or custom models, and assign them to different versions and aliases [4]. You can also deploy models to endpoints, which are resources that provide a service URL for online prediction.

By registering the BigQuery ML model in Vertex AI Model Registry, you can leverage the Vertex AI features to evaluate and monitor the model performance [4]. You can use Vertex AI Experiments to track and compare the metrics of different model versions, such as accuracy, precision, recall, and AUC. You can also use Vertex AI Explainable AI to generate feature attributions that show how much each input feature contributed to the model’s prediction.
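
For reference, the metrics named above can be derived from the four cells of a binary confusion matrix. The counts in this sketch are invented to show the arithmetic and are not results from any real churn model.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from binary confusion matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the customers predicted to churn, how many actually churned.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the customers who churned, how many the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts: 100 churners out of 1,000 customers.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)
print(metrics)
```

Because churners are usually a small minority, accuracy alone can look high even when recall is poor, which is one reason to review several metrics together.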

The other options are not suitable for your scenario, because they either use the wrong model type, such as linear regression, or they do not use Vertex AI to evaluate the model performance, which would limit the insights and actions you can take based on the model results.

Questions # 60:

You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

Options:

Normalize the data for the training and test datasets as two separate steps.

Split the training and test data based on time rather than a random split to avoid leakage.

Add more data to your test set to ensure that you have a fair distribution and sample for testing.

Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.

Expert Solution
