Google Professional-Machine-Learning-Engineer Exam Questions Free Practice Test


Questions # 81:

You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks. You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?

Options:

A. Use BigQuery ML to run several regression models, and analyze their performance.

B. Read the data from BigQuery using Dataproc, and run several models using SparkML.

C. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.

D. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.

Expert Solution

Answer

Explanation

Option A is correct because using BigQuery ML to run several regression models and analyze their performance is the most efficient, self-service way to complete the task. BigQuery ML lets you create and use ML models inside BigQuery with SQL queries [1]. It supports regression models at different levels of sophistication, such as linear regression, boosted tree regressors, and DNN regressors [2]. It also reports evaluation metrics for regression models, such as mean absolute error, mean squared error, and R² [3]. Because the data already lives in BigQuery, no data movement, non-SQL code, or additional tooling is required [4].

Option B is incorrect because reading the data from BigQuery using Dataproc and running several models using SparkML is not the most efficient, self-service way to complete the task. Dataproc is a service for creating and managing clusters of virtual machines that run Apache Spark and other open-source tools [5]. SparkML is a library that provides ML algorithms and utilities for Spark. This option requires more effort and resources than option A: you must move the data from BigQuery to Dataproc, create and configure the clusters, write and run the SparkML code, and analyze the results.

Option C is incorrect because using Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics is not the most efficient and self-serviced way to complete the task. Vertex AI Workbench is a service that allows you to create and use notebooks for ML development and experimentation. Scikit-learn is a library that provides ML algorithms and utilities for Python. However, this option also requires more effort and resources than option A, as it involves creating and managing the notebooks, writing and running the scikit-learn code, and analyzing the results.

Option D is incorrect because training a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms is not the most efficient and self-serviced way to complete the task. TensorFlow is a framework that allows you to create and train ML models using Python or other languages. Vertex AI is a service that allows you to train and deploy ML models using built-in algorithms or custom containers. However, this option also requires more effort and resources than option A, as it involves writing and running the TensorFlow code, creating and managing the training jobs, and analyzing the results.
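The BigQuery ML approach described for option A can be sketched as follows. This is a minimal illustration, not a definitive implementation: it only builds the SQL text for training and evaluating three regression model types, and the dataset name `mydataset`, table name `marketing_sales`, and label column `sales` are hypothetical placeholders that do not come from the question.

```python
# Sketch: build BigQuery ML statements for several regression model types.
# The dataset, table, and label names used below are hypothetical placeholders.

# Model types of increasing sophistication supported by BigQuery ML.
MODEL_TYPES = ["LINEAR_REG", "BOOSTED_TREE_REGRESSOR", "DNN_REGRESSOR"]


def model_name(model_type: str, dataset: str) -> str:
    """Derive a (hypothetical) model name from the model type."""
    return f"{dataset}.sales_{model_type.lower()}"


def create_model_sql(model_type: str, dataset: str, table: str, label: str) -> str:
    """Return a CREATE MODEL statement that trains one model type."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name(model_type, dataset)}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{dataset}.{table}`"
    )


def evaluate_model_sql(model_type: str, dataset: str) -> str:
    """Return an ML.EVALUATE query for the corresponding trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model_name(model_type, dataset)}`)"


# One training statement and one evaluation query per model type.
statements = []
for mt in MODEL_TYPES:
    statements.append(create_model_sql(mt, "mydataset", "marketing_sales", "sales"))
    statements.append(evaluate_model_sql(mt, "mydataset"))

print(";\n\n".join(statements) + ";")
```

Each generated statement could be pasted into the BigQuery console, so the whole comparison stays inside SQL and the data never leaves BigQuery.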

References:
[1] BigQuery ML overview
[2] Creating a model in BigQuery ML
[3] Evaluating a model in BigQuery ML
[4] BigQuery ML benefits
[5] Dataproc overview
[6] SparkML overview
[7] Vertex AI Workbench overview
[8] Scikit-learn overview
[9] TensorFlow overview
[10] Vertex AI overview

Questions # 82:

You work on a team that builds state-of-the-art deep learning models by using the TensorFlow framework. Your team runs multiple ML experiments each week which makes it difficult to track the experiment runs. You want a simple approach to effectively track, visualize and debug ML experiment runs on Google Cloud while minimizing any overhead code. How should you proceed?

Options:

A. Set up Vertex AI Experiments to track metrics and parameters. Configure Vertex AI TensorBoard for visualization.

B. Set up a Cloud Function to write and save metrics files to a Cloud Storage bucket. Configure a Google Cloud VM to host TensorBoard locally for visualization.

C. Set up a Vertex AI Workbench notebook instance. Use the instance to save metrics data in a Cloud Storage bucket and to host TensorBoard locally for visualization.

D. Set up a Cloud Function to write and save metrics files to a BigQuery table. Configure a Google Cloud VM to host TensorBoard locally for visualization.


Questions # 83:

You work for a gaming company that develops massively multiplayer online (MMO) games. You built a TensorFlow model that predicts whether players will make in-app purchases of more than $10 in the next two weeks. The model’s predictions will be used to adapt each user’s game experience. User data is stored in BigQuery. How should you serve your model while optimizing cost, user experience, and ease of management?

Options:

A. Import the model into BigQuery ML. Make predictions using batch reading data from BigQuery, and push the data to Cloud SQL.

B. Deploy the model to Vertex AI Prediction. Make predictions using batch reading data from Cloud Bigtable, and push the data to Cloud SQL.

C. Embed the model in the mobile application. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data to Cloud SQL.

D. Embed the model in the streaming Dataflow pipeline. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data to Cloud SQL.

Expert Solution

Answer

Explanation

The best option to serve the model while optimizing cost, user experience, and ease of management is to deploy the model to Vertex AI Prediction, which is a managed service that can scale up or down according to the demand and provide low latency and high availability. Vertex AI Prediction can also handle TensorFlow models natively, without requiring any additional steps or conversions. By using batch prediction, the model can process large volumes of data efficiently and periodically, without affecting the user experience. The data can be read from Cloud Bigtable, which is a scalable and performant NoSQL database that can store user data in a flexible schema. The predictions can then be pushed to Cloud SQL, which is a fully managed relational database that can store the predictions in a structured format and enable easy querying and analysis. This option also simplifies the management of the model and the data, as it leverages the existing Google Cloud services and does not require any additional infrastructure or code.

The other options are not optimal for the following reasons:

A. Importing the model into BigQuery ML is not a good option, as it requires first exporting the TensorFlow model in SavedModel format to Cloud Storage so that BigQuery ML can import it, which adds an extra step. Moreover, BigQuery ML is not designed for serving low-latency online predictions, but rather for training, evaluating, and batch-scoring models using SQL queries. Reading from BigQuery and writing to Cloud SQL can also incur additional cost and latency, as the predictions must be exported from the data warehouse into a separate relational database.

C. Embedding the model in the mobile application is not a good option, as it increases the size and complexity of the application, and requires updating the application every time the model changes. Moreover, it exposes the model to the users, which can pose security and privacy risks, as well as potential misuse or abuse. Additionally, it does not leverage the benefits of the cloud, such as scalability, reliability, and performance.

D. Embedding the model in the streaming Dataflow pipeline is not a good option, as it requires building and maintaining a custom pipeline that can handle the model inference and data processing. This can increase the development and operational costs and complexity, as well as the potential for errors and failures. Moreover, it does not take advantage of the batch prediction feature of Vertex AI Prediction, which can optimize the resource utilization and cost efficiency.
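The batch-serving pattern that the explanation describes (score users in bulk on a schedule, store the results in a relational table, and let the game look up a user's latest prediction by key) can be sketched as below. This is only an illustration under stated assumptions: `sqlite3` stands in for Cloud SQL, `score_batch` is a hypothetical placeholder for the batch prediction job and uses a toy rule rather than a trained model, and the table and column names are invented.

```python
import sqlite3
from typing import List, Optional, Tuple


def score_batch(users: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a batch prediction job.

    Takes (user_id, past_spend) pairs and returns (user_id, probability)
    pairs. The rule below is a toy heuristic, not a trained model.
    """
    return [(user_id, min(1.0, past_spend / 100.0)) for user_id, past_spend in users]


def store_predictions(conn: sqlite3.Connection, predictions: List[Tuple[str, float]]) -> None:
    """Upsert the latest prediction per user into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchase_predictions ("
        "user_id TEXT PRIMARY KEY, probability REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO purchase_predictions (user_id, probability) "
        "VALUES (?, ?)",
        predictions,
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, user_id: str) -> Optional[float]:
    """Fetch one user's stored prediction, or None if the user was not scored."""
    row = conn.execute(
        "SELECT probability FROM purchase_predictions WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return None if row is None else row[0]


# In-memory database used here only as a stand-in for Cloud SQL.
conn = sqlite3.connect(":memory:")
store_predictions(conn, score_batch([("u1", 5.0), ("u2", 250.0)]))
print(lookup(conn, "u1"), lookup(conn, "u2"), lookup(conn, "unknown"))
```

The point of the design is that the game only performs a cheap keyed read at play time, while the expensive scoring runs periodically and offline.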


Questions # 84:

You work at an organization that manages a popular payment app. You built a fraudulent transaction detection model by using scikit-learn and deployed it to a Vertex AI endpoint. The endpoint is currently using 1 e2-standard-2 machine with 2 vCPUs and 8 GB of memory. You discover that traffic on the gateway fluctuates up to four times the endpoint's capacity. You need to address this issue by using the most cost-effective approach. What should you do?

Options:

A. Re-deploy the model with a TPU accelerator.

B. Increase the number of maximum replicas to 6 nodes, each with 1 e2-standard-2 machine.

C. Change the machine type to e2-highcpu-32 with 32 vCPUs and 32 GB of memory.

D. Set up a monitoring job and an alert for CPU usage. If you receive an alert, scale the vCPUs as needed.


Questions # 85:

You are developing a custom TensorFlow classification model based on tabular data. Your raw data is stored in BigQuery, contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names. Your model will be trained over multiple epochs. You want to minimize the effort and cost of your solution. What should you do?

Options:

A.
1. Write a SQL query to create a separate lookup table to scale the numerical features.
2. Deploy a TensorFlow-based model from Hugging Face to BigQuery to encode the text features.
3. Feed the resulting BigQuery view into Vertex AI Training.

B.
1. Use BigQuery to scale the numerical features.
2. Feed the features into Vertex AI Training.
3. Allow TensorFlow to perform the one-hot text encoding.

C.
1. Use TFX components with Dataflow to encode the text features and scale the numerical features.
2. Export results to Cloud Storage as TFRecords.
3. Feed the data into Vertex AI Training.

D.
1. Write a SQL query to create a separate lookup table to scale the numerical features.
2. Perform the one-hot text encoding in BigQuery.
3. Feed the resulting BigQuery view into Vertex AI Training.
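For reference, the two transformations named in the question can be sketched in plain Python. This is a minimal illustration of what a min-max scaler and a one-hot encoder compute, not a recommendation for any of the options; the sample prices and SKU names are made up, and at the scale described in the question the minimum, maximum, and vocabulary would have to be computed over the full dataset rather than an in-memory list.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories: List[str]) -> List[List[int]]:
    """Encode each category as a vector with a single 1 at its vocabulary index."""
    vocab = sorted(set(categories))
    index = {c: i for i, c in enumerate(vocab)}
    return [
        [1 if index[c] == i else 0 for i in range(len(vocab))]
        for c in categories
    ]


# Made-up sample data for illustration only.
prices = [10.0, 20.0, 40.0]
skus = ["sku_b", "sku_a", "sku_b"]

scaled = min_max_scale(prices)
encoded = one_hot(skus)
print(scaled)
print(encoded)
```

Note that the one-hot vector width equals the number of distinct categories, which is why high-cardinality features such as SKU names make the choice of where to encode them matter.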


Questions # 86:

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?

Options:

A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.

B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.

C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.

D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.


Questions # 87:

You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?

Options:

A. Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.

B. Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.

C. Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.

D. Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.


Questions # 88:

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

Options:

A. There is training-serving skew in your production environment.

B. There is not a sufficient amount of training data.

C. The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table.

D. The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the training table.
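To see how the two queries in the question behave, the split can be simulated in Python. This is a rough sketch under one assumption: each query draws a fresh, independent random number per row, which is modeled here with Python's `random` module rather than BigQuery's `RAND()`. Under that assumption a row lands in both tables with probability 0.8 × 0.2 = 0.16 and in neither table with probability 0.2 × 0.8 = 0.16.

```python
import random
from typing import Set, Tuple


def independent_split(n_rows: int, seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Mimic two separate queries that each draw a new random number per row."""
    rng = random.Random(seed)
    # First query: keep a row for training if its draw is <= 0.8.
    training = {i for i in range(n_rows) if rng.random() <= 0.8}
    # Second query: a fresh, unrelated draw decides membership in validation.
    validation = {i for i in range(n_rows) if rng.random() <= 0.2}
    return training, validation


n = 100_000
training, validation = independent_split(n)

# Rows present in both tables, and rows present in neither.
overlap = training & validation
unused = set(range(n)) - training - validation

print(f"fraction in both tables:    {len(overlap) / n:.3f}")
print(f"fraction in neither table:  {len(unused) / n:.3f}")
```

Because the two draws are unrelated, the validation table is not the complement of the training table, which is what a train/validation split normally requires.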

