Google Professional Machine Learning Engineer Professional-Machine-Learning-Engineer Question # 10 Topic 2 Discussion


Question #: 10

Topic #: 2

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

A. Significantly increase the max_batch_size TensorFlow Serving parameter.

B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.

C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.

D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.


Explanation

TensorFlow Serving is a system for deploying and serving TensorFlow models in production. It runs on several kinds of hardware, including CPUs, GPUs, and TPUs. The prebuilt TensorFlow Serving binaries, however, are compiled to run on a broad range of CPUs, so they may not use every instruction set that the serving nodes actually support. To reduce serving latency, you can rebuild TensorFlow Serving from source with CPU-specific optimizations enabled, such as AVX, AVX2, and FMA [1]. These vector instructions speed up the numerical computation that dominates inference, especially for deep neural networks.
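As a rough illustration of how the CPU features of a serving node relate to the build options, the sketch below reads feature flags in the format Linux exposes in /proc/cpuinfo and maps them to compiler options that could be passed to Bazel through --copt. The helper names (parse_cpu_flags, bazel_copts) and the sample text are hypothetical, and the mapping covers only the three instruction sets named above; it is not the official build procedure, which is described in reference [1].

```python
from __future__ import annotations

# Assumption: x86 feature names as they appear on the "flags" line of
# /proc/cpuinfo on Linux, mapped to the matching GCC/Clang options.
FEATURE_TO_COPT = {
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}

# Hypothetical excerpt of /proc/cpuinfo, used so the example runs anywhere.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu sse4_2 avx avx2 fma\n"


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def bazel_copts(cpu_flags: set[str]) -> list[str]:
    """Build --copt options for the known features the CPU reports."""
    return [
        f"--copt={copt}"
        for feature, copt in FEATURE_TO_COPT.items()
        if feature in cpu_flags
    ]


if __name__ == "__main__":
    print(" ".join(bazel_copts(parse_cpu_flags(SAMPLE_CPUINFO))))
```

On a real node, the contents of /proc/cpuinfo would replace SAMPLE_CPUINFO. A binary built with these options will only run on CPUs that support every enabled instruction set, which is why the build should be paired with a guarantee about the node hardware.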

Google Kubernetes Engine (GKE) is a managed service for running containerized applications on Google Cloud using Kubernetes. Containers run on nodes, which are virtual machines grouped into node pools that share the same configuration. The CPU platform of a node is the processor generation that backs the virtual machine, and GKE lets you set a baseline minimum CPU platform for a node pool. Setting it ensures that every node in the pool provides at least the CPU features your workload requires [2]. This matters here because a binary compiled for a given instruction set will not run correctly on a CPU that lacks it.
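The sketch below shows one way to reason about that choice: given the instruction sets the serving binary was compiled for, pick the oldest platform that provides all of them, then assemble a gcloud command that creates a node pool with that minimum. The PLATFORMS table is a deliberately simplified assumption (it lists only three Intel generations and the features each one introduced), the function names and the pool and cluster names are hypothetical, and which platforms are actually available depends on the machine type and zone, as documented in reference [2].

```python
from __future__ import annotations

# Assumed, simplified table: platform names ordered oldest to newest, with
# the instruction sets each generation added on top of the previous ones.
PLATFORMS = [
    ("Intel Sandy Bridge", {"avx"}),
    ("Intel Haswell", {"avx2", "fma"}),
    ("Intel Skylake", {"avx512f"}),
]


def min_cpu_platform(required: set[str]) -> str | None:
    """Return the oldest listed platform supporting all required features."""
    supported: set[str] = set()
    for name, added in PLATFORMS:
        supported |= added  # newer generations keep the older features
        if required <= supported:
            return name
    return None  # no platform in the table covers the requirements


def node_pool_command(pool: str, cluster: str, required: set[str]) -> list[str]:
    """Assemble (but do not run) a gcloud command for a new node pool."""
    platform = min_cpu_platform(required)
    if platform is None:
        raise ValueError(f"no known platform supports {sorted(required)}")
    return [
        "gcloud", "container", "node-pools", "create", pool,
        "--cluster", cluster,
        "--min-cpu-platform", platform,
    ]


if __name__ == "__main__":
    # Hypothetical pool and cluster names.
    print(node_pool_command("serving-pool", "ml-cluster", {"avx2", "fma"}))
```

Returning the command as a list rather than executing it keeps the example free of side effects; the list could be passed to subprocess.run once the values have been checked against the project's actual zone and machine type.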

For this scenario, with a few thousand queries per second served from CPU-only pods, the best option is to recompile TensorFlow Serving from source with CPU-specific optimizations and to instruct GKE to choose an appropriate baseline minimum CPU platform for the serving nodes. This improves serving latency without changing the underlying infrastructure: the load balancer and the CPU-only pods on GKE stay as they are, and only the TensorFlow Serving binary and the CPU platform setting of the node pool change. The optimized binary also makes fuller use of the CPUs that are already serving the traffic.

References:

[1] Building TensorFlow Serving from source

[2] Specifying a minimum CPU platform for a node pool

Actual exam question for Google Professional-Machine-Learning-Engineer exam by Cyra5790 at Aug 3, 2025, 4:51:30 AM
