Google Professional Machine Learning Engineer (Professional-Machine-Learning-Engineer) Exam Topic 2, Question #10 Discussion

Question #: 10
Topic #: 2

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?


A. Significantly increase the max_batch_size TensorFlow Serving parameter


B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving


C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter


D. Recompile TensorFlow Serving from source to support CPU-specific optimizations, and instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes
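
For readers weighing options A and C: both refer to TensorFlow Serving's server-side request batching, which is configured through a text-proto file passed to tensorflow_model_server at startup. A minimal sketch, assuming batching is enabled; the values, model name, and paths are illustrative, not recommendations:

    # batching_parameters.txt -- text-proto read by tensorflow_model_server
    max_batch_size { value: 128 }          # option A's parameter: max requests merged per batch
    batch_timeout_micros { value: 2000 }   # how long to wait while filling a batch
    max_enqueued_batches { value: 64 }     # option C's parameter: queue depth before requests are rejected
    num_batch_threads { value: 8 }         # batches processed in parallel

    tensorflow_model_server \
      --port=8500 \
      --model_name=my_model \
      --model_base_path=/models/my_model \
      --enable_batching=true \
      --batching_parameters_file=/config/batching_parameters.txt

Note that raising max_batch_size trades per-request latency for throughput (requests wait longer for a batch to fill), and max_enqueued_batches only bounds the pending queue, which is worth keeping in mind when the stated goal is lower latency.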


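Option D bundles two mechanisms: building TensorFlow Serving from source with CPU-specific instruction sets enabled, and pinning GKE nodes to a minimum CPU platform so those instructions are guaranteed to be available. A hedged sketch of both steps; the cluster and node-pool names are hypothetical, and the right --copt flags depend on the target CPUs:

    # Build tensorflow_model_server from source with CPU-specific optimizations
    # (assumes the serving nodes support AVX2/FMA)
    bazel build -c opt --copt=-mavx2 --copt=-mfma \
      tensorflow_serving/model_servers:tensorflow_model_server

    # Pin the serving node pool to a baseline CPU platform so the optimized
    # binary's instructions exist on every node (names are hypothetical)
    gcloud container node-pools create serving-pool \
      --cluster=serving-cluster \
      --machine-type=n1-standard-8 \
      --min-cpu-platform="Intel Skylake"

Setting --min-cpu-platform complements the rebuilt binary: without it, GKE may schedule pods onto older CPU generations that lack the compiled-in instructions.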