You are creating a retraining policy for a customer churn prediction model deployed in Vertex AI. New training data is added weekly. You want to implement a model retraining process that minimizes cost and effort. What should you do?
A.
Retrain the model when the model ' s latency increases by 10% due to increased traffic.
B.
Retrain the model when the model accuracy drops by 10% on the new training dataset.
C.
Retrain the model every week when new training data is available.
D.
Retrain the model when a significant shift in the distribution of customer attributes is detected in the production data compared to the training data.
In the context of MLOps on Google Cloud and Vertex AI, the goal is to balance model performance with operational efficiency. Here is why Option D is the correct strategy for minimizing cost and effort while maintaining reliability:
Data Drift and Model Decay: In production environments, the distribution of live data often changes over time (a phenomenon known as Training-Serving Skew or Data Drift ). If the customer attributes in the real world no longer match the data the model was trained on, the model’s predictive power will degrade.
Vertex AI Model Monitoring: Vertex AI provides built-in tools to monitor for Feature Attribution Drift and Training-Serving Skew . By setting up alerts for these shifts, you implement " Performance-based " or " Condition-based " retraining. This is more cost-effective than retraining every week (Option C), which might use expensive compute resources to retrain a model that is still performing perfectly.
Why other options are incorrect:
Option A: Latency is an infrastructure/engineering metric, not a predictive quality metric. Retraining the model will not fix latency issues caused by high traffic; that would require scaling your prediction nodes.
Option B: While accuracy is important, waiting for a 10% drop on a new dataset often means the model has already been underperforming in production for some time. Furthermore, calculating accuracy requires " ground truth " (actual labels), which may not be available immediately for churn.
Option C: Retraining weekly regardless of performance leads to unnecessary compute costs and engineering overhead if the data hasn ' t changed significantly.
Contribute your Thoughts:
Chosen Answer:
This is a voting comment (?). You can switch to a simple comment. It is better to Upvote an existing comment if you don't have anything to add.
Submit