An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.
Which solution will improve the model’s performance?
A. Optimize for accuracy. Use image augmentation on the less common images.
B. Optimize for F1 score. Use image augmentation on the less common images.
C. Optimize for accuracy. Use SMOTE to generate synthetic images.
D. Optimize for F1 score. Use SMOTE to generate synthetic images.
This scenario describes a severely imbalanced classification problem. In such cases, accuracy is a misleading metric, because the model can achieve high accuracy by predicting only the majority class.
AWS ML best practices recommend using F1 score (or precision and recall) when evaluating imbalanced datasets. The F1 score is the harmonic mean of precision and recall, so it penalizes both false positives and false negatives on the minority class, which makes it well suited to assessing minority-class performance.
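To make the difference between the two metrics concrete, here is a minimal sketch in plain Python. The 990/10 class split and the helper names (`confusion_counts`, `accuracy`, `f1_score`) are assumptions chosen for illustration and are not part of the question; the minority class is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced dataset: 990 majority-class (0) and 10 minority-class (1) images.
y_true = [0] * 990 + [1] * 10

# A degenerate model that always predicts the majority class.
majority_only = [0] * 1000

print("accuracy:", accuracy(y_true, majority_only))
print("F1 (minority class):", f1_score(y_true, majority_only))
```

Under these assumed numbers, the always-majority model scores 99% accuracy while its F1 score on the minority class is 0, which is why options A and C optimize for the wrong metric.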
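For image data, image augmentation (rotations, flips, crops, color jitter) is the preferred technique to increase minority-class representation. SMOTE is designed for tabular data: it creates synthetic samples by interpolating between the feature vectors of neighboring minority-class examples. Applied to raw pixels, that interpolation tends to produce blurred blends of two images rather than realistic new ones, so it is not suitable for image pixel data.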
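The augmentation idea can be sketched without any imaging library by treating an image as a list of pixel rows. The helper names (`hflip`, `vflip`, `rot90`, `augment_minority`) and the target count are hypothetical; a real pipeline would more likely use a framework's own augmentation utilities and add crops and color jitter.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment_minority(images, target_count, seed=0):
    """Grow a minority-class image list to target_count using random transforms.

    Assumption for this sketch: each image is a list of rows of pixel values.
    The original images are kept and transformed copies are appended.
    """
    if not images:
        raise ValueError("need at least one minority-class image to augment")
    rng = random.Random(seed)
    transforms = [hflip, vflip, rot90]
    augmented = list(images)
    while len(augmented) < target_count:
        source = rng.choice(images)
        augmented.append(rng.choice(transforms)(source))
    return augmented


# Hypothetical 2x2 "image" standing in for a minority-class sample.
tiny = [[1, 2],
        [3, 4]]

print(hflip(tiny))   # [[2, 1], [4, 3]]
print(rot90(tiny))   # [[3, 1], [4, 2]]
print(len(augment_minority([tiny], target_count=5)))  # 5
```

Each transform preserves the label, so the added copies are still valid minority-class examples. Augmentation of this kind is normally applied to the training split only, so that validation metrics such as F1 are computed on unmodified images.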
Therefore, Option B is the correct, AWS-aligned answer: optimize for F1 score and apply image augmentation to the less common images.