A company uses ML models to predict whether transactions are fraudulent. The company needs to identify as many fraudulent transactions as possible. Which evaluation metric should the company use to evaluate the models to meet this requirement?
Option D is correct because the company’s primary goal is to identify as many fraudulent transactions as possible. In AWS documentation, recall is defined as TP / (TP + FN), where TP is true positives and FN is false negatives. Recall measures how well a model finds all actual positive cases. In a fraud-detection setting, the “positive” class is the fraudulent transaction, so maximizing recall means minimizing missed fraud cases.
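As a quick illustration of the formula above, here is a minimal Python sketch. The function name `recall` and the counts (80 fraud cases caught, 20 missed) are made up for illustration and do not come from the question or from AWS documentation.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # No actual positives in the data; recall is undefined, so return 0.0 by convention here.
        return 0.0
    return tp / actual_positives


# Hypothetical counts: 100 fraudulent transactions, 80 flagged by the model, 20 missed.
fraud_recall = recall(tp=80, fn=20)
print(f"recall = {fraud_recall:.2f}")
```

Every missed fraud case is a false negative, so lowering FN is the only way to raise this number for a fixed set of fraudulent transactions.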
AWS documentation also explains the business tradeoff clearly: a use case that needs to correctly predict as many positive examples as possible should prioritize high recall, even if that means accepting some additional false positives and therefore only moderate precision. That matches this question exactly, because the requirement is not to be most selective or most balanced overall; it is specifically to catch the largest possible number of fraudulent transactions.
The other metrics are less aligned with the stated goal. Precision measures how many predicted fraud cases are actually fraud, which matters most when false positives are very costly. F1 score balances precision and recall, but the question does not ask for balance; it asks for finding as many fraudulent transactions as possible. AUC is useful for assessing overall ranking quality and threshold-independent discrimination, but it does not directly measure how many fraud cases are missed at the chosen threshold. Based on AWS metric definitions, when the cost of missing true positives is the main concern, recall is the best evaluation metric. Therefore, the correct answer is D.
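The tradeoff between these metrics can be sketched with a small comparison. The labels and the two sets of predictions below (named `cautious` and `aggressive`) are invented toy data, not results from any real model; the sketch only shows how selecting on recall favors the model that misses the fewest fraud cases, even when its precision is lower.

```python
def metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (hypothetical): 4 fraudulent and 6 legitimate transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A cautious model flags little: no false alarms, but it misses half the fraud.
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# An aggressive model flags more: catches all fraud at the cost of two false alarms.
aggressive = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

results = {
    "cautious": metrics(y_true, cautious),
    "aggressive": metrics(y_true, aggressive),
}

# Select the model by recall, matching the "catch as much fraud as possible" goal.
best = max(results, key=lambda name: results[name]["recall"])

for name, m in results.items():
    print(name, {k: round(v, 2) for k, v in m.items()})
print("selected by recall:", best)
```

In this toy setup the cautious model has perfect precision but misses half of the fraud, so ranking by precision would pick the model that fails the stated requirement.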