Comprehensive and detailed explanation, based on the AWS AI documentation:
In Amazon Bedrock evaluations, Completeness measures how thoroughly a model or agent response addresses all aspects of the user prompt.
AWS evaluation guidance for LLM-as-a-judge explains that:
Completeness focuses on coverage of the prompt's requirements.
It is especially useful for evaluating multi-part questions.
It is a built-in qualitative metric in agent evaluation workflows (see the configuration sketch after this list).
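As a concrete illustration, below is a minimal boto3 sketch of starting an automated Bedrock evaluation job that scores responses on Completeness. The create_evaluation_job call is a real Bedrock API, but the nested field names (datasetMetricConfigs, evaluatorModelConfig), the metric identifier Builtin.Completeness, and all ARNs, bucket paths, and model IDs shown here are assumptions or placeholders; verify them against the current Amazon Bedrock API reference.

```python
# Hedged sketch: create a Bedrock LLM-as-a-judge evaluation job that scores
# Completeness. Metric names, field shapes, and IDs below are assumptions or
# placeholders -- check the current Bedrock API docs before use.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_evaluation_job(
    jobName="completeness-eval-demo",                          # placeholder name
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",  # placeholder role
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "General",  # assumed task type for judge-based eval
                    "dataset": {
                        "name": "multi_part_prompts",
                        "datasetLocation": {"s3Uri": "s3://my-bucket/prompts.jsonl"},
                    },
                    # Assumed built-in qualitative metric for prompt coverage
                    "metricNames": ["Builtin.Completeness"],
                }
            ],
            # The judge model that scores each generated response
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    # The model whose responses are being evaluated
    inferenceConfig={
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.titan-text-express-v1"}}
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval-results/"},
)
print(response["jobArn"])
```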
Why the other options are incorrect:
ROUGE (A) measures n-gram text overlap between a response and a reference, mainly for summarization; it says nothing about whether every part of the prompt was addressed (see the sketch after this list).
Following instructions (C) evaluates whether the response adheres to explicit directions in the prompt, not how fully it covers the prompt's parts.
Refusal (D) measures whether the model appropriately declines unsafe or out-of-scope requests.
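To make the contrast with option A concrete, the snippet below computes ROUGE-1 with the open-source rouge-score package (not an AWS library; install with pip install rouge-score). A response that answers only half of a two-part question can still score perfect unigram precision against a complete reference answer, which is exactly the gap Completeness is designed to catch. The example strings are invented for illustration.

```python
# Illustration: ROUGE rewards lexical overlap with a reference answer, not
# coverage of every part of the prompt. Example strings are invented.
from rouge_score import rouge_scorer

# A complete reference answer covering both parts of a two-part question.
reference = (
    "The capital of France is Paris. Paris became the capital because of "
    "its central political and economic role."
)
# A candidate that answers part one but skips part two entirely.
candidate = "The capital of France is Paris."

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
scores = scorer.score(reference, candidate)

# Precision is perfect because every candidate unigram appears in the
# reference, even though half the question went unanswered.
print(f"precision={scores['rouge1'].precision:.2f}, "
      f"recall={scores['rouge1'].recall:.2f}, "
      f"f1={scores['rouge1'].fmeasure:.2f}")
```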
AWS AI documentation references:
Amazon Bedrock Model Evaluation
LLM-as-a-Judge Metrics
Evaluating Agent Responses on AWS