What metrics would you use to evaluate the performance of a RAG workflow in terms of the accuracy of responses generated in relation to the input query? (Choose two.)
A. Generator latency
B. Retriever latency
C. Tokens generated per second
D. Response relevancy
E. Context precision
In a Retrieval-Augmented Generation (RAG) workflow, evaluating the accuracy of responses relative to the input query focuses on the quality of both the retrieved context and the generated output. As covered in NVIDIA’s Generative AI and LLMs course, the two key metrics are response relevancy and context precision. Response relevancy measures how well the generated response aligns with the intent of the input query; it is often assessed through human evaluation or automated metrics such as ROUGE or BLEU, ensuring the output is pertinent and accurate. Context precision evaluates the retriever’s ability to fetch relevant documents or passages from the knowledge base; it is typically measured with metrics such as precision@k, the proportion of the top-k retrieved items that are relevant to the query. Options A (generator latency), B (retriever latency), and C (tokens generated per second) are incorrect: they measure performance efficiency (speed), not accuracy. The course notes: “In RAG workflows, response relevancy ensures the generated output matches the query intent, while context precision evaluates the accuracy of retrieved documents, critical for high-quality responses.”
[References: NVIDIA Building Transformer-Based Natural Language Processing Applications course; NVIDIA Introduction to Transformer-Based Natural Language Processing course]
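To make the two metrics concrete, here is a minimal Python sketch. The document IDs, relevance labels, query, and function names (precision_at_k, response_relevancy) are all hypothetical illustrations, not from the course; a real evaluation would use a labeled dataset and ROUGE/BLEU or an LLM-based judge rather than the crude lexical-overlap proxy shown for relevancy.

```python
# Hedged sketch of the two accuracy-oriented RAG metrics discussed above.
# All data below is made up for illustration.

def precision_at_k(retrieved_ids, relevant_ids, k):
    """Context precision via precision@k: the fraction of the top-k
    retrieved items that are actually relevant to the query."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

def response_relevancy(response, query):
    """Crude lexical proxy for response relevancy: token overlap between
    the generated response and the query (a stand-in for ROUGE/BLEU or
    a human/LLM judge)."""
    response_tokens = set(response.lower().split())
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    return len(response_tokens & query_tokens) / len(query_tokens)

if __name__ == "__main__":
    # Hypothetical retrieval run: doc IDs ranked by the retriever,
    # scored against a ground-truth set of relevant IDs for this query.
    retrieved = ["doc3", "doc7", "doc1", "doc9"]
    relevant = {"doc3", "doc1", "doc5"}
    print(f"precision@3 = {precision_at_k(retrieved, relevant, k=3):.2f}")  # 2 of top 3 relevant -> 0.67

    query = "What metrics evaluate RAG response accuracy?"
    response = ("Response relevancy and context precision are the metrics "
                "used to evaluate RAG response accuracy.")
    print(f"relevancy proxy = {response_relevancy(response, query):.2f}")
```

Note the design point the question is testing: both functions score *accuracy* against the query and ground truth; neither touches latency or throughput, which is why options A, B, and C do not apply.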