iSQI ISTQB Certified Tester: Testing with Generative AI (CT-GenAI) v1.0, Topic 1 Question 1 Discussion
Question #: 1
Topic #: 1
A team notices vague, inconsistent LLM outputs for the same user story across two different prompts. Which technique BEST helps select the stronger wording between the two prompt versions using predefined metrics?
A/B testing, also known as split testing, is a systematic, empirical method for comparing two versions of a prompt (Version A and Version B) to determine which one performs better against predefined evaluation metrics. Because LLM outputs are stochastic, A/B testing is essential for managing inconsistency: when a team encounters vague or varying results for a user story, simply modifying the prompt iteratively (Option B) may improve the result, but it does not provide a statistical or objective basis for why one version is superior.

By running A/B tests, testers can evaluate prompts against specific KPIs such as accuracy, relevance, format adherence, or the absence of hallucinations. The process involves sending the same input data through both prompt versions multiple times and scoring the outputs; the version that consistently yields the stronger wording or more precise testware is then adopted as the production standard. This data-driven approach is a cornerstone of prompt engineering in professional environments, ensuring that the most effective linguistic structures are used to maximize the model's performance and reliability.
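For illustration, the comparison loop at the heart of prompt A/B testing can be sketched in a few lines of Python. This is a minimal sketch, not part of the exam material: call_llm and score_output are hypothetical placeholders for the team's model client and metric scorer, and the prompt texts and run count are assumptions chosen only to show the mechanics.

    import statistics

    # Hypothetical prompt variants under comparison (assumed wording).
    PROMPT_A = "Summarize the user story below in three precise acceptance criteria."
    PROMPT_B = "List acceptance criteria for the user story below."

    def run_ab_test(story, call_llm, score_output, runs=10):
        """Send the same story through both prompt versions several times,
        score each output against the predefined metrics, and pick the
        version with the higher mean score."""
        scores = {"A": [], "B": []}
        for _ in range(runs):  # repeat to average out stochastic outputs
            for label, prompt in (("A", PROMPT_A), ("B", PROMPT_B)):
                output = call_llm(prompt + "\n" + story)       # hypothetical model call
                scores[label].append(score_output(output))     # hypothetical 0-1 metric score
        means = {label: statistics.mean(vals) for label, vals in scores.items()}
        winner = max(means, key=means.get)
        return winner, means

In practice the scoring function would encode the team's agreed KPIs (relevance, format adherence, absence of hallucinations), so the winner is chosen on data rather than intuition.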