A company has a Microsoft Foundry generative AI model.
You need to evaluate the model's output to measure the overall quality and coherence of generated responses. The evaluation must use GPT-4o as a judge and return a numeric score for each output.
Which type of metric should you use?