ISACA Advanced in AI Audit (AAIA) Question #15, Topic 2 Discussion
Question #: 15
Topic #: 2
An IS auditor notes the combined number of records utilized within the training, validation, and testing data sets exceeds the total number of records in the original data set. Which of the following is MOST important for the auditor to determine?
A. Whether the training, validation, and testing data sets were created in the correct order
B. Whether data leakage occurred from utilizing overlapping records in the data sets
C. Whether a sufficient number of records were utilized in the training data set
D. Whether the validation data set utilized the same number of records as the training data sets
If the combined size of the training, validation, and testing sets exceeds the original data size, it suggests that records may have been reused across sets. This can lead to data leakage, where the model has access to test or validation information during training, resulting in overly optimistic performance metrics.
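The size check described above can be made concrete. The following is a minimal Python sketch, assuming each record carries a unique identifier; the function name `audit_splits` and the sample IDs are hypothetical. It compares the combined split size against the original record count and lists the IDs shared between any two splits, since those shared records are where leakage would come from. It only matches exact IDs, so near-duplicate records would need a separate check.

```python
from itertools import combinations


def audit_splits(original_ids, splits):
    """Compare split sizes to the original data set and report shared record IDs.

    Assumes every record has a unique ID (a hypothetical record identifier).
    `splits` maps a split name (e.g. "train") to the list of IDs it contains.
    """
    original = set(original_ids)
    # Total number of rows across all splits, counting reused records each time.
    combined = sum(len(ids) for ids in splits.values())

    # Check every pair of splits for IDs that appear in both.
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)

    return {
        "original_count": len(original),
        "combined_count": combined,
        "exceeds_original": combined > len(original),
        "overlaps": overlaps,
    }


if __name__ == "__main__":
    original = range(1, 11)  # ten hypothetical records, IDs 1..10
    splits = {
        "train": [1, 2, 3, 4, 5, 6, 7],
        "validation": [7, 8],
        "test": [8, 9, 10, 3],
    }
    print(audit_splits(original, splits))
```

With this sample data the three splits hold 13 rows drawn from only 10 original records, so the count check fires, and the overlap report shows ID 7 shared by training and validation, ID 3 by training and testing, and ID 8 by validation and testing.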
“Data leakage invalidates model evaluation because it introduces unintended data overlap. Auditors must ensure that the training, validation, and test sets are strictly partitioned.”
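One way to obtain the strict partitioning the quote calls for is to shuffle the unique record IDs once and slice them into non-overlapping ranges. The sketch below assumes a 70/15/15 split and a fixed random seed; both values, and the function name `partition`, are illustrative assumptions rather than anything prescribed by the study guide.

```python
import random


def partition(record_ids, train_frac=0.7, val_frac=0.15, seed=0):
    """Split record IDs into disjoint training, validation, and test lists.

    The fractions and seed are illustrative defaults, not a recommended standard.
    """
    # Drop duplicate IDs first so no record can land in two splits.
    ids = list(dict.fromkeys(record_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible for audit
    rng.shuffle(ids)

    n = len(ids)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)

    # Consecutive, non-overlapping slices: every ID ends up in exactly one split.
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = partition(range(100))
    print(len(train), len(val), len(test))
```

Because the slices never overlap and the remainder goes to the test set, the split sizes always add up to the number of unique records, which is exactly the property the auditor in this question found violated.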
Options A, C, and D refer to process order or quantity, but only B addresses the root issue of compromised model integrity due to overlapping data.
[Reference: ISACA Advanced in AI Audit™ (AAIA™) Study Guide, Section: “AI Fundamentals and Technologies,” Subsection: “Data Partitioning and Leakage Risks”]