Credit risk assessment AI models trained on unrepresentative datasets can perpetuate and amplify historical financial inequities, producing discriminatory outcomes that may violate anti-discrimination laws and harm underrepresented borrowers. Dataset diversity is the primary safeguard against bias that originates in the training data.
Why A is Correct: According to ISACA AAIR bias and fairness guidance for financial AI, dataset diversity is the most important factor for supporting accurate and unbiased credit risk outcomes. A diverse dataset that represents the full population of potential borrowers (across demographics, income levels, credit histories, and geographies) enables the model to learn genuine risk relationships rather than proxies for protected characteristics. Without diversity, even technically sophisticated models perpetuate discriminatory patterns from historical data.
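One way to make "represents the full population" concrete is to compare each group's share of the training data with its share of a reference borrower population. The sketch below is illustrative only: the function name `representation_gaps`, the group labels, the reference shares, and the 5-point threshold are all assumptions made for this example and do not come from the ISACA guidance.

```python
from collections import Counter

def representation_gaps(train_groups, reference_shares):
    """Return (training share - reference share) for each reference group.

    A negative value means the group is underrepresented in training data.
    """
    total = len(train_groups)
    if total == 0:
        raise ValueError("train_groups must not be empty")
    counts = Counter(train_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }

# Hypothetical data: 80% urban / 20% rural applicants in the training set,
# against an assumed 60% / 40% split in the borrower population.
train_groups = ["urban"] * 80 + ["rural"] * 20
reference_shares = {"urban": 0.6, "rural": 0.4}

gaps = representation_gaps(train_groups, reference_shares)

# Flag groups that fall more than 5 percentage points short (assumed threshold).
underrepresented = sorted(g for g, gap in gaps.items() if gap < -0.05)
print(underrepresented)
```

In practice the same comparison would be repeated for each dimension named above (demographics, income levels, credit histories, geographies), and a representation check is only a starting point: balanced group counts do not by themselves guarantee fair outcomes.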
Why B is Wrong: Supervised learning is a modeling approach, not a data quality characteristic. The choice of supervised learning is appropriate for credit scoring but does not determine whether the training data is representative or unbiased.
Why C is Wrong: Synthetic data augmentation can supplement real data to address specific gaps but cannot substitute for diversity in the underlying real-world data. Synthetic data derived from biased real data may amplify rather than correct the original bias.
Why D is Wrong: Data normalization is a preprocessing technique that scales numerical features to comparable ranges to improve model convergence. It addresses technical modeling quality but has no effect on the representational diversity or demographic fairness of the dataset.
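The point about normalization can be shown with a small sketch: min-max scaling changes the numeric range of a feature but leaves the group composition of the dataset exactly as it was. The records, field names, and the helper `min_max_scale` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def min_max_scale(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical applicant records with an income feature and a region label.
records = [
    {"income": 20_000, "region": "rural"},
    {"income": 35_000, "region": "urban"},
    {"income": 50_000, "region": "urban"},
    {"income": 80_000, "region": "urban"},
]

scaled = min_max_scale([r["income"] for r in records])
normalized = [{**r, "income": s} for r, s in zip(records, scaled)]

# Group composition before and after normalization.
before = Counter(r["region"] for r in records)
after = Counter(r["region"] for r in normalized)
print(scaled, before == after)
```

The rural group still accounts for one record in four after scaling, so any representational gap present before normalization is still present afterwards.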