According to AAISM study guidance, the most direct and effective way to obtain large volumes of diverse data for application testing is through open-source data repositories. These repositories provide freely available, well-documented, and often standardized data that supports testing and benchmarking in a compliant manner. Model cards document AI behavior but do not provide data. Incorporating search content may introduce legal, privacy, and quality risks. Data augmentation is useful for expanding existing sets but does not provide the breadth or size required when starting with insufficient data. The recommended best practice for sourcing large testing datasets is therefore the use of open-source repositories.
[References:, AAISM Study Guide – AI Technologies and Controls (Data Sources and Testing Practices), ISACA AI Security Management – Data Governance and Compliance in AI Testing, ]
Submit