→ When generating synthetic data, the key concern is ensuring it accurately reflects the characteristics of the real-world population. A non-representative synthetic dataset may lead to biased models and invalid conclusions.
Why the other options are incorrect:
A: Resource usage is a technical concern but not as critical as representativeness.
B: Feature set can often be replicated or engineered — quality matters more.
C: Synthetic datasets can be scaled up easily — representativeness is harder to validate.
Official References:
CompTIA DataX (DY0-001) Study Guide – Section 5.4:“Synthetic data must maintain representational fidelity to the original population in order to be useful for modeling or validation.”
—
Contribute your Thoughts:
Chosen Answer:
This is a voting comment (?). You can switch to a simple comment. It is better to Upvote an existing comment if you don't have anything to add.
Submit