A data scientist is clustering a data set but does not want to specify the number of clusters present. Which of the following algorithms should the data scientist use?
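→ DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that does not require specifying the number of clusters in advance. Instead, it takes two density parameters, a neighborhood radius (eps) and a minimum number of points per neighborhood (minPts), and from these it identifies clusters of arbitrary shape and labels points in low-density regions as noise/outliers.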
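To make the "no k required" point concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is illustrative only, not a reference implementation: the function name `dbscan`, the toy `points`, and the chosen `eps` and `min_samples` values are assumptions made for this example. In practice a library version such as scikit-learn's `sklearn.cluster.DBSCAN` (which takes `eps` and `min_samples`) would normally be used instead.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            # Not a core point; may later be claimed as a border point.
            labels[i] = NOISE
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


# Toy data (hypothetical): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

labels = dbscan(points, eps=1.5, min_samples=3)
n_clusters = len(set(labels) - {NOISE})

print(labels)      # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(n_clusters)  # 2
```

The number of clusters (2 here) is an output of the algorithm rather than an input, and the isolated point is flagged as noise instead of being forced into a cluster. The trade-off is that `eps` and `min_samples` still have to be chosen, and the result can be sensitive to them.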
Why other options are incorrect:
B: k-NN is a supervised algorithm (used for classification or regression on labeled data), not a clustering method.
C: k-means requires the number of clusters (k) to be specified in advance.
D: Logistic regression is a supervised classification model, not a clustering method.
Official References:
CompTIA DataX (DY0-001) Study Guide, Section 4.2: “DBSCAN detects clusters based on data density without the need for a predefined k value and handles outliers effectively.”