I am using the DBSCAN implementation from the scikit-learn library and I get strange results. The estimated number of clusters increased when I increased the MinPts (min_samples) parameter, which should not happen according to my understanding of the algorithm.
My results are:
Estimated number of clusters: 34 eps = 0.9 min_samples = 13.0
Estimated number of clusters: 35 eps = 0.9 min_samples = 12.0
Estimated number of clusters: 42 eps = 0.9 min_samples = 11.0 <- strange result here
Estimated number of clusters: 37 eps = 0.9 min_samples = 10.0
Estimated number of clusters: 53 eps = 0.9 min_samples = 9.0
Estimated number of clusters: 63 eps = 0.9 min_samples = 8.0
I use scikit-learn like this:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=eps, min_samples=min_samples, algorithm='kd_tree').fit(X)
and X is an array containing ~200 12-dimensional points.
What could be the problem here?
DBSCAN splits the points/samples into three categories:
- Core points: lie in a dense neighborhood and can therefore seed a cluster.
- Density-reachable points: close enough to a core point to be part of its cluster.
- Outliers: everything else.
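In scikit-learn, the three categories can be read off a fitted model. The following is a minimal sketch, assuming the `core_sample_indices_` and `labels_` attributes of `DBSCAN`; the helper name `categorize` and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def categorize(X, eps, min_samples):
    """Return boolean masks (core, border, noise) for the points in X."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    # Core points are listed explicitly by the fitted estimator.
    core = np.zeros(len(X), dtype=bool)
    core[db.core_sample_indices_] = True
    # Outliers carry the label -1.
    noise = db.labels_ == -1
    # Density-reachable (border) points belong to a cluster but are not core.
    border = ~core & ~noise
    return core, border, noise


# Hypothetical toy data: a short chain of 1-D points plus one isolated point.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
core, border, noise = categorize(X_toy, eps=1.1, min_samples=3)
print(core.sum(), border.sum(), noise.sum())
```

Note that scikit-learn counts the point itself towards `min_samples`, so the two ends of the chain have only two points in their neighborhood and end up as border points.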
Now, as you require a denser neighborhood for core points, you get fewer core points. But for a core point x that loses its status, there can be three effects, depending on the density of its wider neighborhood:
- x is still density-reachable from the core points of its former cluster, and the remaining core points are able to hold the cluster together. The number of clusters is unchanged.
- x is still density-reachable from at least two core points, but it no longer works as a "bridge" that density-connects them, which allows them to form separate clusters. The number of clusters increases, and x is assigned to one of the resulting clusters.
- Neither x nor its neighboring points are able to maintain their former cluster, so it dissolves and x becomes an outlier. The number of clusters decreases.
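The three effects can be reproduced on a tiny example. The sketch below is not scikit-learn's implementation: it counts clusters as connected components of core points, which is enough to get the cluster count for 1-D data. The function name `count_clusters` and the data are hypothetical; the point itself is counted towards `min_samples`, as scikit-learn does.

```python
def count_clusters(points, eps, min_samples):
    """Count DBSCAN clusters for distinct 1-D points."""
    # A point is a core point if at least min_samples points
    # (itself included) lie within eps of it.
    core = [p for p in points
            if sum(abs(p - q) <= eps for q in points) >= min_samples]
    seen = set()
    clusters = 0
    # Each connected component of core points (edges: distance <= eps)
    # forms exactly one cluster.
    for start in core:
        if start in seen:
            continue
        clusters += 1
        seen.add(start)
        stack = [start]
        while stack:
            p = stack.pop()
            for q in core:
                if q not in seen and abs(p - q) <= eps:
                    seen.add(q)
                    stack.append(q)
    return clusters


group_a = [0.0, 0.25, 0.5, 0.75, 1.0]   # dense group
bridge = [2.0]                           # sparse point linking the two groups
group_b = [3.0, 3.25, 3.5, 3.75, 4.0]   # dense group
sparse = [10.0, 10.25, 10.5, 11.0]       # small, less dense group
points = group_a + bridge + group_b + sparse

counts = {m: count_clusters(points, eps=1.0, min_samples=m)
          for m in range(3, 8)}
print(counts)  # {3: 2, 4: 3, 5: 2, 6: 2, 7: 0}
```

Going from min_samples = 3 to 4, the bridge point stops being a core point and the count rises from 2 to 3. At 5 the less dense group dissolves and the count falls back to 2. At 6 the count is unchanged even though most core points are lost. So the number of clusters does not have to change monotonically with min_samples.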