18/20
DBSCAN β€” Density-Based Clustering Β· Page 1 of 1

Beyond K-Means

DBSCAN β€” Density-Based Clustering

Problem with K-Means

K-Means assumes:

  • Clusters are roughly spherical
  • You know K (number of clusters) in advance
  • All clusters have similar sizes

Counter-examples:

  • Crescent-shaped clusters
  • Clusters of different sizes
  • Outliers shouldn't be in any cluster
  • Unknown number of clusters

DBSCAN (Density-Based Spatial Clustering)

Core Idea

Two points are in the same cluster if they're close together and surrounded by other close points.

How it works:

  1. Ξ΅ (epsilon): How close is "close"? (distance threshold)
  2. minPts: Minimum neighbors within Ξ΅ to be a core point
  3. For each point:
    • If >= minPts points within Ξ΅ β†’ Core point
    • If close to core point β†’ Border point
    • Otherwise β†’ Noise/Outlier

Parameters:

  • Ξ΅: Too small = all noise. Too large = one big cluster. Use k-distance graph.
  • minPts: Often 2Γ—dimensions or 4. Balance sensitivity.

Advantages over K-Means

Pros:

  • βœ“ No need to specify K (discovers automatically)
  • βœ“ Finds arbitrary-shaped clusters
  • βœ“ Detects outliers (noise points)
  • βœ“ Robust to outliers

Cons:

  • βœ— Sensitive to Ξ΅ and minPts (must tune)
  • βœ— Slower than K-Means (O(nΒ²))
  • βœ— Struggles with varying cluster densities
  • βœ— High-dimensional curse (distances become less meaningful)

Choosing Ξ΅

Use the k-distance graph:

  1. For each point, calculate distance to kth nearest neighbor
  2. Sort distances
  3. Plot as line graph
  4. Find "elbow" where distances increase sharply
  5. That's your Ξ΅

DBSCAN vs K-Means

AspectK-MeansDBSCAN
Cluster shapeSphericalAny shape
OutliersForced into clusterLabeled as noise
K required?YesNo
ScalabilityFastSlow
Parameter tuningSimple (just K)Complex (Ξ΅, minPts)
Use caseBalanced, round clustersArbitrary shapes, unknown K
main.py
Loading...
OUTPUT
β–ΆClick "Run Code" to execute…