8.8 Evaluating Clustering: Internal and External Metrics
Right, so you’ve thrown some data into a clustering algorithm and it gave you back… some clusters. Great. Now for the million-dollar question: are they any good? Or did you just perform a very expensive, automated version of sorting marbles by color while blindfolded? This is where evaluation comes in, and it’s arguably more art than science. We have two families of metrics to help us: internal and external. Internal metrics don’t need ground-truth labels; they judge a clustering by its own structure, typically rewarding clusters that are compact inside and well separated from one another. External metrics require the true labels (and, let’s be honest, if you had those you might not be clustering in the first place) and measure how well our clusters match the known classes.
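To make the two families concrete, here is a minimal pure-Python sketch with one representative from each. The choice of metrics is ours for illustration: the silhouette coefficient as the internal metric (it looks only at the points and their cluster assignments) and the plain Rand index as the external metric (it compares the assignments against true labels). The function names `silhouette` and `rand_index` are hypothetical helpers written here from the standard definitions, not a library API. The code assumes Euclidean distance and at least two clusters. In practice you would likely reach for `silhouette_score` and `adjusted_rand_score` in `sklearn.metrics`; the adjusted variant corrects the Rand index for chance agreement.

```python
import math
from itertools import combinations


def silhouette(points, labels):
    """Mean silhouette coefficient (internal metric: needs no ground truth).

    Assumes Euclidean distance and at least two distinct cluster labels.
    """
    n = len(points)
    scores = []
    for i in range(n):
        # a: mean distance from point i to the other members of its own cluster
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = sum(same) / len(same)
        # b: mean distance from point i to the nearest *other* cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


def rand_index(true_labels, pred_labels):
    """Plain Rand index (external metric: needs the true labels).

    Fraction of point pairs on which both labelings agree about
    "same group" vs. "different group". Invariant to renaming clusters.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Two obvious blobs: one near x=0, one near x=10.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
truth = [0, 0, 1, 1]
good = [1, 1, 0, 0]  # right grouping, different label names
bad = [0, 1, 0, 1]   # splits each blob across both clusters

print(silhouette(points, good))   # roughly 0.90: compact, well-separated clusters
print(silhouette(points, bad))    # roughly -0.45: points sit closer to the "wrong" cluster
print(rand_index(truth, good))    # 1.0: label names don't matter, only the grouping
print(rand_index(truth, bad))     # 0.333...: agrees on only 2 of 6 pairs
```

Note that the internal score never looks at `truth`: it flags the bad clustering purely from geometry, which is exactly what you rely on when no labels exist.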