Hierarchical Clustering


📊 Hierarchical Clustering (Agglomerative) with Dendrogram

Clustering is one of those concepts in data science that sounds complicated, but once you see how it actually works, it becomes very intuitive.
In this post, we will walk through the agglomerative approach to hierarchical clustering, using a small worked example and a dendrogram.


✨ Introduction: What is Hierarchical Clustering?

Hierarchical Clustering is a clustering technique that tries to build a hierarchy of clusters.

In Agglomerative Hierarchical Clustering, we start like this:

Every data point is its own cluster.
Then, step by step, the two closest clusters are merged, until everything becomes one big cluster.

In short:

bottom → top merging process

👉 Put simply:
first everything is separate, then the pieces gradually join together.

This whole merging process is visualized using a special tree-like diagram called a dendrogram.


🌳 What is a Dendrogram?

A dendrogram shows:

  • which clusters were merged,

  • in what order they were merged,

  • and at what distance they were merged.

It helps us visually decide:
👉 how many clusters we really want.


Step-by-Step Process of Dendrogram Creation

Let’s convert the theory into a clear flow:

  • Each data point is treated as its own cluster.

  • Compute distances between all clusters.

  • Find the two closest clusters.

  • Merge them into one cluster.

  • Update the distance matrix.

  • Repeat this process.

  • Continue until all points become one cluster.

  • The dendrogram shows the full merge history.

  • A horizontal cut on the dendrogram decides the final number of clusters.
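The steps above can be sketched in a few lines of plain Python. This is a deliberately naive illustration, not how production libraries implement it: it recomputes cluster distances on every pass instead of maintaining a distance matrix, and the names `agglomerative` and `toy` are hypothetical, chosen only for this example.

```python
import math
from itertools import combinations


def agglomerative(points, linkage=min):
    """Naive agglomerative clustering (illustrative sketch).

    points  : dict mapping a label to a coordinate tuple
    linkage : function that reduces all pairwise distances between two
              clusters to one number (min = single, max = complete)
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # Step 1: every point starts as its own cluster.
    clusters = [(label,) for label in points]
    history = []

    def cluster_distance(c1, c2):
        return linkage(math.dist(points[p], points[q]) for p in c1 for q in c2)

    # Repeat until only one cluster remains.
    while len(clusters) > 1:
        # Find the two closest clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        history.append((a, b, cluster_distance(a, b)))
        # Merge them; distances are recomputed on the next pass.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Hypothetical toy data: three points on a line.
toy = {"P": (0, 0), "Q": (1, 0), "R": (5, 0)}
history = agglomerative(toy, linkage=min)
for a, b, dist in history:
    print(a, "+", b, "at", dist)
```

The returned history is exactly the information a dendrogram draws: which clusters merged, in what order, and at what distance.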


🔗 Linkage Methods (How distance between clusters is measured)

When clusters have more than one point, the question becomes:

“How do we define the distance between two clusters?”

That is done using linkage methods.

1️⃣ Single Linkage (Nearest neighbour)

Distance between two clusters =
minimum distance between a point in one cluster and a point in the other.

👉 Nearest pair decides.


2️⃣ Complete Linkage (Farthest neighbour)

Distance between two clusters =
maximum distance between a point in one cluster and a point in the other.

👉 Farthest pair decides.


3️⃣ Average Linkage

Distance between two clusters =
average of all pairwise distances between points in the two clusters.

👉 A somewhat balanced approach.
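To make the three definitions concrete, here is a small sketch that computes each linkage distance between two clusters directly from the definitions. The function names and the two example clusters (`left`, `right`) are made up for illustration.

```python
import math


def pairwise(c1, c2):
    """All distances between a point in c1 and a point in c2."""
    return [math.dist(p, q) for p in c1 for q in c2]


def single_link(c1, c2):
    return min(pairwise(c1, c2))      # nearest pair decides


def complete_link(c1, c2):
    return max(pairwise(c1, c2))      # farthest pair decides


def average_link(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)            # mean of all cross-cluster pairs


# Hypothetical clusters on a line; cross distances are 4, 6, 3 and 5.
left = [(0, 0), (1, 0)]
right = [(4, 0), (6, 0)]

print(single_link(left, right))    # 3.0
print(complete_link(left, right))  # 6.0
print(average_link(left, right))   # 4.5
```

The same two clusters are 3.0, 4.5 or 6.0 apart depending on the linkage, which is why the choice of linkage can change the merge order.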


♨️ Worked Example (Very Important for Understanding)

Let’s use a small dataset.

Dataset: 5 points in 2D

  • A = (1, 1)

  • B = (2, 1)

  • C = (4, 3)

  • D = (5, 4)

  • E = (3, 4)


Pairwise Euclidean Distances

The pairwise distances, rounded to three decimals, are:

  • d(A, B) = 1.000

  • d(A, C) = 3.606

  • d(A, D) = 5.000

  • d(A, E) = 3.606

  • d(B, C) = 2.828

  • d(B, D) = 4.243

  • d(B, E) = 3.162

  • d(C, D) = 1.414

  • d(C, E) = 1.414

  • d(D, E) = 2.000
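The distances listed above can be reproduced with SciPy. This is a minimal sketch, assuming NumPy and SciPy are installed; the variable names are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

labels = ["A", "B", "C", "D", "E"]
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)

# pdist returns the condensed upper triangle of the distance matrix;
# squareform expands it into the full symmetric 5x5 matrix.
D = squareform(pdist(X, metric="euclidean"))

for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"d({labels[i]}, {labels[j]}) = {D[i, j]:.3f}")
```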

Now let’s see how merging happens.


🔹 1️⃣ Single Linkage: Merge Order

  • (A, B) merge at 1.000

  • (C, D) merge at 1.414

  • (CD, E) merge at 1.414

  • (AB, CDE) merge at 2.828

What is really happening here?

  • A and B are closest → merged first

  • C and D are next closest at 1.414 (tied with C and E, so either pair could merge first)

  • E is 1.414 from C, so it joins the CD cluster at the same height

  • finally, AB joins the big cluster at 2.828, the distance from B to C

👉 Single linkage focuses only on the nearest points, not the whole cluster shape.
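The single-linkage merge heights can be checked with SciPy's `linkage` function. A small sketch, assuming SciPy is available: because C–D and C–E are tied at 1.414, the library may pick either pair first, so only the heights (not the exact pair order) are compared here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Points A, B, C, D, E from the worked example, in that order.
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)

Z = linkage(X, method="single", metric="euclidean")

# Each row of Z is one merge:
# [index of cluster 1, index of cluster 2, merge distance, size of new cluster]
single_heights = np.round(Z[:, 2], 3)
print(single_heights)  # merge heights: 1.0, 1.414, 1.414, 2.828
```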


🔹 2️⃣ Complete Linkage: Merge Order

  • (A, B) merge at 1.000

  • (C, D) merge at 1.414

  • (CD, E) merge at 2.000

  • (AB, CDE) merge at 5.000

Key difference you should notice

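Here, the last merge happens at 5.000, not 2.828 as in single linkage.

👉 That is because complete linkage uses the farthest pair between clusters, here d(A, D) = 5.000. For the same reason, E joins CD at 2.000 (the distance from D to E) rather than 1.414.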

This often produces compact and tight clusters.
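To see the effect of the linkage choice side by side, the sketch below runs SciPy's `linkage` on the same five points with all three methods. The average-linkage heights are not worked out in the text above; they are included only as a computed comparison.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Points A, B, C, D, E from the worked example, in that order.
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)

heights = {}
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    heights[method] = np.round(Z[:, 2], 3)  # column 2 = merge distance
    print(f"{method:>8}: {heights[method]}")

# single   merges at 1.0, 1.414, 1.414, 2.828
# complete merges at 1.0, 1.414, 2.0,   5.0
# average  merges at 1.0, 1.414, 1.707, 3.741
```

The first two merges are identical in all three cases; the methods only start to disagree once clusters contain more than one point.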


🧮 Interpreting the Dendrogram

When you see a dendrogram:

  • Leaves = original data points

  • Height of a merge = distance at which merge happened

  • Horizontal cut = the number of vertical branches it crosses is the number of clusters

  • Lower merge height = more similar points

👉 Remember one simple rule:
the lower the merge, the more similar the points.
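A horizontal cut can be expressed in code with SciPy's `fcluster`. The sketch below cuts the complete-linkage tree of the five example points at a height of 3.0, a threshold chosen here only because it lies between the merges at 2.000 and 5.000. It also builds the dendrogram layout with `no_plot=True`, so it runs without a plotting backend.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

labels = ["A", "B", "C", "D", "E"]
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)

Z = linkage(X, method="complete")

# Cut the tree at height 3.0: every merge above that height is undone.
flat = fcluster(Z, t=3.0, criterion="distance")
print(dict(zip(labels, flat.tolist())))  # A and B share one id; C, D, E share another

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, labels=labels, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right order
```

To actually draw the tree, drop `no_plot=True` and display the figure with matplotlib (for example `plt.show()`), assuming matplotlib is installed.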


✅ Advantages of Hierarchical Clustering

  • You do not need to pre-define the number of clusters.

  • You get a full hierarchy of clusters.

  • Very useful for exploratory data analysis.


⚠️ Limitations

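  • Computationally expensive for large datasets: the full distance matrix alone needs O(n²) memory, and standard algorithms take between O(n²) and O(n³) time.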

  • Sensitive to noise and outliers.

  • Different linkage methods can give different results.

  • Once merged, clusters cannot be split again.

👉 In other words, if a bad merge happens, there is no undo.


🛠️ Practical Considerations

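  • Standardize features before using Euclidean distance when they are measured on different scales; otherwise the feature with the largest range dominates the distance.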

  • Try multiple linkage methods before finalizing.

  • Cluster quality can be validated using:

    • silhouette score

    • cophenetic correlation
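Both validation measures can be computed in a few lines. The sketch below, which assumes SciPy and scikit-learn are installed, compares the three linkage methods on the five example points. Cutting into two clusters is an arbitrary choice for illustration, and with only five points the scores are not meaningful beyond showing the mechanics.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import silhouette_score

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)
original = pdist(X)  # condensed pairwise distances

scores = {}
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    # Cophenetic correlation: how well merge heights preserve the
    # original pairwise distances (closer to 1 is better).
    coph_corr, _ = cophenet(Z, original)
    # Silhouette score for a 2-cluster cut (range -1 to 1, higher is better).
    flat = fcluster(Z, t=2, criterion="maxclust")
    sil = silhouette_score(X, flat)
    scores[method] = (float(coph_corr), float(sil))
    print(f"{method:>8}: cophenetic={coph_corr:.3f}, silhouette={sil:.3f}")
```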

In practice, data scientists often use libraries such as scikit-learn and SciPy to compute hierarchical clustering and to draw dendrograms.
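As a rough sketch of the scikit-learn route, the example below standardizes the five example points and then clusters them with `AgglomerativeClustering`. The choice of two clusters and complete linkage is an assumption made for illustration. In this toy dataset both features are already on a similar scale, so the scaling step is shown only for the pattern.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Points A, B, C, D, E from the worked example, in that order.
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Unlike the dendrogram workflow, the number of clusters is fixed up front here.
model = AgglomerativeClustering(n_clusters=2, linkage="complete")
cluster_ids = model.fit_predict(X_scaled)

for name, cid in zip("ABCDE", cluster_ids):
    print(name, "-> cluster", cid)
```

scikit-learn is convenient when the clustering is one step in a larger pipeline, while SciPy's `linkage` and `dendrogram` are the more direct route when the goal is to inspect the full merge tree.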


Real-World Applications

Let’s connect this technique with real industries.


🏦 Banking & Finance

  • Customer segmentation using income, spending, and investment behaviour

  • Fraud detection by identifying unusual transaction patterns

  • Risk grouping of loan applicants

👉 A bank needs to understand who is safe and who is risky.


🛍️ Retail & E-commerce

  • Market segmentation (budget buyers, premium buyers, etc.)

  • Product recommendation

  • Store clustering for region-wise marketing


🏥 Healthcare & Life Sciences

  • Patient grouping using medical history

  • Disease subtype discovery using genetic or symptom data

  • Drug discovery by clustering similar compounds


📡 Telecommunications

  • Churn analysis by usage behaviour

  • Network optimisation

  • Personalized plans for different customer groups


Marketing & Advertising

  • Targeted campaigns

  • Brand positioning analysis

  • Social media audience and influencer grouping


📘 A Small Learning Tip (Optional but Helpful)

If you want to go deeper into the theoretical foundation of clustering and pattern recognition, a well-known reference book is:

Pattern Recognition and Machine Learning, by Christopher M. Bishop


🧠 Final Thoughts & Conclusion

Hierarchical clustering, especially the agglomerative approach, is extremely powerful when:

  • you do not know the correct number of clusters,

  • you want to visually explore your data,

  • and you want to understand the structure inside your dataset.

Through our small example of five points (A, B, C, D, E), you clearly saw:

  • how clusters are merged step by step,

  • how different linkage methods change the final structure,

  • and how a dendrogram helps you decide the final grouping.

Hierarchical clustering is not only about forming clusters; it is about understanding the relationships between data points.