Hierarchical Clustering

Hierarchical Clustering (Agglomerative) with Dendrogram
Clustering is one of those concepts in data science that sounds complicated, but once you see how it actually works, it becomes very intuitive.
In this blog, we will work through the agglomerative approach to hierarchical clustering, using a small worked example and a dendrogram.
Introduction: What is Hierarchical Clustering?
Hierarchical Clustering is a clustering technique that builds a hierarchy of nested clusters.
In Agglomerative Hierarchical Clustering, we start like this:
Every data point is its own cluster.
Then, step by step, the two closest clusters are merged, until everything becomes one big cluster.
In short:
a bottom-up (bottom → top) merging process.
Put simply:
at first every point stands alone, then the points gradually come together.
This whole merging process is visualized using a special tree-like diagram called a dendrogram.
What is a Dendrogram?
A dendrogram shows:
which clusters were merged,
in what order they were merged,
and at what distance they were merged.
It helps us visually decide:
how many clusters we really want.
Step-by-Step Process of Dendrogram Creation
Let's convert the theory into a clear flow:
1. Treat each data point as its own cluster.
2. Compute the distances between all pairs of clusters.
3. Find the two closest clusters.
4. Merge them into one cluster.
5. Update the distance matrix with the distances from the new cluster to every remaining cluster.
6. Repeat steps 3 to 5 until all points belong to one cluster.
The dendrogram records the full merge history.
A horizontal cut across the dendrogram decides the final number of clusters.
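The flow above can be sketched in plain Python. This is a minimal sketch under my own assumptions: `agglomerate` is a hypothetical helper name, points are given as a dict of one-letter labels to 2D coordinates, and the distance-matrix update is passed in as a rule (`min` gives single linkage, `max` gives complete linkage). The demo uses the five points from the worked example later in this post.

```python
import math
from itertools import combinations


def agglomerate(points, update):
    """Naive agglomerative clustering with an explicit distance matrix.

    points: dict mapping a one-letter label to an (x, y) tuple.
    update: combines d(k, i) and d(k, j) into d(k, i+j) after clusters i
            and j merge; min gives single linkage, max gives complete.
    Returns the merge history as (cluster_i, cluster_j, height) tuples.
    """
    # Each data point starts as its own cluster, named by its label.
    clusters = list(points)
    # Distance matrix, keyed by the unordered pair of cluster names.
    dist = {frozenset((p, q)): math.dist(points[p], points[q])
            for p, q in combinations(clusters, 2)}
    history = []
    # Repeat until a single cluster is left.
    while len(clusters) > 1:
        # Find the closest pair (on ties, the pair listed first wins).
        i, j = min(combinations(clusters, 2),
                   key=lambda pair: dist[frozenset(pair)])
        history.append((i, j, dist[frozenset((i, j))]))
        # Merge them into one cluster, e.g. "C" + "D" -> "CD".
        merged = "".join(sorted(i + j))
        clusters = [c for c in clusters if c not in (i, j)]
        # Update the matrix: distance from the new cluster to each other one.
        for k in clusters:
            dist[frozenset((k, merged))] = update(dist[frozenset((k, i))],
                                                  dist[frozenset((k, j))])
        clusters.append(merged)
    return history


points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}
for i, j, height in agglomerate(points, min):  # min -> single linkage
    print(f"{i} + {j} at {height:.3f}")
# A + B at 1.000
# C + D at 1.414
# E + CD at 1.414
# AB + CDE at 2.828
```

Passing `max` instead of `min` gives the complete-linkage heights (1.000, 1.414, 2.000, 5.000). Average linkage needs the cluster sizes in its update, so it does not fit this two-argument rule without extending the sketch.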
Linkage Methods (How the distance between clusters is measured)
When clusters have more than one point, the question becomes:
"How do we define the distance between two clusters?"
That is done using linkage methods.
1️⃣ Single Linkage (Nearest neighbour)
The distance between two clusters is the minimum distance between any point in one cluster and any point in the other.
In short: the nearest pair decides.
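As a quick sketch (the helper name `single_linkage` is my own, and clusters are assumed to be plain lists of (x, y) points), single linkage is just a minimum over all cross-cluster pairs. The two example clusters are {A, B} and {C, D} from the worked example.

```python
import math


def single_linkage(cluster_x, cluster_y):
    """Cluster distance = Euclidean distance of the NEAREST cross pair."""
    return min(math.dist(p, q) for p in cluster_x for q in cluster_y)


ab = [(1, 1), (2, 1)]  # A, B
cd = [(4, 3), (5, 4)]  # C, D
print(round(single_linkage(ab, cd), 3))  # 2.828, the B-C pair
```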
2️⃣ Complete Linkage (Farthest neighbour)
The distance between two clusters is the maximum distance between any point in one cluster and any point in the other.
In short: the farthest pair decides.
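The same sketch with `max` instead of `min` gives complete linkage. Again, `complete_linkage` is a hypothetical helper name and the clusters are {A, B} and {C, D} from the worked example.

```python
import math


def complete_linkage(cluster_x, cluster_y):
    """Cluster distance = Euclidean distance of the FARTHEST cross pair."""
    return max(math.dist(p, q) for p in cluster_x for q in cluster_y)


ab = [(1, 1), (2, 1)]  # A, B
cd = [(4, 3), (5, 4)]  # C, D
print(round(complete_linkage(ab, cd), 3))  # 5.0, the A-D pair
```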
3️⃣ Average Linkage
The distance between two clusters is the average of all pairwise distances between a point in one cluster and a point in the other.
In short: a balanced approach between the two extremes.
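For average linkage, every cross-cluster pair contributes. In this sketch (hypothetical helper name `average_linkage`, same example clusters), {A, B} and {C, D} have four cross pairs, so the result is the mean of four distances.

```python
import math


def average_linkage(cluster_x, cluster_y):
    """Cluster distance = mean of ALL cross-cluster Euclidean distances."""
    distances = [math.dist(p, q) for p in cluster_x for q in cluster_y]
    return sum(distances) / len(distances)


ab = [(1, 1), (2, 1)]  # A, B
cd = [(4, 3), (5, 4)]  # C, D
print(round(average_linkage(ab, cd), 3))  # 3.919, between 2.828 and 5.0
```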
Worked Example (Very Important for Understanding)
Let's use a small dataset.
Dataset: 5 points in 2D
A = (1, 1)
B = (2, 1)
C = (4, 3)
D = (5, 4)
E = (3, 4)
Pairwise Euclidean Distances
The pairwise Euclidean distances (rounded to three decimals) are:
d(A, B) = 1.000
d(A, C) = 3.606
d(A, D) = 5.000
d(A, E) = 3.606
d(B, C) = 2.828
d(B, D) = 4.243
d(B, E) = 3.162
d(C, D) = 1.414
d(C, E) = 1.414
d(D, E) = 2.000
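The table can be reproduced with a few lines of Python. This sketch assumes Python 3.8+ (for `math.dist`) and prints the pairs in the same order as the list above.

```python
import math
from itertools import combinations

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}

# Euclidean distance for every unordered pair, rounded to 3 decimals.
distances = {
    (p, q): round(math.dist(points[p], points[q]), 3)
    for p, q in combinations(points, 2)
}
for (p, q), d in distances.items():
    print(f"d({p}, {q}) = {d:.3f}")
```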
Now let's see how merging happens.
1️⃣ Single Linkage: Merge Order
(A, B) merge at 1.000
(C, D) merge at 1.414
(CD, E) merge at 1.414
(AB, CDE) merge at 2.828
What is really happening here?
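A and B are the closest pair (1.000), so they merge first.
C and D are next at 1.414 (C and E are tied at the same distance; here the tie is broken in favour of C and D).
E is 1.414 away from C, so under single linkage it joins CD at the same height.
Finally, AB joins the big cluster at 2.828, the nearest cross pair d(B, C).
Single linkage looks only at the nearest points, not at the shape of the whole cluster.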
2️⃣ Complete Linkage: Merge Order
(A, B) merge at 1.000
(C, D) merge at 1.414
(CD, E) merge at 2.000
(AB, CDE) merge at 5.000
Key difference you should notice
Here, the last merge happens at 5.000 (not 2.828 like single linkage).
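This is because complete linkage uses the farthest pair between two clusters: AB and CDE only join at d(A, D) = 5.000, and E joins CD at d(D, E) = 2.000 rather than at d(C, E) = 1.414.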
This often produces compact and tight clusters.
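To compare the linkage methods side by side, here is a small self-contained sketch (the names `merge_heights` and `LINKAGES` are hypothetical) that recomputes cluster distances from the raw points on every pass instead of updating a stored matrix. It is slower, but it keeps the three linkage definitions explicit. The average-linkage heights are not part of the worked example above; they simply follow from the same definition.

```python
import math
from itertools import combinations

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}

# How the list of cross-cluster point distances is reduced to one number.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def merge_heights(points, linkage):
    """Naive agglomerative clustering; returns the merge heights in order."""
    combine = LINKAGES[linkage]

    def dist(c1, c2):
        return combine([math.dist(points[p], points[q]) for p in c1 for q in c2])

    clusters = [frozenset([p]) for p in points]
    heights = []
    while len(clusters) > 1:
        # Closest pair of clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        heights.append(dist(a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return heights


for name in LINKAGES:
    print(name, [round(h, 3) for h in merge_heights(points, name)])
# single [1.0, 1.414, 1.414, 2.828]
# complete [1.0, 1.414, 2.0, 5.0]
# average [1.0, 1.414, 1.707, 3.741]
```

Average linkage lands between the two extremes, both for the E-to-CD merge and for the final merge.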
Interpreting the Dendrogram
When you see a dendrogram:
Leaves = original data points
Height of a merge = distance at which merge happened
Horizontal cut = number of clusters (count the vertical lines the cut crosses)
Lower merge height = more similar points
Remember one simple rule:
the lower the merge, the more similar the points.
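A horizontal cut can be simulated by replaying only the merges below the cut height. This is a sketch with a hypothetical `cut` helper; it uses the complete-linkage merge list from the worked example, assumes single-letter point labels, and assumes merge heights never decrease (true for single, complete, and average linkage).

```python
def cut(labels, merges, height):
    """Flat clusters obtained by cutting the dendrogram at `height`.

    merges: (cluster_a, cluster_b, merge_height) tuples in merge order,
    with clusters written as strings of labels, e.g. ("CD", "E", 2.0).
    """
    clusters = {frozenset(label) for label in labels}
    for a, b, h in merges:
        if h > height:
            break  # this merge (and all later ones) lies above the cut
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return sorted("".join(sorted(c)) for c in clusters)


complete = [("A", "B", 1.000), ("C", "D", 1.414), ("CD", "E", 2.000), ("AB", "CDE", 5.000)]
print(cut("ABCDE", complete, 3.0))  # ['AB', 'CDE']
print(cut("ABCDE", complete, 1.5))  # ['AB', 'CD', 'E']
```

Moving the cut line down gives more, smaller clusters; moving it up gives fewer, larger ones.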
Advantages of Hierarchical Clustering
You do not need to pre-define the number of clusters.
You get a full hierarchy of clusters.
Very useful for exploratory data analysis.
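Limitations
Computationally expensive for large datasets: the standard algorithm keeps an n × n distance matrix (O(n²) memory), and the naive version takes O(n³) time.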
Sensitive to noise and outliers.
Different linkage methods can give different results.
Once merged, clusters cannot be split again.
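In other words, if a merge turns out to be a mistake, it cannot be undone.
Practical Considerations
Standardize features (for example, to zero mean and unit variance) when using Euclidean distance, so that features with large numeric ranges do not dominate the distances.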
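A minimal z-score sketch in plain Python. The `standardize` helper and the customer rows (annual income, purchases per month) are hypothetical illustration data, not from this post. It assumes no feature is constant, since a zero standard deviation would cause a division by zero. In scikit-learn, `StandardScaler` does the same job.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column: (value - column mean) / column population std.

    Assumes no column is constant (a zero std would divide by zero).
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


# Hypothetical customers: (annual income, purchases per month).
customers = [(30000, 2), (32000, 9), (90000, 3)]
scaled = standardize(customers)
for row in scaled:
    print([round(v, 2) for v in row])
```

Before scaling, income differences in the tens of thousands would swamp the purchase counts in any Euclidean distance; afterwards both columns have mean 0 and standard deviation 1.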
Try multiple linkage methods before finalizing.
Cluster quality can be validated using:
silhouette score
cophenetic correlation
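Cophenetic correlation checks how well the dendrogram heights preserve the original distances. For each pair of points, it compares the original distance with the height at which the pair first ends up in the same cluster (the cophenetic distance), using Pearson correlation. Below is a plain-Python sketch on the single-linkage merge list from the worked example; the helper names are hypothetical and labels are assumed to be single letters. SciPy offers this as `scipy.cluster.hierarchy.cophenet`, and scikit-learn provides `sklearn.metrics.silhouette_score` for the silhouette score.

```python
import math
from itertools import combinations
from statistics import mean

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}
# Single-linkage merge history from the worked example (heights rounded).
merges = [("A", "B", 1.000), ("C", "D", 1.414), ("CD", "E", 1.414), ("AB", "CDE", 2.828)]


def cophenetic_distance(p, q):
    """Height of the first merge that puts points p and q in one cluster."""
    for left, right, height in merges:
        # Substring test works because every label is a single letter.
        if (p in left and q in right) or (p in right and q in left):
            return height
    raise ValueError(f"{p} and {q} never merge")


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


pairs = list(combinations(points, 2))
original = [math.dist(points[p], points[q]) for p, q in pairs]
cophenetic = [cophenetic_distance(p, q) for p, q in pairs]
c = pearson(original, cophenetic)
print(f"cophenetic correlation: {c:.3f}")
```

A value close to 1 means the dendrogram is a faithful summary of the original distances.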
In practice, data scientists often use libraries such as scikit-learn and SciPy to compute hierarchical clustering and to draw dendrograms.
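For real work, the worked example in SciPy might look like the sketch below (it assumes NumPy and SciPy are installed). `linkage` returns one row per merge: the two cluster indices, the merge distance, and the size of the new cluster. `fcluster` with `criterion="maxclust"` cuts the tree into at most `t` flat clusters. Passing `no_plot=True` to `dendrogram` computes the layout without drawing; drop it (and install matplotlib) to actually plot.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 4]], dtype=float)
labels = ["A", "B", "C", "D", "E"]

# One row per merge: [cluster i, cluster j, merge distance, new cluster size].
Z = linkage(X, method="complete", metric="euclidean")
print(Z)

# Horizontal cut that leaves (at most) 2 flat clusters.
flat = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(labels, flat)))

# Leaf order of the dendrogram, computed without plotting.
# (Remove no_plot=True and call matplotlib's plt.show() to draw it.)
tree = dendrogram(Z, labels=labels, no_plot=True)
print(tree["ivl"])
```

The third column of `Z` should match the complete-linkage heights from the worked example (1.000, 1.414, 2.000, 5.000); switching `method` to `"single"` or `"average"` changes them accordingly.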
Real-World Applications
Let's connect this technique with real industries.
Banking & Finance
Customer segmentation using income, spending, and investment behaviour
Fraud detection by identifying unusual transaction patterns
Risk grouping of loan applicants
A bank needs to understand who is safe and who is risky.
Retail & E-commerce
Market segmentation (budget buyers, premium buyers, etc.)
Product recommendation
Store clustering for region-wise marketing
Healthcare & Life Sciences
Patient grouping using medical history
Disease subtype discovery using genetic or symptom data
Drug discovery by clustering similar compounds
Telecommunications
Churn analysis by usage behaviour
Network optimisation
Personalized plans for different customer groups
Marketing & Advertising
Targeted campaigns
Brand positioning analysis
Social media audience and influencer grouping
A Small Learning Tip (Optional but Helpful)
If you want to go deeper into the theoretical foundation of clustering and pattern recognition, a well-known reference book is:
Pattern Recognition and Machine Learning, by Christopher M. Bishop
Final Thoughts & Conclusion
Hierarchical clustering, especially the agglomerative approach, is particularly useful when:
you do not know the correct number of clusters,
you want to visually explore your data,
and you want to understand the structure inside your dataset.
Through our small example of five points (A, B, C, D, E), you saw:
how clusters are merged step by step,
how different linkage methods change the final structure,
and how a dendrogram helps you decide the final grouping.
Hierarchical clustering is not only about forming clusters; it is also about understanding the relationships between data points.