📊 Hierarchical Clustering (Agglomerative) with Dendrogram

Clustering is one of those concepts in data science that sounds complicated, but once you see how it actually works, it becomes very intuitive.
In this blog, we will walk through the agglomerative approach to hierarchical clustering step by step, using a small worked example and a dendrogram.

✨ Introduction – What is Hierarchical Clustering?

Hierarchical Clustering is a clustering technique that builds a hierarchy of nested clusters.

In Agglomerative Hierarchical Clustering, we start like this:

Every data point is its own cluster.
Then, step by step, the two closest clusters are merged, until everything becomes one big cluster.

In short:

it is a bottom-up merging process.

👉 Put simply:
at first every point stands alone, then the points gradually join together.

This whole merging process is visualized using a special tree-like diagram called a dendrogram.

🌳 What is a Dendrogram?

A dendrogram shows:

which clusters were merged,
in what order they were merged,
and at what distance they were merged.

It helps us visually decide:
👉 how many clusters we really want.

🔁 Step-by-Step Process of Dendrogram Creation

Let’s convert the theory into a clear flow:

Each data point is treated as its own cluster.
Compute distances between all clusters.
Find the two closest clusters.
Merge them into one cluster.
Update the distance matrix.
Repeat these steps until all points belong to a single cluster.
The dendrogram records the full merge history.
A horizontal cut on the dendrogram decides the final number of clusters.
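The flow above can be sketched as a naive loop in Python. This is a minimal illustration, not a library implementation: it uses the five points from the worked example below, takes single linkage as the cluster-distance rule (linkage methods are covered next), and recomputes cluster distances from the raw points in every round instead of updating a stored matrix. The names `single_link` and `agglomerate` are hypothetical helpers chosen for this sketch.

```python
from itertools import combinations
from math import dist

# The five points from the worked example in this post.
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}


def single_link(c1, c2):
    """Cluster distance = closest pair of points across the two clusters."""
    return min(dist(points[p], points[q]) for p in c1 for q in c2)


def agglomerate(labels, cluster_distance):
    """Naive agglomerative clustering that returns the merge history."""
    # Each data point starts as its own cluster.
    clusters = [frozenset([label]) for label in labels]
    history = []
    while len(clusters) > 1:                      # until one cluster is left
        # Compute all cluster-pair distances and pick the closest pair.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(c1, c2)
        # Merge the pair; distances are recomputed from the raw points next round.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append((sorted(c1), sorted(c2), round(height, 3)))
    return history


for left, right, height in agglomerate(list(points), single_link):
    print(left, "+", right, "at", height)
```

Recomputing every cluster distance in every round is fine for five points but scales poorly, which is one reason libraries keep and update a distance matrix instead.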

🔗 Linkage Methods (How distance between clusters is measured)

When clusters have more than one point, the question becomes:

“How do we define distance between two clusters?”

That is done using linkage methods.

1️⃣ Single Linkage (Nearest neighbour)

Distance between two clusters =
minimum distance between a point in one cluster and a point in the other.

👉 Nearest pair decides.

2️⃣ Complete Linkage (Farthest neighbour)

Distance between two clusters =
maximum distance between a point in one cluster and a point in the other.

👉 Farthest pair decides.

3️⃣ Average Linkage

Distance between two clusters =
average of all pairwise distances between points in the two clusters.

👉 A more balanced approach.
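The three rules differ only in how they reduce the list of cross-cluster distances: min, max or mean. The sketch below (helper names are hypothetical) applies each rule to {A, B} and {C, D, E} from the worked example below. Single and complete linkage give 2.828 and 5.000, the final merge heights reported in that example, while average linkage lands in between at about 3.741.

```python
from math import dist
from statistics import mean

# The five points from the worked example.
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}


def cross_distances(c1, c2):
    """Distances between every point of c1 and every point of c2."""
    return [dist(points[p], points[q]) for p in c1 for q in c2]


def single_linkage(c1, c2):
    return min(cross_distances(c1, c2))    # nearest pair decides


def complete_linkage(c1, c2):
    return max(cross_distances(c1, c2))    # farthest pair decides


def average_linkage(c1, c2):
    return mean(cross_distances(c1, c2))   # all pairs count equally


ab, cde = ["A", "B"], ["C", "D", "E"]
for rule in (single_linkage, complete_linkage, average_linkage):
    print(f"{rule.__name__:>16}: {rule(ab, cde):.3f}")
```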

♨️ Worked Example (Very Important for Understanding)

Let’s use a small dataset.

Dataset – 5 points in 2D

A = (1, 1)
B = (2, 1)
C = (4, 3)
D = (5, 4)
E = (3, 4)

📏 Pairwise Euclidean Distances

Using the Euclidean formula d = √((x₁ − x₂)² + (y₁ − y₂)²), the distances (rounded to three decimals) are:

d(A, B) = 1.000
d(A, C) = 3.606
d(A, D) = 5.000
d(A, E) = 3.606
d(B, C) = 2.828
d(B, D) = 4.243
d(B, E) = 3.162
d(C, D) = 1.414
d(C, E) = 1.414
d(D, E) = 2.000
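As a check on one entry by hand: d(A, C) = √((4 − 1)² + (3 − 1)²) = √13 ≈ 3.606. The short sketch below recomputes the whole table with the standard library's `math.dist` (Python 3.8+); iterating over the pairs in insertion order prints them in the same order as the list above.

```python
from itertools import combinations
from math import dist

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}

# Euclidean distance for each unordered pair, in the same order as the list above.
distances = {(p, q): dist(points[p], points[q]) for p, q in combinations(points, 2)}

for (p, q), d in distances.items():
    print(f"d({p}, {q}) = {d:.3f}")
```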

Now let’s see how merging happens.

🔹 1️⃣ Single Linkage – Merge Order

(A, B) merge at 1.000
(C, D) merge at 1.414
(CD, E) merge at 1.414
(AB, CDE) merge at 2.828

What is really happening here?

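A and B are the closest pair (1.000) → merged first
C–D and C–E tie at 1.414; taking (C, D) first gives the order above, and choosing (C, E) instead gives the same merge heights
E is 1.414 from C, so under single linkage it joins CD at 1.414 (its larger distance to D is ignored)
finally, AB joins the big cluster at 2.828, which is d(B, C), the closest cross pair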

👉 Single linkage focuses only on the nearest points, not the whole cluster shape.
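The merge order above can be reproduced by following the "Update the distance matrix" step literally. In this sketch (variable names are illustrative), after merging clusters i and j, the distance from every other cluster k to the new cluster is set with the single-linkage rule d(k, i ∪ j) = min(d(k, i), d(k, j)). Ties are broken by whichever pair is found first, which here is (C, D).

```python
from itertools import combinations
from math import dist

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}

# Distance "matrix" stored as a dict keyed by unordered pairs of cluster names.
D = {frozenset((p, q)): dist(points[p], points[q]) for p, q in combinations(points, 2)}
clusters = list(points)   # a merged cluster is named by its sorted labels, e.g. "AB"
merges = []

while len(clusters) > 1:
    # Find the closest pair of current clusters.
    a, b = min(combinations(clusters, 2), key=lambda pair: D[frozenset(pair)])
    merges.append((a, b, D[frozenset((a, b))]))
    merged = "".join(sorted(a + b))
    clusters = [c for c in clusters if c not in (a, b)]
    # Single-linkage update: keep the smaller of the two old distances.
    for k in clusters:
        D[frozenset((k, merged))] = min(D[frozenset((k, a))], D[frozenset((k, b))])
    clusters.append(merged)

for a, b, h in merges:
    print(f"({a}, {b}) merge at {h:.3f}")
```

Replacing `min` with `max` in the update line turns this into complete linkage.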

🔹 2️⃣ Complete Linkage – Merge Order

(A, B) merge at 1.000
(C, D) merge at 1.414
(CD, E) merge at 2.000
(AB, CDE) merge at 5.000

Key difference you should notice

Here, the last merge happens at 5.000 (not 2.828 like single linkage).

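👉 Because complete linkage uses the farthest pair between two clusters: d(AB, CDE) = d(A, D) = 5.000. For the same reason, E joins CD at 2.000 = d(D, E), not at 1.414.

This tends to produce compact clusters with small diameters.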
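To see where 2.000 and 5.000 come from, the sketch below computes complete-linkage distances for the three clusters left after the first two merges, {A, B}, {C, D} and {E}, and reports which pair of points is farthest. Clusters are written as strings of single-letter labels purely for brevity; `complete_link` is a hypothetical helper.

```python
from math import dist

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}


def complete_link(c1, c2):
    """Complete linkage: return (distance, p, q) for the farthest cross pair."""
    return max((dist(points[p], points[q]), p, q) for p in c1 for q in c2)


# Clusters after the first two merges, written as strings of point labels.
for c1, c2 in [("AB", "CD"), ("AB", "E"), ("CD", "E")]:
    d, p, q = complete_link(c1, c2)
    print(f"d({c1}, {c2}) = {d:.3f}  (farthest pair: {p}, {q})")

# The final merge joins {A, B} with {C, D, E}.
d, p, q = complete_link("AB", "CDE")
print(f"d(AB, CDE) = {d:.3f}  (farthest pair: {p}, {q})")
```

The smallest of the three distances is d(CD, E) = 2.000, so E joins C and D next; the final merge then takes the farthest cross pair, A and D.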

🧮 Interpreting the Dendrogram

When you see a dendrogram:

Leaves = original data points
Height of a merge = distance at which merge happened
Horizontal cut at a chosen height = the number of vertical lines it crosses gives the number of clusters
Lower merge height = more similar points

👉 One simple rule to remember:
the lower the merge, the more similar the points.
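A horizontal cut keeps only the merges that happened below the cut height, so with n points and m merges below the line you are left with n − m clusters. The sketch below applies this to the complete-linkage merge history from the example using a small union-find structure; the `cut` helper and the cut heights 1.8 and 3.0 are arbitrary choices for illustration.

```python
def cut(labels, merges, height):
    """Return the clusters left when the dendrogram is cut at `height`.

    `merges` holds (cluster, cluster, merge_height) entries, each cluster
    written as a string of single-letter point labels, e.g. ("CD", "E", 2.0).
    """
    parent = {label: label for label in labels}    # union-find forest

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for left, right, h in merges:
        if h < height:                              # merges below the cut survive
            parent[find(left[0])] = find(right[0])
    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(groups.values())


complete_merges = [("A", "B", 1.000), ("C", "D", 1.414),
                   ("CD", "E", 2.000), ("AB", "CDE", 5.000)]
for h in (1.8, 3.0):
    print(h, cut("ABCDE", complete_merges, h))
```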

✅ Advantages of Hierarchical Clustering

You do not need to pre-define the number of clusters.
You get a full hierarchy of clusters.
Very useful for exploratory data analysis.

⚠️ Limitations

Computationally expensive for large datasets: the naive algorithm needs O(n²) memory for the distance matrix and O(n³) time.
Sensitive to noise and outliers.
Different linkage methods can give different results.
Once merged, clusters cannot be split again.

👉 In other words, a bad merge cannot be undone.

🛠️ Practical Considerations

Standardize features when using Euclidean distance, so that no single feature dominates just because of its scale.
Try multiple linkage methods before finalizing.
Cluster quality can be validated using:
- silhouette score
- cophenetic correlation
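The silhouette score compares, for each point, its mean distance a to the rest of its own cluster with its mean distance b to the nearest other cluster: s = (b − a) / max(a, b), which lies between −1 and 1 (higher is better). Below is a from-scratch sketch for the two-cluster split {A, B} / {C, D, E} from the worked example. scikit-learn's `sklearn.metrics.silhouette_score` computes the same average; this version only shows the arithmetic, and the `silhouette` helper is hypothetical.

```python
from math import dist
from statistics import mean

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4), "E": (3, 4)}
# Two-cluster solution from the worked example.
clusters = [["A", "B"], ["C", "D", "E"]]


def silhouette(points, clusters):
    """Mean silhouette: s(i) = (b - a) / max(a, b) averaged over all points."""
    scores = []
    for own in clusters:
        for i in own:
            others = [j for j in own if j != i]
            if not others:               # singleton cluster: s(i) is defined as 0
                scores.append(0.0)
                continue
            # a: mean distance to the rest of i's own cluster.
            a = mean(dist(points[i], points[j]) for j in others)
            # b: mean distance to the nearest other cluster.
            b = min(mean(dist(points[i], points[j]) for j in other)
                    for other in clusters if other is not own)
            scores.append((b - a) / max(a, b))
    return mean(scores)


print(f"silhouette = {silhouette(points, clusters):.3f}")
```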

In practice, data scientists often use libraries such as scikit-learn (AgglomerativeClustering) and SciPy (scipy.cluster.hierarchy) to compute hierarchical clustering; SciPy can also draw dendrograms.
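Here is a sketch of the same worked example with SciPy, assuming NumPy and SciPy are installed. `linkage` builds the merge table, `fcluster` applies a cut, `cophenet` computes the cophenetic correlation mentioned above, and `dendrogram(..., no_plot=True)` returns the dendrogram layout without needing matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

labels = ["A", "B", "C", "D", "E"]
X = np.array([(1, 1), (2, 1), (4, 3), (5, 4), (3, 4)], dtype=float)

# Each row of Z: [cluster i, cluster j, merge height, size of the new cluster].
Z = linkage(X, method="complete", metric="euclidean")
print("merge heights:", Z[:, 2].round(3))

# Horizontal cut that leaves at most 2 clusters.
ids = fcluster(Z, t=2, criterion="maxclust")
print("cluster ids:", ids)

# Cophenetic correlation: how well the dendrogram preserves the original distances.
c, _ = cophenet(Z, pdist(X))
print(f"cophenetic correlation: {c:.3f}")

# Leaf order of the dendrogram; drop no_plot=True to draw it with matplotlib.
print("leaf order:", dendrogram(Z, labels=labels, no_plot=True)["ivl"])
```

With complete linkage the merge heights match the hand-worked values (1.000, 1.414, 2.000, 5.000); switching `method` to "single" or "average" applies the other linkage rules. Because C–D and C–E are tied, SciPy may merge either pair first, but the heights are the same.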

🌍 Real-World Applications

Let’s connect this technique with real industries.

🏦 Banking & Finance

Customer segmentation using income, spending, and investment behaviour
Fraud detection by identifying unusual transaction patterns
Risk grouping of loan applicants

👉 A bank needs to know who is safe and who is risky.

🛍️ Retail & E-commerce

Market segmentation (budget buyers, premium buyers, etc.)
Product recommendation
Store clustering for region-wise marketing

🏥 Healthcare & Life Sciences

Patient grouping using medical history
Disease subtype discovery using genetic or symptom data
Drug discovery by clustering similar compounds

📡 Telecommunications

Churn analysis by usage behaviour
Network optimisation
Personalized plans for different customer groups

🌐 Marketing & Advertising

Targeted campaigns
Brand positioning analysis
Social media audience and influencer grouping

📘 A Small Learning Tip (Optional but Helpful)

If you want to go deeper into the theoretical foundation of clustering and pattern recognition, a well-known reference book is:

Pattern Recognition and Machine Learning

🧠 Final Thoughts & Conclusion

Hierarchical clustering, particularly the agglomerative approach, is most useful when:

you do not know the correct number of clusters,
you want to visually explore your data,
and you want to understand the structure inside your dataset.

Through our small example of five points (A, B, C, D, E), you clearly saw:

how clusters are merged step by step,
how different linkage methods change the final structure,
and how a dendrogram helps you decide the final grouping.

Hierarchical clustering is not only about forming clusters — it is about understanding relationships between data points.
