What is the Rand Index? (Definition & Examples)

The Rand index measures how similar the results of two different clustering methods are when both are applied to the same dataset.

Often denoted R, the Rand Index is calculated as:

R = (a+b) / (nC2)

where:

  • a: The number of pairs of elements that belong to the same cluster in both clustering methods.
  • b: The number of pairs of elements that belong to different clusters in both clustering methods.
  • nC2: The number of unordered pairs in a set of n elements.
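The definition above can be sketched directly in Python by looping over every unordered pair and counting a and b. This is a minimal illustration, not an optimized implementation; the function name rand_index_pairs and the sample labels are assumptions chosen for this sketch.

```python
from itertools import combinations

def rand_index_pairs(labels_1, labels_2):
    """Return (a, b, R) by direct pair counting (hypothetical helper)."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both labelings must cover the same elements")
    if len(labels_1) < 2:
        raise ValueError("at least two elements are needed to form a pair")

    a = b = total = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]  # together under method 1?
        same_2 = labels_2[i] == labels_2[j]  # together under method 2?
        if same_1 and same_2:
            a += 1  # same cluster in both methods
        elif not same_1 and not same_2:
            b += 1  # different clusters in both methods
        total += 1  # total ends up equal to nC2

    return a, b, (a + b) / total

# Illustrative labels for four elements (not taken from the article)
a, b, r = rand_index_pairs([0, 0, 1, 1], [0, 0, 1, 2])
print(a, b, r)
```

Pairs on which the two methods disagree (together in one, apart in the other) are counted in neither a nor b, which is why they lower the index.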

The Rand index always takes on a value between 0 and 1 where:

  • 0: Indicates that two clustering methods do not agree on the clustering of any pair of elements.
  • 1: Indicates that two clustering methods perfectly agree on the clustering of every pair of elements.
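The two extremes can be checked with a small sketch. The helper name pair_agreement and the label vectors below are assumptions made for illustration; the helper simply returns the share of pairs on which the two methods agree, which equals the Rand index.

```python
from itertools import combinations

def pair_agreement(labels_1, labels_2):
    """Share of unordered pairs treated the same way by both labelings."""
    pairs = list(combinations(range(len(labels_1)), 2))
    agree = sum(
        (labels_1[i] == labels_1[j]) == (labels_2[i] == labels_2[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same grouping, different label names: every pair agrees
same_partition = pair_agreement([1, 1, 2, 2], [7, 7, 3, 3])

# One big cluster versus all singletons: no pair agrees
opposite = pair_agreement([1, 1, 1, 1], [1, 2, 3, 4])

print(same_partition, opposite)  # 1.0 0.0
```

The first case also shows that the index depends only on how elements are grouped, not on the cluster labels themselves.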

The following example illustrates how to calculate the Rand index between two clustering methods for a simple dataset.

Example: How to Calculate the Rand Index

Suppose we have the following dataset of five elements:

  • Dataset: {A, B, C, D, E}

And suppose we use two clustering methods that place each element in the following clusters:

  • Method 1 Clusters: {1, 1, 1, 2, 2}
  • Method 2 Clusters: {1, 1, 2, 2, 3}

To calculate the Rand index between these clustering methods, we need to first write out every possible unordered pair in the dataset of five elements:

  • Unordered pairs: {A, B}, {A, C}, {A, D}, {A, E}, {B, C}, {B, D}, {B, E}, {C, D}, {C, E}, {D, E}

There are 10 unordered pairs.
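As a quick check, the pairs can be enumerated in Python with the standard library. This is only a sketch of the counting step; the variable names are chosen for illustration.

```python
from itertools import combinations
from math import comb

elements = ["A", "B", "C", "D", "E"]

# Every unordered pair of distinct elements
pairs = list(combinations(elements, 2))

# The list length matches the formula nC2 = n(n-1)/2
print(len(pairs), comb(len(elements), 2))  # 10 10
```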

Next, we need to calculate a, which represents the number of unordered pairs that belong to the same cluster across both clustering methods:

  • {A, B}

In this case, a = 1.

Next, we need to calculate b, which represents the number of unordered pairs that belong to different clusters across both clustering methods:

  • {A, D}, {A, E}, {B, D}, {B, E}, {C, E}

In this case, b = 5.
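The pair lists for a and b can be reproduced with a short sketch that sorts each pair into one of three groups. The variable names are assumptions for illustration; the cluster labels are those of Method 1 and Method 2 above.

```python
from itertools import combinations

elements = ["A", "B", "C", "D", "E"]
method_1 = dict(zip(elements, [1, 1, 1, 2, 2]))
method_2 = dict(zip(elements, [1, 1, 2, 2, 3]))

same_in_both, different_in_both, disagree = [], [], []
for x, y in combinations(elements, 2):
    same_1 = method_1[x] == method_1[y]
    same_2 = method_2[x] == method_2[y]
    if same_1 and same_2:
        same_in_both.append((x, y))       # counts toward a
    elif not same_1 and not same_2:
        different_in_both.append((x, y))  # counts toward b
    else:
        disagree.append((x, y))           # counts toward neither

print(same_in_both)       # [('A', 'B')]
print(different_in_both)  # [('A', 'D'), ('A', 'E'), ('B', 'D'), ('B', 'E'), ('C', 'E')]
print(disagree)           # [('A', 'C'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

The four remaining pairs are the ones the two methods disagree on: each is grouped together by one method and split apart by the other.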

Lastly, we can calculate the Rand index as:

  • R = (a+b) / (nC2)
  • R = (1+5) / 10
  • R = 6/10

The Rand index is 0.6.

How to Calculate the Rand Index in R

We can use the rand.index() function from the fossil package to calculate the Rand index between two clustering methods in R:

library(fossil)

#define clusters
method1 <- c(1, 1, 1, 2, 2)
method2 <- c(1, 1, 2, 2, 3)

#calculate Rand index between clustering methods
rand.index(method1, method2)

[1] 0.6

The Rand index is 0.6. This matches the value that we calculated by hand.

How to Calculate the Rand Index in Python

We can define the following function in Python to calculate the Rand index between two clusterings. Because it relies on np.bincount, the cluster labels must be non-negative integers:

import numpy as np
from scipy.special import comb

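#define Rand index function
def rand_index(actual, pred):

    #pairs placed in the same cluster by the first clustering (tp + fp)
    tp_plus_fp = comb(np.bincount(actual), 2).sum()
    #pairs placed in the same cluster by the second clustering (tp + fn)
    tp_plus_fn = comb(np.bincount(pred), 2).sum()
    #stack the two label vectors as columns
    A = np.c_[(actual, pred)]
    #pairs placed in the same cluster by both clusterings
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(actual))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    #pairs placed in different clusters by both clusterings
    tn = comb(len(A), 2) - tp - fp - fn
    #share of all pairs on which the two clusterings agree
    return (tp + tn) / (tp + fp + fn + tn)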

#calculate Rand index
rand_index([1, 1, 1, 2, 2], [1, 1, 2, 2, 3])

0.6

The Rand index turns out to be 0.6. This matches the value calculated in the previous examples.

Additional Resources

An Introduction to K-Means Clustering
An Introduction to K-Medoids Clustering
An Introduction to Hierarchical Clustering
