Initialization of K-means

K-means results are highly dependent on the initialization procedure used.

The ENVI toolkit distinguishes ISODATA from k-means by initialization only: k-means distributes the means, while ISODATA distributes the pixels.

There are a number of different ways to initialize the algorithm:

  1. Arbitrarily assign classes to pixels

    The straightforward way to do this is to assign the i-th pixel to class (i mod k).

    This is a good approach when the number of channels is greater than the number of classes. Because each class starts with an interleaved sample of the whole image, the initial means all lie close to the overall image mean. You may therefore need a large number of bits of precision for the first few iterations, to ensure that there is some distance between the different means. For this reason, it works better in software than in hardware.

  2. Distribute the mean table around the color space

    This is a good approach if the number of channels is less than the number of classes. Otherwise it is difficult to distribute the means.

  3. Initialize mean values with random pixels from the image

    To initialize k classes, choose k pixels at random from the image. (Note -- they don't need to be that random.) Make sure that the pairwise distances between the k chosen pixels are large enough.

    How do you ensure that two pixels are sufficiently far away from each other? One (compute-intensive) way to do this is to choose p pixels at random from the image (where p >> k, but p is smaller than the total number of pixels in the image) and then do k-means clustering on those p pixels. (But how do you initialize the sub-problem? Arbitrarily assigning pixels to classes should work for this.)
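
The three methods above can be sketched in Python as follows. This is only an illustration, not the implementation used by ENVI or by this project. It assumes that each pixel is a tuple of channel values, that distance is Euclidean, and that channel values lie in 0..255. All function names are hypothetical. For method 2 the means are simply spaced along the diagonal of the color space, which is one possible choice among many. For method 3 the threshold min_dist is an assumed parameter that the caller must pick.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two pixels (tuples of channels)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_means(pixels, labels, k):
    """Mean of the pixels assigned to each class; an empty class gives None."""
    n_channels = len(pixels[0])
    sums = [[0.0] * n_channels for _ in range(k)]
    counts = [0] * k
    for p, c in zip(pixels, labels):
        counts[c] += 1
        for ch, v in enumerate(p):
            sums[c][ch] += v
    return [tuple(s / n for s in row) if n else None
            for row, n in zip(sums, counts)]

def init_modulo(pixels, k):
    """Method 1: pixel i goes to class (i mod k); return the class means."""
    labels = [i % k for i in range(len(pixels))]
    return class_means(pixels, labels, k)

def init_spread(k, n_channels, lo=0.0, hi=255.0):
    """Method 2 (assumed variant): space k means evenly along the diagonal
    of the color space, from lo to hi in every channel."""
    if k == 1:
        return [tuple([(lo + hi) / 2] * n_channels)]
    step = (hi - lo) / (k - 1)
    return [tuple([lo + j * step] * n_channels) for j in range(k)]

def init_random_pixels(pixels, k, min_dist, rng, max_tries=1000):
    """Method 3: draw random pixels, keeping a candidate only if it is at
    least min_dist away from every pixel already kept."""
    chosen = []
    for _ in range(max_tries):
        cand = pixels[rng.randrange(len(pixels))]
        if all(sq_dist(cand, c) >= min_dist ** 2 for c in chosen):
            chosen.append(cand)
            if len(chosen) == k:
                return chosen
    raise ValueError("could not find k pixels at least min_dist apart")

def kmeans(pixels, means, iterations=10):
    """Plain k-means for a fixed number of iterations; a class that becomes
    empty keeps its previous mean."""
    k = len(means)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, means[c]))
                  for p in pixels]
        new = class_means(pixels, labels, k)
        means = [m if m is not None else old for m, old in zip(new, means)]
    return means

def init_from_subsample(pixels, k, p, rng):
    """Method 3, compute-intensive variant: cluster p random pixels
    (k <= p <= len(pixels)), initializing the sub-problem with method 1."""
    sample = rng.sample(pixels, p)
    return kmeans(sample, init_modulo(sample, k))
```

For example, init_modulo([(0, 0, 0), (10, 10, 10), (200, 200, 200), (210, 210, 210)], 2) returns the means (100.0, 100.0, 100.0) and (110.0, 110.0, 110.0). The two means are close together even though the data has two well-separated groups, which is the precision concern noted under method 1. If two initial means happen to coincide exactly, this sketch cannot separate them again.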



This page is maintained by Prof. Miriam Leeser
Last updated September 16, 1999.
Email: mel@ece.neu.edu