next up previous
Next: Genetically guided clustering Up: The Case for Genetic Previous: Introduction

Clustering with HCM and FCM

Consider a set of n vectors $X=\{ {\bf x}_{1}, {\bf x}_{2}, \ldots
{\bf x}_{n}\}$ to be clustered into c groups of like data. Each ${\bf x}_{i} \in R^{s}$ is a feature vector consisting of s real-valued measurements describing the features of the object represented by ${\bf x}_{i}$. The features could be length, width, color etc. Hard or fuzzy clusterings of the objects can be represented by a hard/fuzzy membership matrix called a hard/fuzzy partition. The set of all $c \times n$ non-degenerate hard (or crisp) partition matrices is denoted by Mcn and is defined as

The set of all $c \times n$ non-degenerate constrained fuzzy partition matrices is denoted by Mfcn and is defined as

The clustering criterion used to define good clusters for hard c-means partitions is the HCM function:
\begin{displaymath}
J_{1}(U,V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (U_{ik}) D_{ik}^{2}({\bf
v}_{i},{\bf x}_{k}), \end{displaymath} (1)
where $U \in M_{cn}$ is a hard partition matrix;
$V=[{\bf v}_{1},\ldots,{\bf v}_{c}]$ is a matrix of prototype parameters (cluster centers) $\bf{ v}_{i} \in \Re^{s}, \; \forall i$; and
$D_{ik}({\bf v}_{i},{\bf x}_{k})$ is a measure of the distance from ${\bf x}_{k}$ to the ith cluster prototype. The Euclidean distance metric is used for all HCM results reported here. Good cluster structure in X is taken as a (U,V) minimizer of (2). Typically, optimal (U,V) pairs are sought using an alternating optimization scheme of the type generally described in [5,2].

The clustering criterion used to define good clusters for fuzzy c-means partitions is the FCM function:  
 \begin{displaymath}
J_{m}(U,V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (U_{ik})^{m} D_{ik}^{2}({\bf
v}_{i},{\bf x}_{k}),\end{displaymath} (2)
where $U \in M_{fcn}$ is a fuzzy partition matrix;
$m \in [1,\infty)$ is the weighting exponent on each fuzzy membership;
$V=[{\bf v}_{1},\ldots,{\bf v}_{c}]$ is a matrix of prototype parameters (cluster centers) $\bf{ v}_{i} \in \Re^{s}, \; \forall i$; and
$D_{ik}({\bf v}_{i},{\bf x}_{k})$ is a measure of the distance from ${\bf x}_{k}$ to the ith cluster prototype. The Euclidean distance metric is used for all FCM results reported here. The value m=2 is used for all FCM results given in this paper. The larger m is, the fuzzier the partition. Good cluster structure in X is taken as a (U,V) minimizer of (2). Typically, optimal (U,V) pairs are sought using an alternating optimization scheme of the type generally described in [2].


next up previous
Next: Genetically guided clustering Up: The Case for Genetic Previous: Introduction
Larry Hall
5/26/1998