next up previous
Next: Selection Up: The Case for Genetic Previous: Clustering with HCM and

Genetically guided clustering

In any generation, element i of the population is Vi, a $c\times s$ matrix of cluster centers in FCM/HCM notation. The initial population of size P is constructed by random assignment of real numbers to each of the s features of the c cluster centers. The initial values are constrained to be in the range (determined from the data set) of the feature to which they are assigned, but are otherwise random.

Since only the V's will be used within the GA it is necessary to reformulate the objective functions (3) and (4) for optimization. For HCM each data vector is assigned to the nearest cluster via some distance metric (Euclidean here). Given the way assignments to clusters are made, it is easy to see that
\begin{displaymath}
R_1(V) = \sum_{k=1}^{n} min\{D_{1k}, D_{2k}, \cdots, D_{ck}\}\end{displaymath} (3)
is an equivalent reformulation of J1 which eliminates U.

In order to work only with V's in FCM, equation (4) can be rewritten by substitution for U using the first order necessary condition for U [2]. Specifically, for m > 1 as long as $D_{jk}({\bf
v}_j,{\bf x}_{k})
\gt 0 \quad \forall \; j,k$, we can substitute


into (2), resulting in the reformulated FCM functional

\begin{displaymath}
R_m(V) = \sum_{k=1}^{n} {\left ( \sum_{i=1}^c D_{ik}^{1/(1-m)} \right )}^{1-m}\end{displaymath} (4)

Hathaway and Bezdek [6] have shown that local (V) minimizers of Rm and (U) at (3) produce local minimizers of Jm, and conversely, the V part of local minimizers of Jm yields local minimizers of Rm. Our approach will be to optimize Rm with a GGA. The reformulation theorem given in [6] provides theoretical justification for this technique.

The genetic guided algorithm is shown in Figure 1. It consists of selecting parents for reproduction, doing crossover with the parents and applying mutation to the bits of the children.

In the process of experimenting with hard c-means we occasionally observed GGA getting caught in an extrema associated with a degenerate partition (i.e a partition with one or more empty rows meaning that less than c clusters were obtained in the final partition). A row i is considered empty if $u_{ik} \le .00001,\; \forall k$. To minimize the chance of the GA becoming stuck at a degenerate partition, we use the following heuristic.

Since the clustering goal is to minimize the objective function, the above heuristic penalizes degenerate partitions by increasing their objective function value. This makes them less likely to be chosen for reproduction and less likely to survive to the next generation in any form. The above heuristic is applied in all experiments using the HCM objective function.

The following subsections describe our approach to selection, crossover, and mutation. In each case our choices are motivated by the desire to apply genetically guided clustering to image data sets, which are usually large. So, small populations and minimal generations are preferred.


  
Figure: The GGA algorithm
\begin{figure}
% latex2html id marker 243

\begin{list}
{GGA.\arabic{enumi}}{\us...
 ...he best (smallest) $R_{m}$\space value and report $R_{m}$.\end{list}\end{figure}



 
next up previous
Next: Selection Up: The Case for Genetic Previous: Clustering with HCM and
Larry Hall
5/26/1998