As a second example of integrating knowledge into clustering, we turn our attention to a satellite image of a coastline on the Arabian Sea. Our primary application of knowledge based clustering was MRI scans; we include this coastal image not as an exhaustive study, but to show that the technique is domain independent and applicable to a variety of fields.
Figure 5: CZCS Image, 443nm Band
The image is a 492 by 243 pixel area from the Coastal Zone Color
Scanner (CZCS) [11], which provides 6 features
(visible and infrared bands) at each pixel: 443nm, 520nm,
550nm, 670nm, Near Infrared (NIR), and Infrared (IR). In each
band, every pixel has a value of 0 through 255, allowing the bands
to be treated as grey-scale images. Each pixel
in the image falls into one of three primary categories:
land, cloud, and ocean. Figures 5 and 6 show
the 443nm band image and the three-cluster result
respectively. As can be seen, pixels clearly belonging to ocean
were classified as either cloud or land. Applying the principles of
knowledge based clustering to the problem gives us a better solution and
goes further to label the image.
Acquisition and Use of Domain Knowledge: In searching for a reliable number of cluster centers for this three-class problem, we chose to examine ten clusters to keep under-segmentation minimal. This number may seem high, but CZCS images, having six features instead of the three in MR images, are more complex. It is possible that fewer clusters could have been used; however, our focus in this domain was to show the applicability of knowledge based clustering.
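For reference, the clustering step that is seeded later in this section is fuzzy c-means (FCM). The following is a minimal sketch of the standard FCM iteration, not the implementation used in the experiment. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the tolerance, and the random initialization are all assumptions made for illustration; the optional `U0` argument is where a seeded membership matrix would be passed.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, U0=None, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means (illustrative sketch).

    X  : (n, d) feature matrix, e.g. one row of six band values per pixel.
    c  : number of clusters (ten in the CZCS experiment).
    U0 : optional (c, n) initial membership matrix; random if omitted.
    Returns (U, V): memberships (c, n) and cluster centers (c, d).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if U0 is None:
        rng = np.random.default_rng(seed)
        U = rng.random((c, n))
        U /= U.sum(axis=0, keepdims=True)   # each pixel's memberships sum to 1
    else:
        U = np.array(U0, dtype=float)

    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data.
        V = (Um @ X) / np.maximum(Um.sum(axis=1, keepdims=True), 1e-12)
        # Squared Euclidean distance from every center to every pixel: (c, n).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # avoid division by zero
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

With the six bands stacked as columns of `X`, a call such as `fuzzy_c_means(X, c=10)` would produce the ten-cluster partition discussed here; a hard segmentation follows from `U.argmax(axis=0)`.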
Our primary sources of knowledge were [11] and discussions
held with an expert in the field of marine science, a major user of
CZCS technology. We also acquired additional knowledge by
looking at the feature space of manually labeled images to
discover patterns and extract class characteristics.
The most useful knowledge gained from these sources
is that specific CZCS bands are used for specific purposes.
For example, the 443nm
band is used to differentiate land from brighter
cloud surfaces [11]. By thresholding each pixel's
feature values, we can seed the initial FCM matrix by adjusting specific
pixel memberships. In Figure 5, the very white pixels have
a very high probability of being cloud. So any pixel with
a 443nm value of 255 would have its membership set to a cloud
cluster. For the experiment, the following thresholds were used:
Pixels with a 443nm band value of 255 or an NIR value have very strong cloud membership.
Pixels with IR band values < 90 have very strong land membership.
Pixels with NIR band values < 30 and 443nm values < 236 have very strong ocean membership.
These thresholds were obtained by manually sampling the feature images and extracting values from pixels whose class membership we were confident of. They are not necessarily optimal, but they did provide satisfactory results. For the purpose of this experiment, "very strong" meant complete membership: we assigned a membership of one to the cluster of the class with which a given pixel was labeled, and zero to all other clusters. More extensive investigations would explore combinations where membership is split between candidate classes.
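The seeding rules above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiment: the column order of `X`, the function name `seed_memberships`, and the cluster indices passed in as `cloud_k`, `land_k`, and `ocean_k` are hypothetical. The NIR condition of the cloud rule is left out because its threshold value is not given in the text, and pixels that satisfy more than one rule are left unseeded, which is a design choice of this sketch and not something the text specifies.

```python
import numpy as np

# Assumed column order of the feature matrix (not specified in the text).
COL_443, COL_NIR, COL_IR = 0, 4, 5

def seed_memberships(X, U, cloud_k, land_k, ocean_k):
    """Return a copy of U (c, n) with crisp memberships for thresholded pixels.

    X is the (n, 6) matrix of band values; cloud_k, land_k and ocean_k are
    the (hypothetical) cluster indices chosen to receive each class's seeds.
    """
    X = np.asarray(X)
    b443, nir, ir = X[:, COL_443], X[:, COL_NIR], X[:, COL_IR]
    rules = [
        (b443 == 255, cloud_k),                 # cloud (NIR condition omitted)
        (ir < 90, land_k),                      # land
        ((nir < 30) & (b443 < 236), ocean_k),   # ocean
    ]
    # Count how many rules each pixel satisfies; seed only unambiguous pixels.
    hits = sum(mask.astype(int) for mask, _ in rules)
    U = np.array(U, dtype=float)                # work on a copy
    for mask, k in rules:
        sel = mask & (hits == 1)
        U[:, sel] = 0.0                         # "very strong" = complete membership:
        U[k, sel] = 1.0                         # one for the class cluster, zero elsewhere
    return U
```

The returned matrix would then serve as the initial membership matrix for FCM; pixels that match no rule keep whatever initial memberships they already had.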
Knowing the function of each CZCS band is important
because clustering alone does not provide
information on which class any given cluster belongs to. Therefore,
this functional information is necessary to achieve pattern labeling.
We do this by projecting clusters into feature space along
specific axes in a manner similar to the one discussed in
Section 3.1.
Cloud occupies the 4 highest clusters in 443nm space.
Land occupies the 2 lowest clusters in IR space.
Ocean occupies the 3 lowest clusters in 550nm space.
This list includes the 3 clusters which were used in initial thresholding. Because we know which features are useful for detecting each class, these known clusters allowed us to quickly locate their respective neighbors in feature space: the clusters of a split class are adjacent along specific feature axes. One cluster remained unaccounted for, however. This was a cluster with significant under-segmentation, a candidate for reclustering.
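The three projection rules can be illustrated by ranking cluster centers along the relevant band. The sketch below is a simplification: the expert system used in the experiment is not described in enough detail to reproduce, so the function name `label_clusters`, the column order, and the handling of clusters claimed by more than one rule (left unlabeled) are assumptions.

```python
import numpy as np

BANDS = ("443nm", "520nm", "550nm", "670nm", "NIR", "IR")  # assumed column order

# (class name, band to project onto, number of clusters, take the highest?)
RULES = (
    ("cloud", "443nm", 4, True),    # 4 highest clusters in 443nm space
    ("land",  "IR",    2, False),   # 2 lowest clusters in IR space
    ("ocean", "550nm", 3, False),   # 3 lowest clusters in 550nm space
)

def label_clusters(V, bands=BANDS, rules=RULES):
    """Assign a class name to each row of V, the (c, 6) cluster centers.

    Returns a list of length c; entries are None for clusters that no rule
    claims, or that more than one rule claims.
    """
    V = np.asarray(V, dtype=float)
    col = {name: i for i, name in enumerate(bands)}
    claims = [set() for _ in range(V.shape[0])]
    for name, band, count, highest in rules:
        order = np.argsort(V[:, col[band]])           # ascending along one axis
        picked = order[-count:] if highest else order[:count]
        for k in picked:
            claims[int(k)].add(name)
    return [next(iter(s)) if len(s) == 1 else None for s in claims]
```

With ten clusters, nine are accounted for by the rules and any `None` entry marks a candidate for reclustering.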
Figure 7: Image After One Level of Reclustering.
The established rules were inserted into the expert system, which could then extract the known classes and completely label them. This left the under-segmented cluster. Since the other clusters had been classified and labeled, the under-segmented cluster was easily isolated and reclustered into three classes, which were then examined and labeled. In this case, simple thresholding along the NIR band was enough to achieve labeling. The final image, seen in Figure 7, reveals that one level of reclustering did not label all pixels properly. However, the results are much improved, and with further iterations of isolation and reclustering the remaining misclassified pixels could be put in the proper classes. Furthermore, the final image showed us that we had erred in guessing the composition of the under-segmented cluster; what we had thought to be possible cloud was entirely land.
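One level of isolation and reclustering, as described above, might look like the following sketch. It assumes a hard segmentation (one cluster index per pixel) and takes the clustering routine as a parameter, `cluster_fn`, so that FCM or any other method can be plugged in. The names `recluster` and `label_by_nir`, the NIR column index, the threshold, and the class names passed to `label_by_nir` are hypothetical; the text does not report the NIR threshold that was used.

```python
import numpy as np

def recluster(X, hard, target, cluster_fn, n_sub=3):
    """Split cluster `target` into n_sub sub-clusters; other pixels are untouched.

    X          : (n, d) feature matrix.
    hard       : (n,) integer cluster index per pixel.
    cluster_fn : callable (X_subset, n_sub) -> sub-cluster index 0..n_sub-1
                 for each row of X_subset.
    Returns (new_hard, idx, sub): the updated labels, the indices of the
    isolated pixels, and their sub-cluster indices.
    """
    X = np.asarray(X, dtype=float)
    hard = np.asarray(hard)
    idx = np.flatnonzero(hard == target)         # isolate the under-segmented pixels
    sub = np.asarray(cluster_fn(X[idx], n_sub))
    new_hard = hard.copy()
    base = hard.max() + 1
    # Sub-cluster 0 keeps the old index; the rest receive fresh indices.
    new_hard[idx] = np.where(sub == 0, target, base + sub - 1)
    return new_hard, idx, sub

def label_by_nir(X_sub, sub, nir_col, threshold, below, above):
    """Name each sub-cluster by comparing its mean NIR value with a threshold."""
    names = {}
    for s in np.unique(sub):
        mean_nir = X_sub[sub == s, nir_col].mean()
        names[int(s)] = below if mean_nir < threshold else above
    return names
```

Repeating `recluster` on whichever sub-cluster still appears under-segmented gives the further iterations mentioned in the text.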