At least four factors may affect the performance of the OMLET system: 1) the number of training epochs, 2) the number of training samples for each category, 3) the number of ranges to be learned for each category, and 4) the quality of the training data for each category. We use histograms of the desired evaluation measures of the training data to convey the concept of training set "quality". Figure 11 shows these histograms for the GRUFF chair data. The height of each histogram bin is the number of training samples whose desired evaluation measures fall within a particular range. The histogram of a "good" set of training data is therefore skewed towards the higher evaluation measures, and the histogram of a "bad" set is skewed towards the lower evaluation measures.
Figure 11: Histograms of desired evaluation measures of the GRUFF chair training sets.
The histogram of a parent category, such as conventional chair or cup, represents the distribution of the overall desired evaluation measures (which are the goal measures of the examples in the data set provided as input to OMLET). However, the histograms for subcategories, such as straightback chair and armchair, represent the distributions of the desired evaluation measures associated with the additional functional requirements defined for the subcategory. For example, the histogram for the straightback chair category represents the quality of the provides_back_support portion of the straightback chair examples in a data set, not the overall desired evaluation measures. Recall that the ranges associated with the parent category conventional chair will be frozen (and presumably accurate) before learning begins for the category straightback chair. So, OMLET only uses straightback chair examples to learn the ranges associated with the provides_back_support functional property. Thus, when learning the ranges for the category straightback chair, we want to observe the quality of the back supports of the training examples. Similarly, we want to observe the quality of the arm supports of the armchair examples, not the overall desired evaluation measures.
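The binning behind such a histogram can be sketched as follows. This is a minimal illustration, not OMLET's implementation: the function names, the choice of ten equal-width bins, and the assumption that desired evaluation measures lie in [0, 1] are ours. For a subcategory such as straightback chair, the input would be the measures for the additional functional requirement (for example, provides_back_support) rather than the overall measures.

```python
def measure_histogram(measures, n_bins=10):
    """Count how many desired evaluation measures fall in each bin.

    Assumes (hypothetically) that every measure lies in [0, 1] and
    that the bins are equal-width; neither is stated in the text.
    """
    counts = [0] * n_bins
    for m in measures:
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"measure {m} is outside [0, 1]")
        # A measure of exactly 1.0 is placed in the last bin.
        idx = min(int(m * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def upper_half_fraction(counts):
    """Fraction of samples in the upper half of the bins.

    A crude, illustrative indicator of skew: values near 1 suggest a
    "good" training set, values near 0 a "bad" one.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    half = len(counts) // 2
    return sum(counts[half:]) / total
```

For example, `upper_half_fraction(measure_histogram(back_support_measures))` gives a single number summarizing how strongly a subcategory's training set leans towards high measures, where `back_support_measures` is a hypothetical list of per-example values.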
Figure 12: Average training sample error versus number of training epochs for A) GRUFF chair objects, B) synthetic cups, and C) real chair objects. These plots are for a single leave-one-out test run.
Figure 12 shows examples of the average training sample error plotted as a function of the number of training epochs for each of the three data sets (GRUFF objects, synthetic cups, and real objects). These plots show that 1000 training epochs is more than sufficient for all of the categories in the three data sets. Training could most likely be stopped after 400 epochs for any of the categories without degrading system performance. Since the number of training epochs is the same for all categories, and has been shown to be sufficient, we can eliminate this factor as a possible cause of the different levels of performance among categories. We ran some experiments in addition to those described in Section 5 to examine the effect of the other three factors.
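One way to make the "more than sufficient" judgment concrete is to locate the epoch after which the error curve stops improving. The sketch below is only an illustration of that idea, not the procedure used to produce Figure 12: the function name, the window length, and the tolerance are hypothetical choices, and `errors` is assumed to hold the average training sample error recorded once per epoch.

```python
def plateau_epoch(errors, window=50, tol=1e-3):
    """Return the first epoch t at which the error improves by less
    than `tol` over the following `window` epochs, or None if no such
    epoch exists in the recorded curve.

    `window` and `tol` are illustrative defaults, not values taken
    from the experiments.
    """
    for t in range(len(errors) - window):
        if errors[t] - errors[t + window] < tol:
            return t
    return None
```

Applied to each category's curve, a returned epoch well below the fixed training length would support the conclusion that the number of epochs is not what limits performance.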