In object recognition it is important, where possible, to test a system on real objects, for several reasons. First, we can see whether the system approximates human judgment. Second, we can observe system performance in the presence of noise, which real-world data inevitably contain. Finally, using real-world data reduces the need to hand-craft the system entirely from synthetic data. This mirrors the scenario in which the ``vision system engineer" gives the system a set of human-labeled examples and lets the system learn the parameters. To test OMLET, we used a set of 37 actual objects together with human ratings of how well each might serve as a chair. Figure 10 shows some of the objects used in these experiments.
Figure 10: Some examples of the chair objects used for human evaluation tests.
To determine how well OMLET can learn to recognize the set of real chair-like objects, all the objects were gathered in a single room, and each object was placed in the orientation in which it would most likely be recognized as a chair. For actual chairs, this is simply the orientation in which the chair would typically be used. For other objects it may be an unusual one; for a metal trash can, for example, it is the ``upside down" orientation. A group of 32 undergraduate students in an Artificial Intelligence class was then given the following instructions:
You are asked to rate each of the thirty-seven objects according to the degree of ``chair-ness" that is reflected in its 3-D shape. For our purposes, ``chair-ness" measures if the object could be used as a chair. You are to consider only the 3-D shape in making your rating. You should assume that each object is made of appropriate materials, so that this is not a factor in your ratings. You are to consider the suitability of the object shape only in the orientation that you see it, rather than some other orientation. Examples of factors that you should consider in rating the ``chair-ness" of a shape are height, width, depth, area, relative orientation and apparent stability. You are asked to rate each shape against the requirements of three different aspects of ``chair-ness". The first aspect is solely its ability to provide a stable seating surface. The second aspect is solely its ability to provide back support compatible with the seating surface. The third aspect is solely its ability to provide arm support compatible with the seat and back. Each aspect should be judged independently on a scale of 1 to 5, where 1 means it has no ability to provide the required function and 5 means that it seems ideal to provide the desired function. You may mark halfway between two numbers if you wish.
The ratings for each aspect of ``chair-ness" were then averaged over the raters, normalized to the range [0,1], and rounded to the nearest multiple of 0.02. The overall evaluation measure of an object for the conventional chair category is taken to be the normalized evaluation measure for the first aspect of ``chair-ness", that is, the object's ability to provide a stable seating surface. Overall evaluation measures for the categories straightback chair and armchair are computed using the probabilistic or T-conorm to combine the three aspects of ``chair-ness" in the manner described in Subsection 3.3. Hence, a comfortable, sturdy chair would have a value close to 1 for ``chair-ness", while the upside-down trash can has a considerably lower value (approximately 0.5).
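The averaging, normalization, and rounding step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the linear rescaling of the 1 to 5 scale onto [0,1] is an assumption (the text does not state the normalization formula), and the function names `normalize_rating` and `prob_or` are hypothetical. The function `prob_or` gives only the standard definition of the probabilistic sum, a + b - ab; how it is applied to the three aspects is specified in Subsection 3.3 and is not reproduced here.

```python
from statistics import mean


def normalize_rating(ratings, lo=1.0, hi=5.0, step=0.02):
    """Average raw ratings, rescale to [0, 1], round to a multiple of `step`.

    Assumption: the rescaling is linear, mapping `lo` to 0 and `hi` to 1.
    """
    avg = mean(ratings)
    scaled = (avg - lo) / (hi - lo)
    # Round to the nearest multiple of `step`; the outer round() removes
    # floating-point residue such as 0.8800000000000001.
    return round(round(scaled / step) * step, 2)


def prob_or(a, b):
    """Standard probabilistic-sum T-conorm on [0, 1]: a + b - a*b."""
    return a + b - a * b


# Hypothetical ratings from four raters for one aspect of one object.
seat_measure = normalize_rating([3, 3.5, 2.5, 3])
```

Half-point ratings such as 3.5 are allowed here because the instructions permit marking halfway between two numbers.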
After the objects had been rated, measurements were taken for each of the primitives describing the chair in the GRUFF system. The measurements were those required by the OMLET rules, such as the clearance from the ground, the area of the sittable surface, and the height of the sittable surface. Complete OMLET examples describing the objects were then created, including the aggregate evaluation measure of each object for the categories conventional chair, straightback chair, and armchair. This resulted in 37 objects for the conventional chair category, 22 objects for the straightback chair category (15 objects had no back support at all), and 12 objects for the armchair category (10 of the objects with back support had no arm support). There are at least two sources of noise in this experimental data: 1) the human evaluations, and 2) the actual measurements of the physical properties of the objects. For example, the standard deviation of the normalized human evaluations for the conventional chair category, averaged over the 37 objects, is about 0.12, or 12% of the [0,1] range. The results of leave-one-out testing on the 37 real-world objects are presented in the next section.
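For readers unfamiliar with the protocol, leave-one-out testing holds out each object in turn, trains on the remaining objects, and evaluates on the held-out one. The sketch below shows only the data split; the training and evaluation steps are specific to OMLET and are omitted. The function name `leave_one_out` and the object labels are hypothetical placeholders.

```python
def leave_one_out(examples):
    """Yield (training_set, held_out) pairs, holding out each example once."""
    for i, held_out in enumerate(examples):
        yield examples[:i] + examples[i + 1:], held_out


# Placeholder labels standing in for the 37 real-world objects.
objects = [f"object_{k}" for k in range(1, 38)]
folds = list(leave_one_out(objects))
```

With 37 objects this produces 37 folds, each training on 36 objects and testing on the remaining one.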