Content-based image retrieval (CBIR) has become an active and challenging research area, and various approaches and systems have been developed. However, little work has been done on effectively evaluating the performance of CBIR techniques and systems. The state of the art in evaluating image retrieval is quite chaotic: researchers design different algorithms and then test their performance on their own testbeds. Metrics such as precision and recall are widely used in the literature but are impractical, because measuring relevance is tedious and depends on subjective human judgment. Moreover, there is no common testbed and no theory of how to compare different testbeds. The lack of a uniform evaluation methodology is clearly a limiting factor in the development of the multimedia retrieval field.

There are two main objectives in formulating an evaluation methodology for content-based image retrieval: (1) measuring the complexity of image testbeds, so that the difficulty of retrieving images from a given testbed can be determined quantitatively, and (2) comparing the performance of different retrieval approaches, so that the approaches can be ranked objectively and quantitatively. In this project, we investigate a general evaluation methodology for content-based image retrieval based on image database statistics and information retrieval theory. In particular, we develop a novel approach that quantitatively measures the complexity of an image testbed by calculating a single value: its cross entropy. To achieve this goal, we first establish a general framework for image feature representation (termed the keyblock model), which provides a vehicle for conducting statistical analysis on images and forms the basis of the evaluation methodology. On top of this framework, we establish an evaluation methodology that includes: (1) measuring the complexity of image databases by their cross entropy with respect to a keyblock model, and (2) ranking retrieval approaches by their cross entropy with respect to a particular testbed. Comprehensive experiments are conducted on various testbeds to evaluate and verify our approach.
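As an illustration of the cross-entropy measurement described above, the following is a minimal sketch and not the project's actual implementation. It assumes that each image has already been encoded as a list of integer keyblock indices, and that the keyblock model is a simple unigram distribution over those indices with additive smoothing. The function names train_unigram and cross_entropy, and the parameters codebook_size and alpha, are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def train_unigram(images, codebook_size, alpha=1.0):
    """Estimate smoothed keyblock probabilities.

    images: list of images, each a list of keyblock indices in
            range(codebook_size) (an assumed encoding).
    alpha:  additive smoothing constant (an assumption, so that unseen
            keyblocks never receive zero probability).
    """
    counts = Counter(k for img in images for k in img)
    total = sum(counts.values()) + alpha * codebook_size
    return [(counts.get(k, 0) + alpha) / total for k in range(codebook_size)]


def cross_entropy(images, model):
    """Average negative log2-probability per keyblock, in bits."""
    n = sum(len(img) for img in images)
    if n == 0:
        raise ValueError("no keyblocks to score")
    return -sum(math.log2(model[k]) for img in images for k in img) / n


if __name__ == "__main__":
    # Toy data: a model trained on one testbed scores two other testbeds.
    model = train_unigram([[0, 0, 0, 1], [0, 0, 1, 0]], codebook_size=4)
    similar = [[0, 0, 1, 0]]
    dissimilar = [[2, 3, 3, 2]]
    print("similar testbed:   ", cross_entropy(similar, model))
    print("dissimilar testbed:", cross_entropy(dissimilar, model))
```

Under these assumptions, a testbed whose keyblock statistics the model predicts poorly yields a higher cross entropy, which is the sense in which a single value can summarize testbed complexity relative to a given model.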

This research is the first attempt to address the technical issues involved in establishing a common benchmark for content-based image retrieval, and it has a very important impact on the field. Using the proposed approaches, the image testbeds used to evaluate different retrieval approaches can be compared with one another in terms of their complexity in supporting image querying. Furthermore, retrieval techniques can be compared with one another without using queries, so human subjectivity is avoided. In addition, based on the proposed measurements, issues concerning the arts, the humanities, museums, and other domains can be pursued more easily. The algorithms and experimental results developed in this project are thus a valuable asset for the community in eventually establishing a general theory of evaluation methodology for content-based image retrieval research.