Jason J. Corso
ACE - Active Clustering for Exploitation and Defense Forensics
People: Jason Corso (PI), Caiming Xiong, David Johnson
Past Members: Albert Chen
Funding: DARPA Computer Science Study Group (CSSG) (HR0011-09-1-0022 and N10AP20032).
This project is kicking off in July 2010.
Objectives and Goals: We propose a revolutionary new approach to data analysis and modeling for computer vision called Active Clustering. Whereas traditional methods in machine learning typically require input from the user before commencing computation and have no subsequent interaction, our approach seeks dynamic input from the user during processing. Unlike traditional supervised approaches, which require extensive up-front effort from the user, our approach does not require the user to label large amounts of data. Rather, during processing we ask the user simple questions that let us adapt our underlying representation of the sample space. Furthermore, in many defense settings, large amounts of data for a particular target of interest (e.g., the "black Mercedes that is pictured here") may not be available anyway.
In traditional unsupervised, or clustering, methods, the input from the user takes the form of basic assumptions about the sample space. There are two relevant problems with these methods. First, the assumptions typically require some degree of technical know-how on the part of the user, yet many DoD/IC end-user analysts lack the training needed to map mission sets to clustering assumptions effectively. Second, without the correct feature space, there is a disparity in most realistic settings between the underlying distance function driving the clustering and the user's semantics. In other words, nothing ties the samples that the clustering algorithm deems similar to the semantics of the user.
Our proposed Active Clustering methodology overcomes both of these issues: the user is asked simple, intuitive questions about grouping, which incorporates his or her semantics and requires no technical knowledge of how the system works. These high-level questions are rigorously tied to the underlying mathematics.
More recent methods that incorporate the user dynamically, such as Active Learning methods, seek a classifier over a predefined set of classes, which provides convenient mechanisms for selecting which samples the user should label next. The same convenience does not exist for the clustering (i.e., generative) case because the estimate of uncertainty or information gain is not as readily computed.
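To make the contrast concrete, the following is a minimal sketch of one common Active Learning selection rule, uncertainty sampling by entropy of the predicted class distribution. It is an illustration only, not part of the proposed method; the function names and the posterior values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete class posterior."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(posteriors):
    """Index of the sample whose predicted class distribution has the highest entropy."""
    return max(range(len(posteriors)), key=lambda i: entropy(posteriors[i]))

# Hypothetical classifier outputs for three unlabeled samples over three predefined classes.
posteriors = [
    (0.90, 0.05, 0.05),
    (0.40, 0.35, 0.25),
    (0.70, 0.20, 0.10),
]

# The sample the user would be asked to label next.
query_index = most_uncertain(posteriors)
```

This rule depends on a per-sample distribution over a fixed set of classes, which is exactly what a clustering setting does not provide out of the box.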
The main objective of this project is to develop the Active Clustering approach to video and image exploitation and forensics. The key questions to be answered in the new field of active clustering are (i) appropriate distance function formulation, (ii) clustering methodology, (iii) active user querying, and (iv) integration of user responses into learning. The inquiry will involve realistic data corpora and validation criteria.
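As a rough illustration of how items (ii) through (iv) could fit together, here is a toy sketch of a query-driven agglomerative clustering loop. It is not the project's algorithm: it assumes a fixed squared Euclidean distance (so it does not address item (i) or adapt the representation), a user who always answers consistently, and hypothetical names (`active_cluster`, `same_group`) and data chosen purely for the example.

```python
import itertools

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def active_cluster(points, same_group):
    """Merge clusters by asking the user about the closest unresolved pair.

    same_group(i, j) stands in for the user answering
    "do samples i and j belong together?" and is assumed consistent.
    Returns (clusters, number_of_queries).
    """
    clusters = [[i] for i in range(len(points))]  # start from singletons
    cannot_link = set()                           # pairs the user said are different
    queries = 0
    while True:
        # Cluster pairs not yet separated by a "no" answer.
        candidates = [
            (a, b)
            for a, b in itertools.combinations(range(len(clusters)), 2)
            if not any(frozenset((i, j)) in cannot_link
                       for i in clusters[a] for j in clusters[b])
        ]
        if not candidates:
            return clusters, queries

        def link(pair):
            # Single-link distance between two clusters.
            a, b = pair
            return min(sq_dist(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])

        a, b = min(candidates, key=link)
        i, j = clusters[a][0], clusters[b][0]     # one representative per cluster
        queries += 1
        if same_group(i, j):
            clusters[a] = clusters[a] + clusters[b]   # integrate a "yes": merge
            del clusters[b]
        else:
            cannot_link.add(frozenset((i, j)))        # integrate a "no": never merge

# Toy data: the (simulated) user groups by the first feature,
# while the second feature dominates the raw distance.
points = [(0, 0), (1, 10), (0, 10), (1, 0)]
truth = [0, 1, 0, 1]
clusters, queries = active_cluster(points, lambda i, j: truth[i] == truth[j])
```

In this toy case the raw distance alone would pair samples that the user considers different, and the grouping answers are what correct it. A fuller treatment would also feed those answers back into the distance function rather than only into the merge decisions.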
Defense Relevance. Exploitation and forensics comprise the core defense relevance of our proposal, with broad applications such as persistent surveillance and urban C2. The VIMEXF Problem is our focus: given a large corpus of video and image data, we want to allow the analyst (level 1, 2 or 3) to quickly search through the video and image data. Possible queries are to search for standard mission elements, or to select the set of clips containing a particular person or feature. Furthermore, the approach must scale well and adapt to new data on-line without full reindexing. We stress that the emphasis is on perceptual and semantic content rather than on existing meta content such as geospatial coordinates of the field of view.