Information and Code Download
Jason J. Corso
Action Bank™: A High-Level Representation of Activity in Video
Human motion and activity are extremely complex. Most promising recent approaches are based on low- and mid-level features (e.g., local space-time features, dense point trajectories, and dense 3D gradient histograms). In contrast, the Action Bank™ method is a new high-level representation of activity in video. In short, it embeds a video into an "action space" spanned by various action detector responses, such as walking-to-the-left, drumming-quickly, etc. The individual action detectors in our implementation of Action Bank™ are template-based detectors using the action spotting work of Derpanis et al. CVPR 2010. Each individual action detector correlation video volume is transformed into a response vector by volumetric max-pooling (3 levels, yielding a 73-dimensional vector); our library and methods include 205 action detector templates in the bank, sampled broadly in semantic and viewpoint space. Our paper shows how a simple classifier like an SVM can use this high-dimensional representation to effectively recognize realistic videos of complex human activities. On this page, you will find downloads for our source code, already-processed versions of major vision data sets, and a more detailed description of the method and the code.
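To make the 73-dimensional pooled vector concrete: a 3-level volumetric max-pooling pyramid takes the max over 1 cell at level 0, 2x2x2 = 8 cells at level 1, and 4x4x4 = 64 cells at level 2, giving 1 + 8 + 64 = 73 values per detector response volume. The following is a minimal sketch of that idea, assuming simple equal splits along each axis; the released code may partition the volume differently:

```python
import numpy as np

def volumetric_max_pool(volume, levels=3):
    """Sketch of 3-level volumetric max-pooling over a (t, y, x) volume.

    Level k splits each axis into 2**k roughly equal chunks and takes the
    max within each cell, so levels=3 yields 1 + 8 + 64 = 73 values.
    """
    out = []
    for k in range(levels):
        n = 2 ** k
        for t_chunk in np.array_split(volume, n, axis=0):
            for y_chunk in np.array_split(t_chunk, n, axis=1):
                for x_chunk in np.array_split(y_chunk, n, axis=2):
                    out.append(x_chunk.max())
    return np.array(out)

# A toy 8x8x8 "correlation volume" pools down to a 73-dimensional vector.
vec = volumetric_max_pool(np.random.rand(8, 8, 8))
```

With 205 templates in the bank, concatenating the 73-dimensional vectors gives the full 14,965-dimensional banked representation fed to the SVM.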
News / Updates
Code / Download:
Action Bank™ Versions of Data Sets
Benchmark Results
We have tested Action Bank on a variety of activity recognition data sets. See the paper for full details. Here, we include a sampling of the results.
UCF Sports
FAQ / Help
Below, we try to answer frequent questions about running the code and using the output banked vectors.
Question 1: I am running the software on a video and it hangs; what's going on?
The most likely answer is not that the system is hanging but that it is still working through the method, which is relatively computationally expensive (especially in this pure Python form). Here, I run through an example to give you an idea of what you should see... I am processing the first video in the UCF50 BaseballPitch class (named: v_BaseballPitch_g01_c01.avi). This video is 320x240 and has 107 frames; it is not a big video. I copied and renamed it to /tmp/input.avi
python actionbank.py -s -c 2 -g 2 /tmp/input.avi /tmp/output
The -s means this is a single video and not a directory of videos. The -c 2 means use 2 cores for processing. The -g 2 means reduce the video by a factor of two before applying the bank detectors (but after featurizing).
Question 2: I get this runtime error when I run the code:
actionbank/code/spotting.py:563: RuntimeWarning: invalid value encountered in divide
  Z = V / (V.sum(axis=3))[:,:,:,np.newaxis]
This warning means that there is no motion energy at all for a pixel in the video, which is quite possible for typical videos. We explicitly handle it in the subsequent lines of spotting.py by checking for NaN and Inf. That is, you can disregard the runtime warning.
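The following standalone sketch illustrates why the warning appears and how such pixels can be zeroed afterwards. The variable names mirror the warning message, but this is an illustration, not the actual spotting.py code:

```python
import numpy as np

# Toy (t, y, x, orientation) energy volume: one pixel has motion energy,
# the rest are all zero, so their normalization divides by zero.
V = np.zeros((2, 2, 2, 4))
V[0, 0, 0] = [1.0, 2.0, 3.0, 4.0]

with np.errstate(invalid='ignore', divide='ignore'):
    # Per-pixel normalization over the orientation axis; zero-energy
    # pixels produce 0/0 = NaN here, triggering the RuntimeWarning.
    Z = V / (V.sum(axis=3))[:, :, :, np.newaxis]

# Explicitly handle the no-motion-energy pixels by zeroing NaN/Inf.
Z[np.isnan(Z) | np.isinf(Z)] = 0.0
```

After the cleanup, pixels with motion energy hold proper normalized distributions and zero-energy pixels are all zeros, which is exactly why the warning is harmless.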
Question 3: The classify function call in ab_svm.py gives an AttributeError. For example, when I run ab_kth_svm.py, I get the following error:
Traceback (most recent call last): File "ab_kth_svm.py", line 99, in
This seems to be a change in the Shogun library interface. Our work was performed with shogun version libshogun (x86_64/v0.9.3_r4889_2010-05-27_20:52_4889). In newer versions of Shogun, classify is replaced with apply. Note, we have not yet tested this in-house, and results may vary. We also want to point out that the ab_svm.py module is included only as an example of how to use the Action Bank output for classification. You are free to use other classifiers or platforms, such as random forests or Matlab, respectively.
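If you want the example scripts to run against both old and new Shogun builds, one possible workaround (untested by us, in keeping with the caveat above) is to dispatch on whichever method the installed library exposes. The predict helper below is a hypothetical shim, not part of our released code:

```python
def predict(svm, features):
    """Hypothetical shim: call whichever prediction method this Shogun
    build provides (classify in v0.9.x, apply in newer releases)."""
    if hasattr(svm, 'apply'):
        return svm.apply(features)   # newer Shogun interface
    return svm.classify(features)    # older interface, e.g. v0.9.3
```

You would then replace the direct svm.classify(...) calls in ab_svm.py with predict(svm, ...), keeping the rest of the script unchanged.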