S. S. Intille and A. F. Bobick, "Recognizing planned, multi-person action," Computer Vision and Image Understanding (1077-3142), vol. 81, pp. 414-445, 2001.


Multi-person action recognition requires models of structured interaction between people and objects in the world. This paper demonstrates how highly structured, multi-person action can be recognized from noisy perceptual data using visually grounded, goal-based primitives and low-order temporal relationships that are integrated in a probabilistic framework.

The representation, which is motivated by work in model-based object recognition and probabilistic plan recognition, makes four principal assumptions: (1) the goals of individual agents are natural atomic representational units for specifying the temporal relationships between agents engaged in group activities, (2) a high-level description of the temporal structure of the action, using a small set of low-order temporal and logical constraints, is adequate for representing the relationships between the agent goals for highly structured, multi-agent action recognition, (3) Bayesian networks provide a suitable mechanism for integrating multiple sources of uncertain visual perceptual feature evidence, and (4) an automatically generated Bayesian network can be used to combine uncertain temporal information and compute the likelihood that a set of object trajectory data is a particular multi-agent action.
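As a rough illustration of assumption (3), the sketch below computes the posterior probability of a single agent goal from several uncertain binary visual features. It assumes a naive Bayes structure (features conditionally independent given the goal), which is a simplification chosen for brevity and not necessarily the network structure used in the paper. The function name `goal_posterior`, the feature semantics, and all probabilities are hypothetical.

```python
def goal_posterior(prior, likelihoods, observations):
    """Return P(goal = True | observations) under a naive Bayes assumption.

    prior        -- P(goal = True) before seeing any evidence
    likelihoods  -- list of (P(feature = True | goal), P(feature = True | no goal))
    observations -- list of booleans, one per feature
    """
    p_goal = prior
    p_no_goal = 1.0 - prior
    for (p_if_goal, p_if_no_goal), seen in zip(likelihoods, observations):
        # Multiply in the likelihood of what was actually observed.
        p_goal *= p_if_goal if seen else 1.0 - p_if_goal
        p_no_goal *= p_if_no_goal if seen else 1.0 - p_if_no_goal
    # Normalize so the two hypotheses sum to one.
    return p_goal / (p_goal + p_no_goal)


# Hypothetical evidence for a goal such as "receiver cuts toward the sideline":
# feature 1 = a sharp change in heading, feature 2 = movement toward the sideline.
likelihoods = [(0.8, 0.3), (0.7, 0.2)]
both_seen = goal_posterior(0.5, likelihoods, [True, True])
none_seen = goal_posterior(0.5, likelihoods, [False, False])
```

With both features observed the posterior rises well above the prior, and with neither observed it falls well below it. In a fuller treatment, goal posteriors of this kind would feed the nodes that evaluate temporal constraints between goals.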

The recognition method is tested using a database of American football play descriptions and manually acquired but noisy player trajectories. The strengths and limitations of the system are discussed and compared with other multi-agent recognition algorithms.


Multi-person action recognition, plan recognition, computer vision, Bayesian networks.