Abstract
Multimedia event detection (MED) is one of the most important branches of multimedia content analysis. Current research on MED focuses mainly on detecting specific events, such as sports events, news events, and suspicious events. This is still far from complicated, generic MED, because events usually contain many visual attributes, such as objects, scenes, and human actions. Unlike visual features, visual attributes are hidden classes to event detectors and event classifiers. Hence, a proper representation of these visual attributes could help in building a sophisticated and generic MED system. In this paper, we represent video events with a Gaussian mixture model (GMM), motivated by the idea that the individual component densities of the GMM could model underlying hidden visual attributes, and we propose an ℓ2-regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complicated MED. We also propose an efficient iterative algorithm that uses gradient descent, a standard convex optimization method, to optimize the objective function of LLGMM. Finally, we conduct extensive experiments on the challenging TRECVID MED 2012 development dataset. The results demonstrate the effectiveness of the proposed LLGMM classifier for MED.
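To make the optimization idea concrete, the following is a minimal sketch of plain ℓ2-regularized logistic regression trained by gradient descent. It is not the LLGMM classifier itself: the Gaussian mixture component is omitted, and the function names (`fit_l2_logistic`, `objective`), the synthetic two-class data, and the hyperparameters (`lam`, `lr`, `n_iter`) are assumptions chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(X, y, w, b, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels y in {0, 1}."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # guards against log(0)
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return log_loss + 0.5 * lam * (w @ w)

def fit_l2_logistic(X, y, lam=0.1, lr=0.5, n_iter=500):
    """Batch gradient descent on the objective above (illustrative settings)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + lam * w  # loss gradient + l2 penalty gradient
        grad_b = np.mean(p - y)               # bias is left unregularized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-in for event features: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_l2_logistic(X, y)
print("objective at start:", objective(X, y, np.zeros(2), 0.0, 0.1))
print("objective after training:", objective(X, y, w, b, 0.1))
```

In this simplified setting the objective is convex, so gradient descent with a sufficiently small step size decreases it monotonically; how the GMM component densities enter the LLGMM objective is specified in the paper body rather than here.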