Abstract
Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Using the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique remains robust when some of these modalities are missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a network of supervised denoising sparse auto-encoders to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when some modalities are missing.
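To illustrate the score-level fusion over the available modalities, the following is a minimal sketch and not the implementation evaluated in the paper. The abstract does not specify the fusion rule, so several things here are assumptions: each modality-specific classifier returns one score per enrolled subject (higher meaning a better match), scores are min-max normalized per modality, and fusion is an unweighted mean over whichever modalities were detected. The function name `fuse_scores` and the dictionary interface are hypothetical.

```python
def fuse_scores(scores_by_modality):
    """Fuse per-subject match scores from the modalities that are available.

    scores_by_modality maps a modality name to a list of per-subject scores
    (higher = better match), or to None when that modality was not detected
    in the clip. This interface is an assumption made for illustration.
    """
    normalized = []
    for scores in scores_by_modality.values():
        if scores is None:
            continue  # missing modality: it simply does not contribute
        lo, hi = min(scores), max(scores)
        span = hi - lo
        if span > 0:
            # Min-max normalization (assumed) puts modalities on a common scale.
            normalized.append([(s - lo) / span for s in scores])
        else:
            # A constant score vector carries no ranking information.
            normalized.append([0.0] * len(scores))
    if not normalized:
        raise ValueError("no modality available for fusion")

    # Unweighted mean across available modalities (assumed fusion rule).
    count = len(normalized)
    fused = [sum(column) / count for column in zip(*normalized)]
    best_subject = max(range(len(fused)), key=fused.__getitem__)
    return best_subject, fused


# Example: the left ear was not detected, so only two modalities are fused.
best, fused = fuse_scores({
    "frontal_face": [0.2, 0.9, 0.4],
    "left_ear": None,
    "right_ear": [0.1, 0.5, 0.8],
})
print(best, fused)
```

Because missing modalities are skipped rather than filled with placeholder scores, the same routine covers both the full five-modality case and any subset of it.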