Abstract
Classifying attention states from combined electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) recordings is important for understanding human cognitive function. Although multimodal algorithms have been explored in brain-computer interface (BCI) research, existing approaches often integrate features from the two modalities ineffectively. Moreover, few studies have applied deep learning to multimodal attention state classification. This paper proposes a novel EEG-fNIRS multimodal deep fusion framework (EFDFNet), which uses fNIRS features to enhance EEG feature disentanglement and a deep fusion strategy to integrate multimodal features effectively. We also develop EMCNet, an attention state classification network for the EEG modality that combines Mamba and Transformer to improve EEG feature extraction. We evaluated our methods on two attention state classification datasets, mental arithmetic (MA) and word generation (WG), and on one motor imagery (MI) dataset. Using only the EEG modality, EMCNet achieved classification accuracies of 86.11%, 79.47% and 75.77% on the MA, WG and MI datasets, respectively. With multimodal fusion, EFDFNet improved these results to 87.31%, 80.90% and 85.61%, gains of 1.20, 1.43 and 9.84 percentage points, respectively. Both EMCNet and EFDFNet deliver state-of-the-art performance and are expected to set new baselines for EEG-fNIRS multimodal fusion.