Abstract
This paper designs a multi-modal vehicle positioning framework that jointly localizes vehicles using channel state information (CSI) and images. In particular, we consider an outdoor scenario in which each vehicle can communicate with only one base station (BS) and can therefore upload its estimated CSI only to its associated BS. Each BS is equipped with a set of cameras, so that it can collect a small number of labeled CSI samples, a large number of unlabeled CSI samples, and the images taken by the cameras. To exploit the unlabeled CSI data and the position labels obtained from images, we design a hard expectation-maximization (EM) based deep learning (DL) algorithm. Specifically, since the correspondence between the unlabeled CSI samples and the multiple vehicle locations in the images is unknown, we formulate the calculation of the log-likelihood function as a maximum matching problem. The model parameters are then updated according to the maximum matching between the unlabeled CSI and the position labels obtained from images. Simulation results show that the proposed method can reduce the positioning error by up to 60% compared to a baseline that uses only CSI fingerprints, without images, for vehicle positioning.
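As a rough illustration of the matching step described above, the following minimal Python sketch pairs positions predicted from unlabeled CSI with positions extracted from images. It is not the paper's implementation and rests on several assumptions: the matching score is taken to be the negative squared Euclidean distance (as would follow from an isotropic Gaussian likelihood, which the abstract does not state), the number of predictions equals the number of image-derived positions, and the matching is found by brute-force enumeration, which is only practical for a handful of vehicles. The function names `match_predictions_to_labels` and `hard_em_targets` are hypothetical.

```python
from itertools import permutations


def match_predictions_to_labels(predicted, observed):
    """Return a tuple `a` such that predicted[i] is paired with observed[a[i]].

    Assumption: the best matching minimizes the total squared Euclidean
    distance, i.e. maximizes a Gaussian log-likelihood with equal variances.
    Brute force over all permutations, so only suitable for small inputs.
    """
    if len(predicted) != len(observed):
        raise ValueError("this sketch assumes equally many predictions and labels")

    def total_cost(perm):
        return sum(
            (px - observed[j][0]) ** 2 + (py - observed[j][1]) ** 2
            for (px, py), j in zip(predicted, perm)
        )

    return min(permutations(range(len(observed))), key=total_cost)


def hard_em_targets(predicted, observed):
    """Hard E-step: assign each unlabeled CSI sample one image-derived position.

    The returned list can serve as regression targets when the model
    parameters are updated (the M-step, not shown here).
    """
    assignment = match_predictions_to_labels(predicted, observed)
    return [observed[j] for j in assignment]


# Illustrative values only: model outputs for three unlabeled CSI samples
# and three vehicle positions detected in images, in arbitrary order.
predicted = [(0.0, 1.0), (5.0, 5.0), (9.0, 0.5)]
observed = [(5.2, 4.9), (8.8, 0.0), (0.3, 1.1)]
targets = hard_em_targets(predicted, observed)
```

For larger numbers of vehicles, a polynomial-time assignment solver such as the Hungarian algorithm would replace the enumeration; the sketch keeps the brute-force form only to stay dependency-free.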