Abstract
In this paper, we investigate accurate synchronization between a physical network and its digital network twin (DNT), which serves as a virtual representation of the physical network. The considered network consists of a set of base stations (BSs) that must allocate their limited spectrum resources to serve a set of users while also transmitting their partially observed physical network information to a cloud server that generates the DNT. Since the DNT can predict the physical network status from its historical status, the BSs need not send their physical network information in every time slot, which conserves spectrum resources for serving the users. However, if the DNT does not receive physical network information from the BSs for a long period, its accuracy in representing the physical network may degrade. Each BS must therefore decide when to send its physical network information to the cloud server to update the DNT, and must also determine how to allocate spectrum between DNT synchronization and serving the users. We formulate this resource allocation task as an optimization problem whose goal is to maximize the total data rate of all users while minimizing the asynchronization between the physical network and the DNT. The formulated problem is challenging to solve with traditional optimization methods because each BS observes only part of the physical network, which makes it difficult to find an optimal spectrum allocation strategy for the entire network. To address this problem, we propose a method based on gated recurrent units (GRUs) and the value decomposition network (VDN). The GRU component allows the DNT to predict the future status from historical data, effectively updating itself when the BSs do not transmit physical network information. The VDN algorithm enables each BS to learn the relationship between its local observation and the team reward of all BSs, allowing it to collaborate with the other BSs in deciding whether to transmit physical network information and in optimizing spectrum allocation. Simulation results show that the proposed GRU- and VDN-based algorithm improves the weighted sum of data rates and the similarity between the status of the DNT and the physical network by up to 28.96% compared to a baseline method that combines GRU with independent Q-learning (IQL).
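
To make the role of value decomposition concrete, the following minimal sketch illustrates the additive assumption that VDN is generally built on: the team Q-value is taken to be the sum of per-agent Q-values, so each agent can act greedily on its own Q-values and still select the joint action that maximizes the team value. This is an illustration only, not the implementation evaluated in this paper. The three actions per BS, the Q-value numbers, and the function names are hypothetical, and small tables stand in for the learned networks that would map each BS's local observation to its Q-values.

```python
from itertools import product

# Hypothetical per-BS Q-values for one time slot (numbers invented for illustration).
# Illustrative actions: 0 = serve users only,
#                       1 = send a DNT update using a small spectrum share,
#                       2 = send a DNT update using a large spectrum share.
per_bs_q = [
    [1.0, 2.5, 0.5],  # BS 0
    [3.0, 1.0, 2.0],  # BS 1
    [0.2, 0.4, 0.9],  # BS 2
]


def team_q(q_tables, joint_action):
    """Additive VDN assumption: team Q-value = sum of per-agent Q-values."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))


def decentralized_greedy(q_tables):
    """Each BS picks the action with the highest value in its own table."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def centralized_greedy(q_tables):
    """Brute-force search over all joint actions, used only for comparison."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=lambda ja: team_q(q_tables, ja))


local_choice = decentralized_greedy(per_bs_q)
joint_choice = centralized_greedy(per_bs_q)
print(local_choice, joint_choice, team_q(per_bs_q, local_choice))
```

Because the team value is a sum, the per-BS greedy choice and the brute-force joint choice coincide. This property is what lets each BS act on local information at execution time, whereas fully independent learners such as IQL optimize each agent's value without this shared team-level structure.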