Abstract
In this paper, the problem of trajectory design for an intention-driven drone operating in dynamic wireless network environments is studied. In the considered model, the drone acts as a supplementary base station that navigates among ground user clusters to provide on-demand uplink data access. Given its intended application (e.g., traffic monitoring), the drone base station (DBS) prioritizes serving certain clusters (e.g., high-risk highway sections). A digital twin (DT) system, hosted on a central server, creates a virtual representation of the wireless network environment to simulate and predict changes in that environment, in response to which the DBS trajectory should also be adjusted. The DT system then suggests adjustments to the DBS trajectories without guaranteed access to the DBS service priorities, as the DBS may not cooperate with wireless infrastructures for the same data collection applications. This adjustment is posed as an optimization problem whose goal is to find the trajectories that maximize the fraction of prioritized users served by the DBS. To solve this problem under unknown DBS intention and unpredictable environment changes, an inverse reinforcement learning (IRL) based DT actuation solution is proposed. Simulation results demonstrate a performance gap of less than 1% from the optimal trajectory, without requiring access to confidential service priorities, and a 7% improvement over the baseline IRL algorithm.