Abstract
In this chapter, we introduce a value decomposition-based reinforcement learning (RL) algorithm coupled with a meta-training mechanism that allows RL agents to dynamically learn their policies while generalizing what they have learned to unseen environments. We then apply the designed RL algorithm to the problem of trajectory design for energy-constrained drones operating in dynamic wireless network environments. In particular, drones equipped with built-in base station (BS) units can provide radio access to ground users in wireless networks. More importantly, a team of drone base stations (DBSs) can be dispatched to cooperatively serve clusters of distributed ground users that have dynamic and unpredictable uplink access demands. In this scenario, the DBSs must cooperatively navigate the considered area to maximize coverage of the ground users' dynamic requests. This trajectory design problem is formulated as an optimization problem whose goal is to find the trajectories that maximize the fraction of users served by all DBSs. Finally, we explain how the designed RL algorithm is used to solve this trajectory design problem.
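To make the core idea of value decomposition concrete, the sketch below shows a VDN-style decomposition in which the joint action value is the sum of per-agent utilities, so each DBS can act greedily on its own Q-values while the team still optimizes a shared objective. The toy Q-tables, agent count, and action set are illustrative assumptions, not the chapter's actual implementation.

```python
# Minimal sketch of a value-decomposed joint Q-function (VDN-style).
# All names and values below are illustrative assumptions.

N_AGENTS = 2   # e.g. two cooperating drone base stations (DBSs)
ACTIONS = 4    # e.g. move north / south / east / west

def joint_q(per_agent_q):
    """Decomposed joint value: Q_tot(s, a) = sum_i Q_i(s_i, a_i)."""
    return sum(per_agent_q)

def greedy_joint_action(q_tables, states):
    """Each agent maximizes its own utility; because the decomposition is
    additive, this also maximizes the decomposed joint Q."""
    return [max(range(ACTIONS), key=lambda a: q_tables[i][states[i]][a])
            for i in range(N_AGENTS)]

# Toy Q-tables for a single observed state (state 0) per agent.
q_tables = [
    {0: [0.1, 0.5, 0.2, 0.0]},   # agent 0 prefers action 1
    {0: [0.3, 0.1, 0.9, 0.2]},   # agent 1 prefers action 2
]
actions = greedy_joint_action(q_tables, states=[0, 0])
q_tot = joint_q([q_tables[i][0][actions[i]] for i in range(N_AGENTS)])
```

During training, only the summed value `q_tot` needs to be regressed against the team reward, which is what lets the agents learn cooperative behavior from a single shared signal.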