Abstract
In this paper, we address the problem of maximizing the operator's economic profit in multi-access edge computing (MEC)-enabled, time division multiple access (TDMA)-based air-ground integrated networking (AGIN). We jointly optimize task placement and replacement, unmanned aerial vehicle (UAV) placement, UAV flight time, access control, and the task offloading ratios at the user devices (UDs) and the UAV. The optimization is subject to constraints that include storage capacity, task processing quality of service (QoS) requirements, and TDMA requirements. The optimization is conducted on two time scales: task placement and replacement are performed on a coarse-grained time scale (frames), while the remaining decisions are made on a fine-grained time scale (time slots). Because the environment is highly dynamic, solving this problem is challenging. To address it, we present a hierarchical deep reinforcement learning (DRL) algorithm. The high-level component is a deep Q network (DQN) agent that determines the task placement and replacement solution for each frame. The low-level component is an improved deep deterministic policy gradient (IDDPG) agent that handles the task processing-related decisions in each time slot. Simulation results show that the proposed algorithm performs well in maximizing economic profit compared with other algorithms.
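The two-time-scale structure described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the DQN and IDDPG agents are replaced by plain callables, and every name here (`run_two_timescale`, `high_level_policy`, `low_level_policy`, `slot_profit`) is hypothetical. The sketch also assumes that the profit credited to a frame is the sum of its per-slot profits, which the abstract does not state.

```python
from typing import Callable, List, Tuple


def run_two_timescale(
    num_frames: int,
    slots_per_frame: int,
    high_level_policy: Callable[[int], int],
    low_level_policy: Callable[[int, int, int], float],
    slot_profit: Callable[[int, float], float],
) -> Tuple[List[int], List[float]]:
    """Run a two-time-scale decision loop.

    The high-level policy is queried once per frame (coarse time scale);
    the low-level policy is queried once per time slot (fine time scale).
    Returns the per-frame placement decisions and per-frame profits.
    """
    placements: List[int] = []
    frame_profits: List[float] = []
    for frame in range(num_frames):
        # Coarse-grained decision: stands in for task placement/replacement.
        placement = high_level_policy(frame)
        placements.append(placement)

        total = 0.0
        for slot in range(slots_per_frame):
            # Fine-grained decision: stands in for UAV placement, flight
            # time, access control, and offloading ratios in one slot.
            action = low_level_policy(frame, slot, placement)
            total += slot_profit(placement, action)

        # Assumption: the frame-level profit is the sum over its slots.
        frame_profits.append(total)
    return placements, frame_profits


if __name__ == "__main__":
    # Toy stand-ins for the learned agents and the environment.
    placements, profits = run_two_timescale(
        num_frames=2,
        slots_per_frame=3,
        high_level_policy=lambda frame: frame % 2,
        low_level_policy=lambda frame, slot, placement: 0.5,
        slot_profit=lambda placement, action: placement + action,
    )
    print(placements, profits)
```

In a learning setting, the two callables would be replaced by agents that also receive observations and update themselves from the resulting profits; the loop nesting is the only part this sketch is meant to convey.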