Meta-Reinforcement Learning in Time-Varying UAV Communications

L. Hu, Y. Shao, Y. Qian, F. Du, J. Li, Y. Lin, Z. Wang

Issue: 3/2024
Periodical: Radioengineering Journal
DOI: 10.13164/re.2024.0417

Keywords: Unmanned aerial vehicle (UAV) communication, anti-jamming, meta-reinforcement learning, mean field

Abstract: Unmanned Aerial Vehicle (UAV) communication networks are vulnerable to malicious jamming and co-channel interference, both of which degrade network performance. Developing anti-jamming methods that improve communication security is therefore a significant challenge. In this paper, we propose a novel anti-jamming channel selection scheme for a multi-channel multi-UAV network. We first formulate the anti-jamming problem as a Partially Observable Stochastic Game (POSG), in which UAV pairs with partial observability compete for a limited number of communication channels against a Markov jammer. To ensure rapid adaptation to the dynamic jamming environment, we propose a Meta-Mean-Field Q-learning (MMFQ) algorithm, which provides a Nash Equilibrium (NE) solution to the POSG problem. Furthermore, we derive an upper bound on the loss function of MMFQ and prove the convergence of the proposed algorithm. Simulation results demonstrate that the proposed algorithm achieves a higher average reward than the benchmark algorithms, improving throughput and resource utilization, especially in large-scale UAV communication networks.