Multi-agent RL and Scalability Challenges for Random Access in MTC
The machine-type communication (MTC) paradigm in wireless communication will connect millions of devices that perform tasks without human involvement. The use cases of MTC are numerous: factory automation, fleet management, smart homes and smart metering, e-health, smart logistics, and more. The devices in an MTC network, called machine-type devices (MTDs), are mostly power-constrained, low-complexity, battery-operated devices. Moreover, as opposed to devices in human-type communication (HTC), they transmit short packets and have random sleep cycles. Due to these peculiarities, especially in terms of traffic characteristics, MTC systems demand novel multiple access techniques beyond the two broad classes used in HTC: grant-based access and grant-free access.
It has been established in the literature that grant-based, or scheduling, techniques are not feasible for traffic with short packet lengths and battery-constrained devices. This is because grant-based schemes such as TDMA and FDMA require an exchange of signalling that consumes battery, and the control data exchanged to deliver a single packet may exceed the payload itself. A preferred option for MTC is uncoordinated grant-free random access (RA), such as slotted ALOHA. In uncoordinated RA, each user randomly selects a physical resource and sends its data to the common receiver. Such a scheme requires minimal signalling and is simple to implement; however, as we know, RA schemes suffer from a high number of collisions when the traffic intensity is high. For these reasons, there is a need to design novel access schemes, or better collision resolution mechanisms, for MTC systems, based on new traffic models that are well suited to MTC.
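The collision behaviour described above can be illustrated with a toy simulation of slotted ALOHA. This is a minimal sketch with hypothetical function and parameter names, not code from any of the cited works: a slot succeeds only if exactly one device transmits in it.

```python
import random

def slotted_aloha(n_devices, tx_prob, n_slots, seed=0):
    """Simulate slotted ALOHA: every device transmits with probability
    tx_prob in each slot; a slot is a success iff exactly one device
    transmits, and a collision if two or more transmit."""
    rng = random.Random(seed)
    successes = collisions = 0
    for _ in range(n_slots):
        transmitters = sum(rng.random() < tx_prob for _ in range(n_devices))
        if transmitters == 1:
            successes += 1
        elif transmitters > 1:
            collisions += 1
    # Return empirical throughput and collision rate (per slot).
    return successes / n_slots, collisions / n_slots

# Near the optimal load (n * p ≈ 1) throughput approaches 1/e ≈ 0.37;
# at high load almost every slot is a collision.
moderate = slotted_aloha(n_devices=50, tx_prob=1 / 50, n_slots=10_000)
overloaded = slotted_aloha(n_devices=50, tx_prob=0.2, n_slots=10_000)
```

The sharp drop from `moderate` to `overloaded` is exactly the regime that motivates better collision resolution for bursty MTC traffic.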
In recent years, a significant number of research works have applied machine learning, and specifically reinforcement learning (RL), to resource allocation in wireless networks. These works employ state-of-the-art RL techniques to design resource allocation policies. For RA, since each user in the network takes its decision independently, without knowing the actions and states of the other users, the problem is usually modelled as multi-agent RL (MARL) in a partially observable environment.
Roughly speaking, MARL techniques can be divided into centralized training with decentralized execution (CTDE) and fully decentralized training and execution. For an MTC network, the use of MARL for RA becomes even more challenging due to the following requirements.
MARL-based Access Scheme Requirements for MTC
- Since the devices in an MTC network are battery-constrained, have very low computational capability, and have variable sleep cycles, they must all share the same channel access policy, which can be deployed in a distributed way.
- For the same reasons, training on such devices is not feasible, and therefore a CTDE mechanism is needed to learn a policy for the devices.
- The scheme must be scalable to a large number of devices.
- The scheme should have low signalling overhead; in particular, communication between users is not feasible.
Parameter Sharing — a solution?
Parameter sharing is perhaps the simplest yet most useful form of MARL with CTDE. The main idea is to extend a single-agent network to the multi-agent system. All agents use the same network, i.e., the same function approximator, to calculate the value of each state, and the agents are homogeneous: same action space, rewards, and state space, and therefore the same policy. This means that agents in the same state will compute the same values for that state. To distinguish between agents, an agent ID is usually encoded in the state so that every agent has a unique state. One advantage of parameter sharing is that it scales better than other multi-agent techniques, although the extent of this scalability is not known yet and likely depends on the problem and the environment model. For MTC, where devices of the same type perform the same tasks, parameter sharing is a reasonable route to scalability; however, there is an issue!
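As a minimal sketch of this idea, with hypothetical names and dimensions and a linear Q-function standing in for a centrally trained neural network, parameter sharing with an agent ID encoded in the state might look as follows:

```python
import numpy as np

N_AGENTS, OBS_DIM, N_ACTIONS = 4, 3, 2

# One set of weights shared by every agent (a linear Q-function for
# brevity; in practice this would be a neural network trained centrally).
rng = np.random.default_rng(0)
shared_w = rng.normal(size=(OBS_DIM + N_AGENTS, N_ACTIONS))

def q_values(obs, agent_id):
    """Append a one-hot agent ID to the observation so that identical
    observations can still map to different values per agent, then
    apply the shared parameters."""
    one_hot = np.eye(N_AGENTS)[agent_id]
    return np.concatenate([obs, one_hot]) @ shared_w

# Same observation and same parameters for everyone, but the encoded
# ID disambiguates the agents' Q-values (and hence their actions).
obs = np.array([1.0, 0.0, 0.5])
actions = [int(np.argmax(q_values(obs, i))) for i in range(N_AGENTS)]
```

The one-hot ID is precisely the component that becomes problematic when devices join and leave the network, as discussed next.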
The issue: Devices in an MTC network can join and leave the network randomly, and entirely new devices can join at any time. Therefore, incorporating an agent ID in the state space does not really make sense and can only complicate things.
We have presented a couple of works in this area, at IEEE WCNC 2022 and VTC Spring 2022, in which we did not use any agent ID in the state and used a different reward function in each work. We have shown that parameter sharing behaves reasonably, but perhaps, for the traffic models considered in these works, there is nothing more to gain. We used a single resource and multiple users, and designed the transmit probability of the users. In one of these works, we also presented a scalability analysis. We used DQN, and we plan to try actor-critic methods to see whether we can learn a better probability distribution over the action space. The problem with actor-critic MARL with a centralized critic and decentralized actors is that the dimension of the centralized critic's input increases with the number of agents. Therefore, we can either compress this already limited information, or use a local actor and local critic for each agent, which we believe will not provide any performance gain over DQN.
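The dimensionality problem of the centralized critic can be made concrete with a one-line calculation. This is a sketch under the common assumption that the critic conditions on every agent's observation and action; the function name and dimensions are our own illustration:

```python
def centralized_critic_input_dim(n_agents, obs_dim, act_dim):
    """A centralized critic typically takes the joint observation and
    joint action as input, so its input dimension grows linearly
    with the number of agents."""
    return n_agents * (obs_dim + act_dim)

# With per-agent observation dim 4 and action dim 2, the critic input
# grows from 60 inputs at 10 agents to 6000 at 1000 agents.
dims = {n: centralized_critic_input_dim(n, obs_dim=4, act_dim=2)
        for n in (10, 100, 1000)}
```

For a massive MTC deployment this growth is what forces either compressing the joint information or falling back to purely local critics.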
Scalability of MARL algorithms is thus a central challenge in designing an access policy for MTC. From the RL perspective, parameter sharing on the one hand allows us to scale to many agents of the same type, but on the other hand we need an unsourced grant-free RA policy in which the identities of the agents are not needed. Furthermore, the traffic characteristics of the MTC network need to be considered in designing the access policy. Traffic generation in an MTC network has both independent and correlated components. For instance, in the case of an event such as a fault detected in a machine, a fire, or an earthquake, there is a high probability that the devices around the epicentre of the event become active together.
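The mixed independent-plus-correlated traffic described above can be sketched as a toy generator. The parameter names and the specific mixture are hypothetical, not the traffic model from our cited works:

```python
import random

def generate_traffic(n_devices, n_slots, p_ind=0.01, p_event=0.01,
                     event_frac=0.5, seed=0):
    """Toy MTC traffic model: each device activates independently with
    probability p_ind per slot (e.g. periodic reporting); with
    probability p_event an event occurs and a random cluster of
    devices (a fraction event_frac of the network, e.g. those near
    the epicentre) activates together in the same slot."""
    rng = random.Random(seed)
    arrivals = []
    for _ in range(n_slots):
        active = {i for i in range(n_devices) if rng.random() < p_ind}
        if rng.random() < p_event:  # correlated burst
            cluster = rng.sample(range(n_devices),
                                 int(event_frac * n_devices))
            active.update(cluster)
        arrivals.append(active)
    return arrivals
```

An access policy tuned only for the low independent load will collapse in the burst slots, which is why the correlated component matters for protocol design.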
In this post, we have highlighted some high-level challenges in designing RA protocols for MTC networks using RL. Most existing works do not really address scalability, and most do not consider signalling overhead when designing the MARL system. Such techniques cannot be directly applied to MTC networks, and therefore the factors mentioned above should be considered when designing RA policies for MTC.
C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: physical and MAC-layer solutions,” IEEE Communications Magazine, vol. 54, no. 9, pp. 59–65, 2016.
 J. K. Gupta, M. Egorov, and M. Kochenderfer, “Cooperative multi-agent control using deep reinforcement learning,” in Autonomous Agents and Multiagent Systems (G. Sukthankar and J. A. Rodriguez-Aguilar, eds.), (Cham), pp. 66–83, Springer International Publishing, 2017.
M. A. Jadoon, A. Pastore, M. Navarro, and F. Perez-Cruz, “Deep Reinforcement Learning for Random Access in Machine-Type Communication,” in IEEE WCNC, Apr. 2022.
M. A. Jadoon, A. Pastore, and M. Navarro, “Collision Resolution with Deep Reinforcement Learning for Random Access in Machine-Type Communication,” in IEEE VTC Spring, Jun. 2022.