Journal of Beijing University of Posts and Telecommunications

  • EI core journal

• Papers •

Policy Estimation Error Analysis for Symmetrical MARL Problem in Communication Resource Scheduling

ZHANG Xin-ran, SUN Song-lin

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876, China;
  3. National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2018-06-20 Online: 2019-04-28 Published: 2019-04-28
  • Corresponding author: SUN Song-lin (1974-), male, professor, doctoral supervisor, E-mail: slsun@bupt.edu.cn
  • About the author: ZHANG Xin-ran (1987-), male, doctoral candidate.
  • Supported by:
    National Natural Science Foundation of China (61471066)

Abstract: For the multi-agent reinforcement learning (MARL) problem in communication resource scheduling scenarios, the symmetrical MARL problem is proposed, together with the definitions and conditions of three types of symmetry, and policy fusion and policy error are defined. For the strong symmetrical MARL problem, three types of evaluation metrics are defined, the policy estimation error is analyzed, and a policy error theorem with its corollaries is presented. An MARL problem is formulated for admission control in wireless communication, and the simulation results verify the characteristics of the policy estimation error for strong symmetrical MARL problems. The results show that a low-complexity MARL sub-problem can be used to estimate the policy of a high-complexity strong symmetrical MARL problem, with only a small policy estimation error and a small impact on network performance.

Key words: reinforcement learning, symmetrical multi-agent reinforcement learning, policy estimation

CLC number: