Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2014, Vol. 37, Issue 1: 80-84. doi: 10.13190/j.jbupt.2014.01.018

• Research Report •

Cognitive Radio Resource Allocation by Clustering Multi-Agent Reinforcement Learning

WU Chun1,2, JIANG Hong2, YI Ke-chu1

  1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China;
  2. School of National Defense Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
  • Received: 2013-03-13 Published: 2014-02-28 Online: 2014-01-07
  • About the authors: WU Chun (born 1978), male, associate professor, Ph.D. candidate, E-mail: soldier_wu@163.com; YI Ke-chu (born 1943), male, professor, doctoral supervisor.
  • Funding:

    National Natural Science Foundation of China (61379005); National Basic Research Program of China (2009CB320403); National Science and Technology Major Project of China (2009ZX03007-004); Open Research Fund of the ISN Laboratory, Xidian University (ISN10-09)

Abstract:

A multi-agent reinforcement learning method based on user clustering and variable learning rates was proposed to solve the channel allocation and power control problem among multiple cognitive radio users. First, hierarchical processing was used to separate channel selection from power control, and channel allocation was implemented by a fast optimal search combined with balancing of the number of users per channel. Second, the multiuser power control problem was modeled within a stochastic game framework. In the subsequent multi-agent reinforcement learning, K-means user clustering was employed to reduce the number of users participating in the game and the environmental complexity faced by each individual user, and variable learning rates for Q-learning and policy learning were used to further promote the convergence of multiuser learning. Simulation results show that the method makes the users' power states and the total reward converge effectively, and that the overall performance reaches a sub-optimal level.

Key words: cognitive radio, multi-agent reinforcement learning, clustering, power control, variable learning rate
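The K-means user clustering step can be sketched with Lloyd's algorithm. The abstract does not say which user features are clustered, so this minimal sketch assumes, hypothetically, that each user is described by a 2-D position; the function name `kmeans_users` and its parameters are illustrative only and not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans_users(
    positions: Sequence[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[int], List[Point]]:
    """Group users into k clusters with Lloyd's K-means (hypothetical helper).

    positions: assumed per-user features, here 2-D coordinates.
    Returns (labels, centroids); labels[i] is the cluster index of user i.
    """
    if not 1 <= k <= len(positions):
        raise ValueError("k must be between 1 and the number of users")
    rng = random.Random(seed)
    # Initialise the centroids with k distinct users chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(positions), k)]
    labels = [0] * len(positions)
    for _ in range(iters):
        # Assignment step: attach every user to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in positions
        ]
        # Update step: move each centroid to the mean of its member users.
        for c in range(k):
            members = [p for p, lab in zip(positions, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
        # Stop once no user changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

For example, `kmeans_users([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` places the first three users in one cluster and the last three in the other. Under the assumption that each cluster then acts as one game participant, the number of players in the power control game drops from the number of users to k.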

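The abstract names variable learning rates for Q-learning and policy learning but does not give the update rules. As a stand-in, the sketch below implements the well-known WoLF-PHC algorithm (Win or Learn Fast policy hill-climbing), which combines Q-learning with a policy step that is small when the agent is "winning" and large when it is "losing". The 1/k Q-learning rate, the epsilon exploration, the class name `WoLFPHCAgent` and all parameter values are assumptions for illustration, not the paper's scheme.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


class WoLFPHCAgent:
    """Sketch of WoLF-PHC: Q-learning plus policy hill-climbing with a
    variable policy step (delta_win when winning, delta_lose when losing)."""

    def __init__(self, actions: Sequence, gamma: float = 0.9,
                 delta_win: float = 0.01, delta_lose: float = 0.04,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        if len(actions) < 2:
            raise ValueError("need at least two actions")
        self.actions = list(actions)  # e.g. hypothetical discrete power levels
        self.gamma = gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.epsilon = epsilon        # exploration probability (assumption)
        self.rng = random.Random(seed)
        self.q: Dict[Hashable, List[float]] = {}
        self.pi: Dict[Hashable, List[float]] = {}      # current mixed policy
        self.pi_avg: Dict[Hashable, List[float]] = {}  # average policy
        self.count: Dict[Hashable, int] = {}           # visits per state
        self.sa_count: Dict[Tuple[Hashable, int], int] = {}

    def _ensure(self, s: Hashable) -> None:
        if s not in self.q:
            n = len(self.actions)
            self.q[s] = [0.0] * n
            self.pi[s] = [1.0 / n] * n
            self.pi_avg[s] = [1.0 / n] * n
            self.count[s] = 0

    def act(self, s: Hashable) -> int:
        """Return an action index drawn from pi(s), with epsilon exploration."""
        self._ensure(s)
        n = len(self.actions)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(n)
        return self.rng.choices(range(n), weights=self.pi[s])[0]

    def update(self, s: Hashable, a: int, r: float, s_next: Hashable) -> None:
        self._ensure(s)
        self._ensure(s_next)
        n = len(self.actions)
        # Variable Q learning rate (assumption): alpha = 1 / visits of (s, a).
        k = self.sa_count.get((s, a), 0) + 1
        self.sa_count[(s, a)] = k
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += (target - self.q[s][a]) / k
        # Update the running average of the policy in state s.
        self.count[s] += 1
        c = self.count[s]
        for i in range(n):
            self.pi_avg[s][i] += (self.pi[s][i] - self.pi_avg[s][i]) / c
        # Variable policy learning rate: compare expected values of pi and pi_avg.
        q = self.q[s]
        v_pi = sum(p * qv for p, qv in zip(self.pi[s], q))
        v_avg = sum(p * qv for p, qv in zip(self.pi_avg[s], q))
        delta = self.delta_win if v_pi > v_avg else self.delta_lose
        # Hill-climb: shift probability mass toward the greedy action.
        best = max(range(n), key=lambda i: q[i])
        moved = 0.0
        for i in range(n):
            if i != best:
                step = min(self.pi[s][i], delta / (n - 1))
                self.pi[s][i] -= step
                moved += step
        self.pi[s][best] += moved
```

In a multi-agent setting, each player (for instance, each user cluster) would run its own agent and call `act` and `update` every round; learning cautiously while winning and quickly while losing is what WoLF uses to encourage convergence when the other players are also adapting.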