Traffic Distribution Algorithm Based on Multi-Agent Reinforcement Learning

Journal of Beijing University of Posts and Telecommunications, 2019, Vol. 42, Issue (6): 43-48,57. doi: 10.13190/j.jbupt.2019-140

CHENG Chao¹, TENG Jun-jie², ZHAO Yan-ling³, SONG Mei¹

1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. China Financial Certification Authority, Beijing 100054, China;
3. Instrumentation Technology and Economy Institute, Beijing 100055, China

Received: 2019-07-10; Issue date: 2019-12-28; Released online: 2019-11-15
Corresponding author: SONG Mei (b. 1960), female, professor, doctoral supervisor. E-mail: songm@bupt.edu.cn
About the author: CHENG Chao (b. 1993), male, master's student.
Supported by: National Key R&D Program of China (2018YFB1201500); National Natural Science Foundation of China (61871046); Beijing Natural Science Foundation (L171011); Beijing Major Special Project (Z181100003118012)

Abstract

Abstract: Most research on traditional traffic engineering strategies focuses on constructing and solving mathematical models, which incurs high computational complexity. To address this, an experience-driven traffic distribution algorithm based on multi-agent reinforcement learning is proposed. The algorithm distributes traffic effectively over pre-computed paths without solving complex mathematical models, making efficient and full use of network resources. It is trained centrally on the software-defined networking (SDN) controller and, once training is complete, executed in a distributed manner on access switches or routers, which also avoids frequent interaction with the controller. Experimental results show that, compared with the shortest-path and equal-cost multi-path (ECMP) algorithms, the proposed algorithm reduces the end-to-end delay of the network and increases its throughput.

Key words: traffic engineering, multi-agent reinforcement learning, software-defined networking, delay, throughput
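
To make the idea of distributing traffic over pre-computed paths concrete, the following is a minimal Python sketch that is not taken from the paper. It compares the two baselines named in the abstract with a split produced by a policy. The abstract does not describe the agents' networks, states, or rewards, so the policy output is represented here by a hand-picked score vector (`agent_logits`); the path hop counts, capacities, demand, and all function names are likewise hypothetical, and ECMP is simplified to an equal split over all candidate paths.

```python
import math


def softmax(logits):
    """Turn raw per-path scores into split ratios that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shortest_path_ratios(hop_counts):
    """Baseline: send all traffic over the path with the fewest hops."""
    best = hop_counts.index(min(hop_counts))
    return [1.0 if i == best else 0.0 for i in range(len(hop_counts))]


def ecmp_ratios(num_paths):
    """Baseline (simplified): equal split, treating all paths as equal cost."""
    return [1.0 / num_paths] * num_paths


def allocate(demand, ratios):
    """Split one traffic demand across the candidate paths."""
    return [demand * r for r in ratios]


def max_utilization(loads, capacities):
    """Utilization of the most heavily loaded path (load / capacity)."""
    return max(load / cap for load, cap in zip(loads, capacities))


# Hypothetical scenario: three pre-computed paths between one node pair.
hop_counts = [2, 3, 3]
capacities = [10.0, 40.0, 50.0]  # assumed bottleneck capacity per path
demand = 30.0                    # same unit as the capacities

# Stand-in for the output of a trained agent (an assumption, not real data).
agent_logits = [0.0, math.log(4.0), math.log(5.0)]

strategies = {
    "shortest path": shortest_path_ratios(hop_counts),
    "ECMP (equal split)": ecmp_ratios(len(hop_counts)),
    "policy split": softmax(agent_logits),
}

for name, ratios in strategies.items():
    loads = allocate(demand, ratios)
    print(f"{name}: loads={loads}, "
          f"max utilization={max_utilization(loads, capacities):.2f}")
```

In this toy scenario the capacity-aware split keeps the most loaded path far below the level reached by either baseline, which is the kind of effect the abstract attributes to the learned allocation. The numbers only illustrate the mechanism and say nothing about the paper's measured delay or throughput.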

CLC number: TP393.0

Cite this article: CHENG Chao, TENG Jun-jie, ZHAO Yan-ling, SONG Mei. Traffic Distribution Algorithm Based on Multi-Agent Reinforcement Learning[J]. Journal of Beijing University of Posts and Telecommunications, 2019, 42(6): 43-48,57.

