Intelligent Traffic Signal Control Algorithm Based on Sumtree DDPG

doi:10.13190/j.jbupt.2020-006

Journal of Beijing University of Posts and Telecommunications, 2021, Vol. 44, Issue (1): 97-103. doi: 10.13190/j.jbupt.2020-006

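HUANG Hao^1,3,4, HU Zhi-qun^2, WANG Lu-han^1,3,4, LU Zhao-ming^1,3,4, WEN Xiang-ming^1,3,4

1. School of Information and Communications Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. School of Computer and Information Engineering, Hubei University, Wuhan 430062, China;
3. Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China;
4. Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China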

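Received: 2020-01-17 Online: 2021-02-28 Published: 2021-09-30
Corresponding author: HU Zhi-qun (born 1989), female, associate professor, master's supervisor. E-mail: zhiqunhu520@163.com
About the author: HUANG Hao (born 1997), male, Ph.D. candidate.
Funding: National Natural Science Foundation of China (61901163); Beijing Nova Program (Z191100001119028)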

Abstract

Abstract: A multi-intersection intelligent traffic signal control algorithm based on the sum-tree deep deterministic policy gradient (Sumtree DDPG) is proposed. By observing intersection data in real time, the algorithm intelligently adjusts the signal cycle length, phase sequence, and phase duration to improve intersection throughput. In addition, storing experience data in a sum-tree structure improves sampling efficiency and accelerates convergence. Simulation results show that, in a dynamic environment, the proposed algorithm outperforms the existing fixed-timing scheme and the traffic-flow-weight-based timing algorithm in vehicle queue length, vehicle waiting time, and average vehicle speed.

Key words: smart transportation, traffic signal control, deep reinforcement learning, deep deterministic policy gradient, multiple intersections
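
The abstract credits the sum-tree storage of experience data with the improved sampling efficiency, but it gives no implementation details. As a rough illustration only, the following Python sketch shows a generic sum tree of the kind commonly used for priority-proportional sampling from a replay buffer. The class name `SumTree`, its method names, and the use of a positive priority per stored transition (for example, the magnitude of a temporal-difference error) are assumptions made for this sketch, not the authors' implementation.

```python
import random


class SumTree:
    """Hypothetical sketch: leaves hold priorities, internal nodes hold child sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Array layout: internal nodes at 0..capacity-2, leaves at capacity-1..2*capacity-2.
        self.tree = [0.0] * (2 * capacity - 1)
        self.data = [None] * capacity
        self.write = 0  # next data slot to (over)write, used as a ring buffer

    def total(self):
        """Sum of all stored priorities (value at the root)."""
        return self.tree[0]

    def add(self, priority, item):
        """Store an item with its priority, overwriting the oldest when full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Descend from the root to the leaf whose cumulative range contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size):
        """Draw one item from each of batch_size equal slices of the total priority."""
        segment = self.total() / batch_size
        return [
            self.get(random.uniform(i * segment, (i + 1) * segment))
            for i in range(batch_size)
        ]


# Usage: transitions with larger priority are drawn more often.
buffer = SumTree(capacity=4)
for priority, transition in [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d")]:
    buffer.add(priority, transition)
batch = buffer.sample(2)
```

In this sketch both an update and a single draw walk one root-to-leaf path, so each costs time logarithmic in the buffer capacity. The sketch assumes the tree holds at least one positive priority before `sample` is called.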

CLC number: U491.54

HUANG Hao, HU Zhi-qun, WANG Lu-han, LU Zhao-ming, WEN Xiang-ming. Intelligent Traffic Signal Control Algorithm Based on Sumtree DDPG[J]. Journal of Beijing University of Posts and Telecommunications, 2021, 44(1): 97-103.
