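Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications ›› 2021, Vol. 44 ›› Issue (1): 97-103. doi: 10.13190/j.jbupt.2020-006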

• Research Report •

Intelligent Traffic Signal Control Algorithm Based on Sumtree DDPG

HUANG Hao1,3,4, HU Zhi-qun2, WANG Lu-han1,3,4, LU Zhao-ming1,3,4, WEN Xiang-ming1,3,4

  1. School of Information and Communications Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. School of Computer and Information Engineering, Hubei University, Wuhan 430062, China;
  3. Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  4. Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2020-01-17 Online: 2021-02-28 Published: 2021-09-30
  • Corresponding author: HU Zhi-qun (1989-), female, associate professor, master's supervisor, E-mail: zhiqunhu520@163.com
  • About the author: HUANG Hao (1997-), male, doctoral student.
  • Supported by:
    National Natural Science Foundation of China (61901163); Beijing Nova Program (Z191100001119028)

Abstract: A multi-intersection intelligent traffic signal control algorithm based on the sumtree deep deterministic policy gradient (Sumtree DDPG) is proposed. By observing intersection data in real time, the algorithm intelligently adjusts the cycle length, phase sequence, and phase duration of the traffic signals to improve the throughput efficiency of intersections. In addition, storing experience data in a sumtree structure improves sampling efficiency and accelerates convergence of the algorithm. Simulation results show that, in a dynamic environment, the proposed algorithm outperforms both the existing fixed signal timing scheme and the signal timing algorithm based on traffic flow weight in terms of vehicle queue length, vehicle waiting time, and average vehicle speed.

Key words: smart transportation, traffic signal control, deep reinforcement learning, deep deterministic policy gradient, multiple intersections
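
The abstract attributes the faster convergence to a sumtree-based experience store but gives no implementation details. As a rough illustration only, the sketch below shows a generic sum tree of the kind commonly used for priority-proportional replay sampling: leaves hold per-transition priorities, internal nodes hold the sum of their children, and a sample is drawn by walking down from the root. The class name `SumTree`, its methods, and the stratified sampling scheme are assumptions made for this example, not the authors' code; how priorities are computed (for instance from the temporal-difference error) is likewise assumed and left to the caller.

```python
import random


class SumTree:
    """Illustrative sum tree (hypothetical names, not the paper's code).

    Leaves store priorities; each internal node stores the sum of its children,
    so the root holds the total priority of everything in the buffer.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Indices 0..capacity-2 are internal nodes, capacity-1..2*capacity-2 are leaves.
        self.tree = [0.0] * (2 * capacity - 1)
        self.data = [None] * capacity
        self.write = 0  # next data slot to overwrite (ring buffer)

    def total(self):
        return self.tree[0]

    def add(self, priority, transition):
        """Store a transition, overwriting the oldest one when full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the difference up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Find the leaf whose cumulative-priority interval contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):  # stop once idx is a leaf
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size, rng=random):
        """Stratified sampling: one draw from each equal slice of the total."""
        segment = self.total() / batch_size
        return [
            self.get(rng.uniform(segment * i, segment * (i + 1)))
            for i in range(batch_size)
        ]


# Usage sketch: transitions with larger priorities are drawn more often.
buffer = SumTree(capacity=4)
for priority, transition in [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d")]:
    buffer.add(priority, transition)
batch = buffer.sample(batch_size=2)
```

Both `update` and `get` touch only one root-to-leaf path, so each costs time logarithmic in the buffer capacity, which is the usual reason for preferring this structure over scanning a flat priority list.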

CLC number: