Pedestrian Detection Algorithm Based on Multi-camera Feature Fusion

Journal of Beijing University of Posts and Telecommunications ›› 2023, Vol. 46 ›› Issue (5): 66-71.

叶洪滨, 林政宽, 程红举

Fuzhou University

Received: 2022-09-14 Revised: 2022-11-14 Online: 2023-10-28 Published: 2023-11-03
Corresponding author: 林政宽, E-mail: cklin@nycu.edu.tw

Research on Pedestrian Detection Algorithm Based on Multi-camera Feature Fusion

Received: 2022-09-14 Revised: 2022-11-14 Online: 2023-10-28 Published: 2023-11-03

Summary/Abstract

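Summary: In complex, crowded scenes, monocular pedestrian detection often suffers serious misjudgments because of occlusion. Multi-view pedestrian detection, which combines data from multiple viewpoints, can effectively address the occlusion problem. However, previous multi-view detection algorithms perform detection on a single-level feature map only, which leads to poor detection of multi-scale targets. To address this, a novel multi-view detection algorithm is proposed that uses the Dilated Encoder method to aggregate multi-view information. Dilated Encoder applies dilated convolutions with different dilation rates to a single feature level to obtain receptive fields of different scales, thereby covering the full range of target scales and improving the detection of multi-scale targets. Experimental results on the Wildtrack dataset show that the multiple object detection accuracy (MODA) reaches up to 90.7%.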
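The effect of the dilation rate on the receptive field can be illustrated with a small sketch. It is not the authors' implementation: the kernel size of 3, the dilation rates (2, 4, 6, 8), and all function names are assumptions chosen for illustration, and the convolution is reduced to one dimension in plain Python to keep the arithmetic visible.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_fields(kernel_size: int, dilations) -> list:
    """Receptive field after each stride-1 dilated convolution applied in sequence."""
    rf = 1
    fields = []
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k - 1) * d
        fields.append(rf)
    return fields


def dilated_conv1d(signal, kernel, dilation: int) -> list:
    """1-D dilated cross-correlation with stride 1 and no padding."""
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


# Assumed dilation rates, for illustration only (not taken from the paper).
ASSUMED_DILATIONS = (2, 4, 6, 8)

if __name__ == "__main__":
    print(stacked_receptive_fields(3, ASSUMED_DILATIONS))
    print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], dilation=2))
```

Under these assumptions, each successive layer sees a wider context on the same feature level, which is the mechanism the summary credits for handling targets of different scales without a multi-level feature pyramid.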

Keywords: multi-view data, feature fusion, dilated convolution, complex scenes, pedestrian detection, multi-level detection

Abstract: Monocular pedestrian detection usually suffers from occlusion in complex and crowded scenes, which can lead to serious false positives. Multi-view pedestrian detection can effectively address the occlusion problem by combining data from multiple views. Previous multi-view detection algorithms use only single-level feature maps and therefore detect multi-scale targets poorly. In this paper we propose a new multi-view detection algorithm that uses the Dilated Encoder method to aggregate information from multiple views. Dilated Encoder applies dilated convolutions with different dilation rates so that a single feature level obtains receptive fields at multiple scales, covering the full range of target scales and improving the detection of multi-scale targets. The proposed method achieves 90.7% MODA on the Wildtrack dataset, which is highly competitive with current state-of-the-art algorithms.
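For readers unfamiliar with the MODA metric quoted above, the following minimal sketch shows how it is commonly computed, assuming the standard CLEAR definition with unit cost weights: MODA = 1 - (misses + false positives) / ground-truth count. The function name and the counts in the example are hypothetical and are not taken from the paper's experiments.

```python
def moda(false_negatives: int, false_positives: int, num_ground_truth: int) -> float:
    """Multiple Object Detection Accuracy with unit cost weights (assumed definition)."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return 1.0 - (false_negatives + false_positives) / num_ground_truth


if __name__ == "__main__":
    # Hypothetical counts for illustration: 200 annotated pedestrians,
    # 12 missed detections and 8 false positives.
    print(f"MODA = {moda(12, 8, 200):.3f}")
```

Because misses and false positives are both penalized against the number of ground-truth objects, MODA can become negative when a detector produces many spurious detections.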

Key words: multi-view data, feature fusion, dilated convolution, crowded scene, pedestrian detection, multi-scale detection

CLC number:

TP391.41

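叶洪滨, 林政宽, 程红举. Pedestrian Detection Algorithm Based on Multi-camera Feature Fusion[J]. Journal of Beijing University of Posts and Telecommunications, 2023, 46(5): 66-71.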