Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications ›› 2019, Vol. 42 ›› Issue (6): 155-161. doi: 10.13190/j.jbupt.2019-057

• Research Report

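  • Received: 2019-04-14; Publication date: 2019-12-28; Release date: 2019-11-15
  • About the author: LI Rui-fan (1975-), male, associate professor, E-mail: rfli@bupt.edu.cn.
  • Funding:
    National Key Research and Development Program of China (2019YFF0303302); National Natural Science Foundation of China (61906018); Science and Technology Project of State Grid Corporation of China Headquarters (5200-201918255A-0-0-00)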

Paragraph Image Captioning with Deep Fully Convolutional Neural Networks

LI Rui-fan1,2, LIANG Hao-yu1, FENG Fang-xiang1, ZHANG Guang-wei2,3, WANG Xiao-jie1,2   

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Engineering Research Center of Information Networks, Ministry of Education, Beijing 100876, China;
  3. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-04-14 Online: 2019-12-28 Published: 2019-11-15

Abstract: Improving the coherence among the sentences of a generated description is a central problem in paragraph image captioning. To address it, a fully convolutional neural architecture for paragraph image captioning is proposed. An image representation is first obtained with a region detector based on a convolutional network. Following the linguistic hierarchy of a paragraph, a hierarchical deep convolutional decoder then decodes the image representation and automatically generates a paragraph-level text description. In addition, a gating mechanism is embedded in the convolutional decoder to improve the memory capacity of the model. Experiments show that, compared with traditional paragraph captioning methods such as those based on recurrent neural networks, the proposed algorithm generates more coherent paragraph descriptions for images and achieves better results on the evaluation metrics.

Key words: convolutional networks, deep learning, image captioning, coherence
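
The abstract mentions a gating mechanism embedded in the convolutional decoder but does not specify its form. As a minimal sketch, assuming a GLU-style gate (one convolution produces candidate values, a second produces a sigmoid gate) on top of a causal 1-D convolution, one decoder layer might look like the following. The function names, argument layout, and the choice of GLU are illustrative assumptions, not the authors' implementation.

```python
import math


def causal_conv1d(x, weights, bias):
    """Causal 1-D convolution over a sequence of vectors.

    x:       list of T input vectors, each of length d_in
    weights: list of k matrices (one per kernel tap), each d_out x d_in;
             weights[j] is applied to x[t - (k - 1) + j]
    bias:    list of d_out floats
    Positions before the start of the sequence are treated as zeros,
    so the output at step t never depends on inputs after step t.
    """
    k = len(weights)
    d_out = len(bias)
    out = []
    for t in range(len(x)):
        y = list(bias)
        for j in range(k):
            src = t - (k - 1) + j
            if src < 0:
                continue  # implicit left zero-padding
            for o in range(d_out):
                y[o] += sum(w * v for w, v in zip(weights[j][o], x[src]))
        out.append(y)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_conv_layer(x, w_a, b_a, w_g, b_g):
    """Assumed GLU-style gate: conv_a(x) * sigmoid(conv_g(x)), elementwise."""
    a = causal_conv1d(x, w_a, b_a)
    g = causal_conv1d(x, w_g, b_g)
    return [[ai * sigmoid(gi) for ai, gi in zip(at, gt)]
            for at, gt in zip(a, g)]


# Toy usage: scalar features, kernel width 2, gate fixed at sigmoid(0) = 0.5.
x = [[1.0], [2.0], [3.0]]
w_a = [[[1.0]], [[1.0]]]   # sums the current and previous input
w_g = [[[0.0]], [[0.0]]]   # zero weights, so the gate is constant
out = gated_conv_layer(x, w_a, [0.0], w_g, [0.0])
```

Because the convolution is causal, each output word position depends only on earlier positions, which is what lets a convolutional decoder stand in for a recurrent one during generation. The gate scales each feature by a value between 0 and 1, giving the layer a way to control how much information passes to the next layer.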

CLC number: