Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2019, Vol. 42, Issue (6): 155-161. doi: 10.13190/j.jbupt.2019-057

• Reports

Paragraph Image Captioning with Deep Fully Convolutional Neural Networks

LI Rui-fan1,2, LIANG Hao-yu1, FENG Fang-xiang1, ZHANG Guang-wei2,3, WANG Xiao-jie1,2   

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Engineering Research Center of Information Networks, Ministry of Education, Beijing 100876, China;
  3. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-04-14  Online: 2019-12-28  Published: 2019-11-15

Abstract: Improving the coherence among the descriptive sentences of a generated paragraph is a current focus of research on paragraph image captioning. A fully convolutional neural architecture for paragraph image captioning is proposed. An image representation is first obtained using a region detector based on a convolutional network. A hierarchical deep convolutional decoder is then constructed to translate the image representation and automatically generate a paragraph text description. In addition, a gating mechanism is embedded in the convolutional decoder network to improve the memory capacity of the model. Experiments demonstrate that, compared with traditional methods based on recurrent neural networks, the proposed method generates more coherent paragraph descriptions for images and achieves better results on the evaluation metrics.
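
The abstract does not say which form of gating the decoder uses. As a rough illustration only, the sketch below shows one common choice in convolutional sequence models: a causal 1-D convolution followed by a gated linear unit (GLU), where a linear path is multiplied element-wise by a sigmoid gate. It is a single layer in plain Python, not the authors' hierarchical decoder; the function names, shapes, and random weights are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_gated_conv(seq, w_lin, w_gate, b_lin, b_gate):
    """Causal 1-D convolution followed by a GLU-style gate (illustrative).

    seq:     list of T input vectors, each of length d_in
    w_lin:   kernel of the linear path, indexed [k][d_in][d_out]
    w_gate:  kernel of the gate path, same indexing
    b_lin, b_gate: bias vectors of length d_out
    Returns a list of T output vectors of length d_out.
    """
    k = len(w_lin)
    d_in = len(seq[0])
    d_out = len(b_lin)
    # Left-pad with k-1 zero vectors so output t sees only inputs <= t.
    padded = [[0.0] * d_in for _ in range(k - 1)] + seq
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]  # inputs t-k+1 .. t
        row = []
        for o in range(d_out):
            lin, gate = b_lin[o], b_gate[o]
            for j in range(k):
                for i in range(d_in):
                    lin += window[j][i] * w_lin[j][i][o]
                    gate += window[j][i] * w_gate[j][i][o]
            # The sigmoid gate scales how much of the linear path passes on.
            row.append(lin * sigmoid(gate))
        out.append(row)
    return out


def random_kernel(k, d_in, d_out, rng):
    """Hypothetical helper: small random kernel for the demo below."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_out)]
             for _ in range(d_in)] for _ in range(k)]


# Demo with made-up sizes: 5 time steps, 4 input features, 2 output features.
rng = random.Random(0)
k, d_in, d_out, T = 3, 4, 2, 5
w_lin = random_kernel(k, d_in, d_out, rng)
w_gate = random_kernel(k, d_in, d_out, rng)
b = [0.0] * d_out
seq = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(T)]
out = causal_gated_conv(seq, w_lin, w_gate, b, b)
```

Because of the left padding, changing the last input vector leaves all earlier outputs unchanged. This is the property that lets a convolutional decoder generate text one word at a time without recurrence.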

Key words: convolutional networks, deep learning, image captioning, coherence

CLC Number: