Paragraph Image Captioning with Deep Fully Convolutional Neural Networks

doi:10.13190/j.jbupt.2019-057

Journal of Beijing University of Posts and Telecommunications, 2019, Vol. 42, Issue 6: 155-161.

LI Rui-fan (1, 2), LIANG Hao-yu (1), FENG Fang-xiang (1), ZHANG Guang-wei (2, 3), WANG Xiao-jie (1, 2)

1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Engineering Research Center of Information Networks, Ministry of Education, Beijing 100876, China;
3. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received: 2019-04-14; Online: 2019-12-28; Published: 2019-11-15

Abstract

Improving the coherence among the descriptive sentences of a paragraph-level image caption has recently attracted increasing attention. This paper proposes a fully convolutional neural architecture for paragraph image captioning. First, an image representation is obtained with a region detector built on a convolutional network. A hierarchical deep convolutional decoder then translates this representation and automatically generates a paragraph-length text description. In addition, a gating mechanism is embedded in the convolutional decoder to improve the model's memory capacity. Experiments show that, compared with traditional methods based on recurrent neural networks, the proposed approach generates more coherent paragraph descriptions for images and achieves better results on the evaluation metrics.
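The abstract does not specify the exact form of the gating mechanism. A common choice in convolutional language models is the gated linear unit (GLU) of Dauphin et al. (2017), h = (X∗W + b) ⊗ σ(X∗V + c), where one convolution produces features and a second, passed through a sigmoid, decides how much of each feature flows to the next layer. The pure-Python sketch below assumes that GLU form and a causal (left-padded) 1-D convolution over the word sequence, so each output position depends only on the current and earlier words, as an autoregressive decoder requires. The function and parameter names (`gated_causal_conv1d`, `random_kernel`, `w`, `v`, `b`, `c`) are hypothetical illustrations, not the authors' code, and the hierarchical sentence/word structure of the decoder is not modeled.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_causal_conv1d(x, w, v, b, c):
    """Gated causal 1-D convolution (GLU-style), assumed form.

    x    : T input vectors, each of length d_in
    w, v : kernels indexed [tap][d_in][d_out]; w feeds the linear path,
           v feeds the sigmoid gate
    b, c : biases of length d_out for the linear path and the gate
    Returns T output vectors h_t = lin_t * sigmoid(gate_t), where both
    lin_t and gate_t see only the k most recent inputs x[t-k+1 .. t].
    """
    k, d_in = len(w), len(x[0])
    d_out = len(b)
    # Left-pad with k-1 zero vectors so h_t never sees future inputs
    padded = [[0.0] * d_in for _ in range(k - 1)] + x
    out = []
    for t in range(len(x)):
        lin, gate = list(b), list(c)
        # Window padded[t .. t+k-1] corresponds to x[t-k+1 .. t]
        for j in range(k):
            xv = padded[t + j]
            for i in range(d_in):
                for o in range(d_out):
                    lin[o] += xv[i] * w[j][i][o]
                    gate[o] += xv[i] * v[j][i][o]
        # Element-wise gating of the linear features
        out.append([lin[o] * sigmoid(gate[o]) for o in range(d_out)])
    return out


def random_kernel(k, d_in, d_out, rng):
    # Small random weights indexed [tap][d_in][d_out]
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_out)]
             for _ in range(d_in)] for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, d_in, d_out, k = 5, 4, 3, 3
    x = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(T)]
    w = random_kernel(k, d_in, d_out, rng)
    v = random_kernel(k, d_in, d_out, rng)
    b, c = [0.0] * d_out, [0.0] * d_out
    h = gated_causal_conv1d(x, w, v, b, c)
    # Perturbing the last input leaves every earlier output unchanged
    x2 = [row[:] for row in x]
    x2[-1] = [9.0] * d_in
    h2 = gated_causal_conv1d(x2, w, v, b, c)
    print(h[:-1] == h2[:-1])  # prints True
```

The causality check in the usage example is what makes such a layer suitable for a decoder: stacking several of these layers (optionally with residual connections) widens the receptive field over previously generated words without any recurrence, which is the general motivation for replacing a recurrent decoder with a convolutional one.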

Key words: convolutional networks, deep learning, image captioning, coherence

CLC Number: TN309.2

LI Rui-fan, LIANG Hao-yu, FENG Fang-xiang, ZHANG Guang-wei, WANG Xiao-jie. Paragraph Image Captioning with Deep Fully Convolutional Neural Networks[J]. Journal of Beijing University of Posts and Telecommunications, 2019, 42(6): 155-161.
