Cross-modal Retrieval Algorithm for Image and Text Based on Pre-trained Models and Encoders

Journal of Beijing University of Posts and Telecommunications ›› 2023, Vol. 46 ›› Issue (5): 112-117.

Received: 2023-07-15 Revised: 2023-08-13 Online: 2023-10-28 Published: 2023-11-03

Abstract

With the advent of the Internet era, the volume of image and text data on the web has grown exponentially, and retrieving the information people need from this mass of data efficiently and accurately has become a pressing problem. Mainstream image-text cross-modal retrieval models are currently built on either dual encoders or fusion encoders. A dual encoder encodes the image and the text separately and then computes the similarity between the resulting image and text vectors; retrieval is efficient, but accuracy is insufficient. A fusion encoder jointly encodes the image and the text to obtain a similarity score for the pair; retrieval accuracy is high, but efficiency is low. To address the shortcomings of both architectures, this paper proposes an image-text cross-modal retrieval algorithm based on pre-trained models and encoders. First, a recall-and-ranking strategy is proposed, in which a dual encoder performs coarse recall and a fusion encoder performs precise ranking. Second, a method is proposed for building the dual encoder and the fusion encoder on a multi-channel Transformer pre-trained model, in order to achieve high-quality semantic alignment between text and images and improve retrieval performance. Experiments on two public datasets, MSCOCO and Flickr30k, demonstrate the effectiveness of the proposed algorithm.
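The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy vectors, and the lookup table standing in for a fusion encoder are all assumptions made for illustration. Stage 1 ranks every image by cosine similarity of precomputed dual-encoder vectors and keeps the top k; stage 2 applies a more expensive joint scorer only to those k candidates.

```python
import numpy as np

def recall_top_k(query_vec, image_vecs, k):
    """Stage 1 (coarse recall): rank images by cosine similarity of dual-encoder vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = imgs @ q
    # Indices of the k most similar images, best first.
    return np.argsort(-sims)[:k]

def rerank(query, candidate_ids, fusion_score):
    """Stage 2 (precise ranking): score each (query, image) pair jointly and re-sort."""
    scores = np.asarray([fusion_score(query, int(i)) for i in candidate_ids])
    order = np.argsort(-scores)
    return [int(candidate_ids[j]) for j in order]

def retrieve(query, query_vec, image_vecs, fusion_score, k=3):
    """Run coarse recall over all images, then rerank only the k survivors."""
    candidates = recall_top_k(query_vec, image_vecs, k)
    return rerank(query, candidates, fusion_score)

# Toy stand-ins (assumptions): 2-d "embeddings" for four images and one text query.
image_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
query_vec = np.array([1.0, 0.0])

# Hypothetical fusion-encoder scores, hard-coded per image id for illustration.
toy_scores = {0: 0.2, 1: 0.9, 2: 0.99, 3: 0.5}

def fusion_score(query, image_id):
    return toy_scores[image_id]

result = retrieve("a dog on a beach", query_vec, image_vecs, fusion_score, k=3)
print(result)  # [1, 3, 0]
```

In this toy run image 2 has the highest fusion score but is never reranked, because it does not survive the coarse recall stage. That is the trade-off the strategy accepts: the joint scorer runs on k pairs instead of on every image, so final accuracy depends on the recall stage keeping the right candidates.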

Key words: Cross-modal retrieval algorithm, pre-trained model, dual encoders, fusion encoders.

CLC Number: TP391
