Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment

Journal of Beijing University of Posts and Telecommunications, 2024, Vol. 47, Issue 4: 130-135.



HAN Jing, ZHANG Tianpeng, LYU Xueqiang

Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University

Received: 2023-07-04  Revised: 2023-10-12  Online: 2024-08-28  Published: 2024-08-26

Abstract

To address the limitations of existing models in multimodal information processing, such as inadequate feature extraction and insufficient information interaction, a fine-grained image classification model incorporating multimodal features and enhanced alignment is proposed. A hierarchical feature adaptive fusion module performs multi-level adaptive fusion of multimodal features, making full use of feature information from the intermediate convolutional layers and strengthening the model's ability to perceive local image details. In addition, an enhanced aligned feature fusion module increases the interaction dimensions between multimodal features and makes full use of the mapping relationships between modalities. Experimental results show that the proposed model achieves strong recognition performance on several public datasets and outperforms previous multimodal feature fusion models. Ablation experiments further show that each individual module improves on the original model, supporting the effectiveness of the proposed design.

Key words: deep learning, fine-grained image classification, multimodal, adaptive feature fusion, attention mechanism

CLC Number: TP391.41

HAN Jing, ZHANG Tianpeng, LYU Xueqiang. Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment[J]. Journal of Beijing University of Posts and Telecommunications, 2024, 47(4): 130-135.
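
The abstract does not give the internals of the hierarchical feature adaptive fusion module, so the following is only an illustrative sketch of one common way "adaptive fusion" of multi-level features can be done: a softmax-weighted sum over feature vectors from different levels. The function names (softmax, adaptive_fuse), the per-level scores, and the choice of softmax gating are all assumptions made for illustration and are not taken from the paper. In a trained model the scores would typically come from a learned gating layer; here they are plain inputs.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, level_scores):
    """Fuse equally sized feature vectors from several levels.

    level_features: list of feature vectors (lists of floats), one per level.
    level_scores:   one hypothetical relevance score per level; a real model
                    would learn these rather than receive them as input.
    Returns (fused_vector, weights).
    """
    if not level_features or len(level_features) != len(level_scores):
        raise ValueError("need one score per feature level")
    dim = len(level_features[0])
    if any(len(f) != dim for f in level_features):
        raise ValueError("all levels must share the same feature dimension")

    weights = softmax(level_scores)
    # Weighted sum across levels, computed independently for each dimension.
    fused = [
        sum(w * f[i] for w, f in zip(weights, level_features))
        for i in range(dim)
    ]
    return fused, weights


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # stand-in for an intermediate-layer feature
    deep = [3.0, 4.0]     # stand-in for a final-layer feature
    fused, weights = adaptive_fuse([shallow, deep], [0.0, 0.0])
    print(fused, weights)
```

With equal scores the fusion reduces to a plain average of the levels; raising one level's score shifts the fused vector toward that level, which is the sense in which the weighting is adaptive in this sketch.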

