Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2024, Vol. 47, Issue (4): 130-135.


Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment

HAN Jing, ZHANG Tianpeng, LYU Xueqiang   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University
  • Received: 2023-07-04  Revised: 2023-10-12  Online: 2024-08-28  Published: 2024-08-26

Abstract: Existing models for multimodal information processing suffer from inadequate feature extraction and insufficient interaction between modalities. To address these limitations, a fine-grained image classification model based on multimodal features and enhanced alignment is proposed. First, a hierarchical feature adaptive fusion module performs multi-level adaptive fusion of multimodal features; by fully exploiting the feature information in the intermediate convolutional layers, it strengthens the model's perception of local image details. Second, an enhanced aligned feature fusion module expands the dimensions along which multimodal features interact and makes full use of the mapping relationships between modalities. Experimental results show that the proposed model achieves excellent recognition performance on several public datasets and outperforms previous multimodal feature fusion models. Furthermore, ablation experiments show that adding each module individually yields better results than the original baseline model, confirming the effectiveness of the proposed design.

Key words: deep learning, fine-grained image classification, multimodal, adaptive feature fusion, attention mechanism
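The abstract does not specify how either module is computed. As a rough illustration only, the sketch below assumes that "adaptive fusion" is a softmax-weighted sum of same-sized feature vectors taken from several network levels, and it substitutes standard scaled dot-product cross-attention (one image-feature query attending over text-token keys and values) for the enhanced alignment step. The names `adaptive_fuse`, `cross_attend`, and the per-level logits are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features: List[Vector], level_logits: List[float]) -> Vector:
    """Hypothetical multi-level fusion: weight each level's feature vector by a
    softmax over per-level logits (learnable in a real model) and sum them."""
    if len(level_features) != len(level_logits):
        raise ValueError("one logit per feature level is required")
    dim = len(level_features[0])
    if any(len(f) != dim for f in level_features):
        raise ValueError("all levels must share the same feature dimension")
    weights = softmax(level_logits)
    return [sum(w * f[i] for w, f in zip(weights, level_features)) for i in range(dim)]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Assumed alignment stand-in: scaled dot-product attention of one image
    query vector over text-token keys, returning a weighted sum of values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    attn = softmax(scores)
    dim = len(values[0])
    return [sum(a * v[i] for a, v in zip(attn, values)) for i in range(dim)]


if __name__ == "__main__":
    # Three hypothetical levels (e.g. intermediate conv layers projected to 2-D).
    levels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused = adaptive_fuse(levels, [0.0, 0.0, 0.0])
    # Two hypothetical text-token embeddings used as keys and values.
    text = [[1.0, 0.0], [0.0, 1.0]]
    print(fused, cross_attend(fused, text, text))
```

With equal logits, `adaptive_fuse` reduces to a plain average of the level vectors; in a trained model the logits would be learned, letting the network emphasize whichever layer carries the most discriminative local detail. The attention weights likewise let each image feature draw more heavily on the text tokens it is most similar to.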
