Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2024, Vol. 47 ›› Issue (2): 11-17.


Visual Language Learning for Few-Shot Image Classification

  

  • Received: 2023-01-17  Revised: 2023-05-23  Online: 2024-04-28  Published: 2024-01-24

Abstract: This paper proposes a method that makes full use of a large-scale visual-language pre-trained model to classify images efficiently when only a few samples are available. First, on the text-encoding side, multiple learnable text prompts are integrated in order to explore how the position of the image category name within the sentence affects the model's generalization performance. Second, a learnable visual prompt is added on the image-encoding side so that the pre-trained image parameters better represent images with few samples. Finally, a feature adapter is added to the image and text feature encoders, and the network is fine-tuned on the image classification dataset to improve its performance on few-shot image classification. Extensive experiments on 10 public datasets show that the proposed method significantly outperforms existing methods; for example, the average accuracy of one-shot classification increases by 2.9%.
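
The abstract does not say what form the feature adapter takes. As a rough illustration only, the sketch below assumes a common residual bottleneck design: the encoder feature is passed through two small linear layers and blended back with the original feature, after which an image is classified by cosine similarity against one text feature per class. The names `adapt`, `classify`, `alpha`, and `temperature`, the dimensions, and the random features standing in for encoder outputs are all hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def relu(v):
    return [max(0.0, x) for x in v]


def adapt(feature, w_down, w_up, alpha):
    """Assumed residual bottleneck adapter (not the paper's exact design).

    alpha controls how much of the adapted feature is mixed in;
    alpha = 0 returns the original pre-trained feature unchanged.
    """
    hidden = relu(linear(w_down, feature))
    adapted = relu(linear(w_up, hidden))
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(adapted, feature)]


def classify(image_feature, text_features, temperature=0.01):
    """Softmax over cosine similarities between one image and each class text feature."""
    img = l2_normalize(image_feature)
    logits = [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in text_features
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes; random vectors stand in for real encoder outputs.
rng = random.Random(0)
dim, hidden_dim, num_classes = 8, 2, 3
w_down = random_matrix(hidden_dim, dim, rng)
w_up = random_matrix(dim, hidden_dim, rng)
image_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]
text_features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]

adapted_image = adapt(image_feature, w_down, w_up, alpha=0.2)
probs = classify(adapted_image, text_features)
print(probs)
```

In a few-shot setting, only the small adapter weights (here `w_down` and `w_up`) would be trained while the pre-trained encoders stay frozen, which is one plausible reason such adapters suit datasets with very few labeled images.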

Key words: prompt learning, visual-language model, few-shot learning, image classification, pre-trained model
