Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications ›› 2024, Vol. 47 ›› Issue (2): 0-0.

• Paper •

Few-Shot Image Classification Method Based on Visual-Language Prompt Learning

李宝安,王欣宇,滕尚志,吕学强   

  1. Beijing Information Science and Technology University
  • Received: 2023-01-17  Revised: 2023-05-23  Online: 2024-04-28  Published: 2024-01-24
  • Corresponding author: 滕尚志 (Teng Shangzhi)  E-mail: tengshangzhi@bistu.edu.cn
  • Supported by:
    National Natural Science Foundation of China; National Natural Science Foundation of China; Beijing Natural Science Foundation; Research Project of the State Language Commission; General Science and Technology Project of the Beijing Municipal Education Commission

Visual-Language Prompt Learning for Few-Shot Image Classification


Abstract: A method is proposed that makes full use of a large-scale visual-language pre-trained model to handle few-shot image classification efficiently. First, in the text encoder, multiple learnable text prompts are integrated to fully explore how the position of the image class label within the prompt sentence affects the model's generalization performance. Second, learnable visual prompts are added in the image encoder so that the pre-trained image parameters can better represent few-shot images. Finally, feature adapters are appended after the image and text feature encoders, and the network is fine-tuned on image classification datasets, so that it achieves better performance on few-shot image classification datasets. Extensive experiments on 10 public datasets show that the proposed method achieves significant improvements over existing methods; for example, the average accuracy of one-shot classification increases by 2.9%.

Key words: prompt learning, visual-language model, few-shot learning, image classification, pre-trained model
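The three components described in the abstract (learnable text prompts placed around the class token, learnable visual prompt tokens prepended to the image tokens, and a residual feature adapter after the encoder) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the frozen pre-trained encoders are replaced by simple mean pooling, and all names (`prompt_front`, `visual_prompt`, `adapter`, `W_img`) and dimensions are hypothetical.

```python
import math
import random

random.seed(0)
D = 8  # toy embedding dimension

def rand_vec(n, d):
    # n random d-dimensional vectors (stand-ins for learned parameters)
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

def mean_pool(vecs):
    d = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(d)]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Hypothetical stand-ins for the frozen pre-trained encoders:
# here just mean pooling over token embeddings.
def text_encoder(token_embeddings):
    return mean_pool(token_embeddings)

def image_encoder(patch_embeddings):
    return mean_pool(patch_embeddings)

# Learnable text prompt vectors surround the class-token embedding,
# so the class label can occupy different positions in the prompt.
prompt_front = rand_vec(4, D)   # learnable text prompt (before class token)
prompt_back = rand_vec(4, D)    # learnable text prompt (after class token)
visual_prompt = rand_vec(2, D)  # learnable visual prompt tokens

W_img = rand_vec(D, D)  # adapter weights (would be fine-tuned in practice)

def adapter(feat, W, alpha=0.2):
    # Residual feature adapter: blend the encoder feature with one
    # toy linear transform of it.
    out = [sum(W[i][j] * feat[j] for j in range(D)) for i in range(D)]
    return [(1 - alpha) * f + alpha * o for f, o in zip(feat, out)]

def classify(image_patches, class_token_embs):
    # Prepend visual prompt tokens to the image patch tokens,
    # then score the image against each prompted class description.
    img_feat = adapter(image_encoder(visual_prompt + image_patches), W_img)
    logits = []
    for cls_emb in class_token_embs:
        txt_feat = text_encoder(prompt_front + [cls_emb] + prompt_back)
        logits.append(cosine(img_feat, txt_feat))
    return logits

classes = rand_vec(3, D)  # toy embeddings for 3 class names
patches = rand_vec(5, D)  # toy image patch embeddings
scores = classify(patches, classes)
pred = max(range(3), key=lambda i: scores[i])
print(pred, [round(s, 3) for s in scores])
```

In an actual system the prompt vectors and adapter weights would be the only trainable parameters, optimized on the few labeled examples while the pre-trained encoders stay frozen; here everything is random and only the forward pass is shown.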

CLC number: