Few-Shot Image Classification Method Based on Visual Language Prompt Learning

Journal of Beijing University of Posts and Telecommunications ›› 2024, Vol. 47 ›› Issue (2): 11-17.


Li Baoan, Wang Xinyu, Teng Shangzhi, Lü Xueqiang

Beijing Information Science and Technology University

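Received: 2023-01-17 Revised: 2023-05-23 Online: 2024-04-28 Published: 2024-02-28
Corresponding author: Teng Shangzhi E-mail: tengshangzhi@bistu.edu.cn
Funding:
National Natural Science Foundation of China; National Natural Science Foundation of China; Beijing Natural Science Foundation; Scientific Research Project of the State Language Commission; General Science and Technology Project of the Beijing Municipal Education Commission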

Visual Language Prompt Learning for Few-Shot Image Classification


Abstract

Abstract: To improve the performance and generalization of few-shot image classification, this paper proposes an efficient method that makes full use of large-scale pre-trained vision-language models. First, in the text encoder, multiple learnable text prompts are integrated to explore how the position of the image class label within the prompt sentence affects the generalization of the model. Second, in the image encoder, a learnable visual prompt is introduced so that the pre-trained image parameters better represent few-shot images. Finally, feature adapters are added after the image and text feature encoders, and the network is fine-tuned on image classification datasets to improve its performance on few-shot image classification datasets. Extensive experiments on 10 public datasets show that, compared with existing methods, the proposed method improves the average accuracy of one-shot classification by 2.9%.

Key words: prompt learning, visual-language model, few-shot learning, image classification, pre-trained model
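The three components named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`build_prompt`, `mean_vector`, `adapt`, `classify`), the residual blending ratio, the ReLU-plus-linear form of the adapter, and the use of cosine similarity for scoring are all assumptions made for illustration. The learnable visual prompt is left out, and plain Python lists stand in for encoder outputs.

```python
import math


def build_prompt(context_tokens, class_name, position):
    """Insert the class label at the front, middle, or end of the context tokens.

    The context tokens stand in for learnable prompt vectors (hypothetical).
    """
    if position == "front":
        return [class_name] + list(context_tokens)
    if position == "middle":
        half = len(context_tokens) // 2
        return list(context_tokens[:half]) + [class_name] + list(context_tokens[half:])
    if position == "end":
        return list(context_tokens) + [class_name]
    raise ValueError(f"unknown position: {position}")


def mean_vector(vectors):
    """Average several feature vectors, e.g. one per prompt position (assumed ensembling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def adapt(feature, weight, ratio=0.2):
    """Assumed residual adapter: blend ReLU(W @ feature) with the original feature."""
    transformed = [max(0.0, sum(w * x for w, x in zip(row, feature))) for row in weight]
    return [ratio * t + (1.0 - ratio) * f for t, f in zip(transformed, feature)]


def classify(image_feature, class_text_features):
    """Return the index of the class whose text feature is most similar, plus all scores."""
    image = l2_normalize(image_feature)
    scores = [
        sum(a * b for a, b in zip(image, l2_normalize(text)))
        for text in class_text_features
    ]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores
```

In this sketch, one text feature per class would be obtained by encoding the prompts built for each label position, averaging them with `mean_vector`, and passing the result through `adapt` before scoring against the adapted image feature.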

CLC number: TP301.6

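Li Baoan, Wang Xinyu, Teng Shangzhi, Lü Xueqiang. Few-shot image classification method based on visual language prompt learning[J]. Journal of Beijing University of Posts and Telecommunications, 2024, 47(2): 11-17.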
