Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2024, Vol. 47, Issue (4): 36-43.


The Instruction Tuning of Large Language Models with Multi-Modal Recommendation Instruction

HAO Bowen1,3, LIU Yifei2, LI Liyao3, WANG Jie1, PENG Yan1   

  • Received: 2023-12-19 Revised: 2024-01-17 Online: 2024-08-28 Published: 2024-08-26

Abstract: Tuning large language models on multimodal instructions has proven effective in endowing them with the capability to address the corresponding multimodal tasks. To further enable large language models to handle multimodal zero-shot and few-shot recommendation tasks, a multimodal recommendation large language model is proposed. It is built on ChatGLM2-6B and trained on a multimodal recommendation dataset that includes both textual and image information. Multimodal user profiles and item attributes are constructed by using ChatGPT and GPT-4 to generate instructions, and instructions for zero-shot and few-shot recommendation are additionally formulated. The model is fine-tuned in a parameter-efficient manner with the P-tuning v2 method, requiring only a single A100 40 GB graphics processing unit. Experimental results demonstrate that the proposed model significantly outperforms existing baseline models.
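To make the fine-tuning setup described in the abstract concrete, the sketch below shows a parameter-efficient tuning pipeline in the spirit of P-tuning v2 (deep prefix tuning) applied to ChatGLM2-6B. It assumes the Hugging Face transformers and peft libraries; the example instruction, the prefix length, and the use of peft's PrefixTuningConfig are illustrative assumptions, not the authors' released code, and whether peft's prefix tuning hooks into ChatGLM2's custom attention without modification is itself an assumption (the official ChatGLM2-6B P-tuning v2 scripts are the reference implementation).

```python
# Minimal sketch, assuming the Hugging Face transformers/peft stack.
# The prompt template and hyperparameters below are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PrefixTuningConfig, TaskType, get_peft_model

BASE_MODEL = "THUDM/chatglm2-6b"  # backbone named in the abstract

# Hypothetical few-shot recommendation instruction: a textual user profile,
# liked items, and candidate items (image content assumed already verbalized
# into text by an instruction-generation step, as the abstract describes
# being done with ChatGPT/GPT-4).
instruction = (
    "User profile: enjoys minimalist home decor and Scandinavian furniture.\n"
    "Liked items: [oak side table], [linen floor lamp].\n"
    "Candidate items: [A: marble coffee table], [B: neon gaming chair].\n"
    "Which candidate should be recommended? Answer with the item letter."
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModel.from_pretrained(
    BASE_MODEL, trust_remote_code=True, torch_dtype=torch.float16
)

# P-tuning v2 is deep prefix tuning: trainable prefix vectors are prepended to
# the attention layers while the 6B backbone stays frozen, which is what keeps
# the memory footprint within a single 40 GB GPU.
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=128,  # illustrative prefix length
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prefix parameters are trainable

# Tokenized instructions like the one above would then be fed to a standard
# causal-language-modeling training loop over the recommendation dataset.
inputs = tokenizer(instruction, return_tensors="pt")
```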

Key words: multimodal recommendation instructions, large language model, instruction tuning
