[1] Dai A M, Le Q V. Semi-supervised sequence learning[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2015:3079-3087.
[2] Howard J, Ruder S. Universal language model fine-tuning for text classification[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:328-339.
[3] Peters M, Ammar W, Bhagavatula C, et al. Semi-supervised sequence tagging with bidirectional language models[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:1756-1765.
[4] Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[C]//Proceedings of NAACL-HLT. New Orleans:[s.n.], 2018:2227-2237.
[5] Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[EB/OL]. (2018-06-11)[2019-06-17]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[6] Devlin J, Chang M, Lee K, et al. BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2018-10-11)[2019-06-17]. https://arxiv.org/abs/1810.04805.
[7] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, USA:[s.n.], 2017:5998-6008.
[8] Liu P J, Saleh M, Pot E, et al. Generating Wikipedia by summarizing long sequences[C]//Sixth International Conference on Learning Representations. Vancouver, Canada:[s.n.], 2018:557-573.
[9] Kitaev N, Klein D. Constituency parsing with a self-attentive encoder[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:2676-2686.
[10] Suzuki J, Isozaki H. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data[C]//Proceedings of ACL-08:HLT. Columbus, Ohio:[s.n.], 2008:665-673.
[11] Nigam K, McCallum A, Mitchell T. Semi-supervised text classification using EM[M]//Semi-Supervised Learning. Cambridge, MA:The MIT Press, 2006:32-55.
[12] Liang P. Semi-supervised learning for natural language[D]. Cambridge, MA:Massachusetts Institute of Technology, 2005.
[13] Chen Danqi, Manning C. A fast and accurate dependency parser using neural networks[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA:Association for Computational Linguistics, 2014:740-750.
[14] Qi Ye, Sachan D, Felix M, et al. When and why are pre-trained word embeddings useful for neural machine translation?[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:529-535.
[15] Logeswaran L, Lee H. An efficient framework for learning sentence representations[C]//Sixth International Conference on Learning Representations. Vancouver, Canada:[s.n.], 2018:1884-1891.
[16] Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning?[J]. Journal of Machine Learning Research, 2010, 11(2):625-660.
[17] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780.
[18] Min S, Seo M, Hajishirzi H. Question answering through transfer learning from large fine-grained supervision data[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2:Short Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:510-517.
[19] Severyn A, Moschitti A. UNITN:training deep convolutional neural network for Twitter sentiment classification[C]//Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Stroudsburg, PA, USA:Association for Computational Linguistics, 2015:464-469.
[20] Sennrich R, Haddow B, Birch A. Improving neural machine translation models with monolingual data[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2016:86-96.
[21] Mou Lili, Meng Zhao, Yan Rui, et al. How transferable are neural networks in NLP applications?[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA:Association for Computational Linguistics, 2016:479-489.
[22] Adi Y, Kermany E, Belinkov Y, et al. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks[J]. arXiv preprint arXiv:1608.04207, 2016.
[23] Conneau A, Kruszewski G, Lample G, et al. What you can cram into a single vector:probing sentence embeddings for linguistic properties[J]. arXiv preprint arXiv:1805.01070, 2018.
[24] Linzen T, Dupoux E, Goldberg Y. Assessing the ability of LSTMs to learn syntax-sensitive dependencies[J]. Transactions of the Association for Computational Linguistics, 2016(4):521-535.
[25] Gulordava K, Bojanowski P, Grave E, et al. Colorless green recurrent networks dream hierarchically[C]//Proceedings of NAACL-HLT. New Orleans, Louisiana:[s.n.], 2018:1195-1205.
[26] Kingma D P, Ba J L. Adam:a method for stochastic optimization[C]//The 3rd International Conference on Learning Representations. San Diego, USA:[s.n.], 2015:351-365.
[27] Zhu Yukun, Kiros R, Zemel R, et al. Aligning books and movies:towards story-like visual explanations by watching movies and reading books[C]//2015 IEEE International Conference on Computer Vision (ICCV). New York:IEEE Press, 2015:19-27.
[28] Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks?[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2014:3320-3328.
[29] Maas A L, Daly R E, Pham P T, et al. Learning word vectors for sentiment analysis[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:Human Language Technologies. Portland, Oregon:[s.n.], 2011:142-150.
[30] Zhang X, Zhao J, Lecun Y. Character-level convolutional networks for text classification[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2015:649-657.
[31] Voorhees E M, Tice D M. The TREC-8 question answering track evaluation[C]//Proceedings of the Eighth Text REtrieval Conference (TREC-8). Gaithersburg, MD:[s.n.], 1999:82.
[32] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.
[33] McCann B, Bradbury J, Xiong C, et al. Learned in translation:contextualized word vectors[C]//Advances in Neural Information Processing Systems. Long Beach, USA:[s.n.], 2017:6294-6305.
[34] Zhang Y, Wallace B. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification[C]//Proceedings of the Eighth International Joint Conference on Natural Language Processing. Taipei:[s.n.], 2017:253-263.
[35] Johnson R, Zhang Tong. Deep pyramid convolutional neural networks for text categorization[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:562-570.
[36] Tai Kaisheng, Socher R, Manning C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2015:1556-1566.