[1] Dai A M, Le Q V. Semi-supervised sequence learning[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2015:3079-3087.
[2] Howard J, Ruder S. Universal language model fine-tuning for text classification[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:328-339.
[3] Peters M, Ammar W, Bhagavatula C, et al. Semi-supervised sequence tagging with bidirectional language models[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:1756-1765.
[4] Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[C]//Proceedings of NAACL-HLT. New Orleans:[s.n.], 2018:2227-2237.
[5] Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[EB/OL]. (2018-06-11)[2019-06-17]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[6] Devlin J, Chang M, Lee K, et al. BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2018-10-11)[2019-06-17]. https://arxiv.org/abs/1810.04805.
[7] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, USA:[s.n.], 2017:5998-6008.
[8] Liu P J, Saleh M, Pot E, et al. Generating Wikipedia by summarizing long sequences[C]//Sixth International Conference on Learning Representations. Vancouver, Canada:[s.n.], 2018:557-573.
[9] Kitaev N, Klein D. Constituency parsing with a self-attentive encoder[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:2676-2686.
[10] Suzuki J, Isozaki H. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data[C]//Proceedings of ACL-08:HLT. Columbus, Ohio:[s.n.], 2008:665-673.
[11] Nigam K, McCallum A, Mitchell T. Semi-supervised text classification using EM[M]//Semi-Supervised Learning. Cambridge, MA:The MIT Press, 2006:32-55.
[12] Liang P. Semi-supervised learning for natural language[D]. Cambridge, MA:Massachusetts Institute of Technology, 2005.
[13] Chen Danqi, Manning C. A fast and accurate dependency parser using neural networks[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA:Association for Computational Linguistics, 2014:740-750.
[14] Qi Ye, Sachan D, Felix M, et al. When and why are pre-trained word embeddings useful for neural machine translation?[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2018:529-535.
[15] Logeswaran L, Lee H. An efficient framework for learning sentence representations[C]//Sixth International Conference on Learning Representations. Vancouver, Canada:[s.n.], 2018:1884-1891.
[16] Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning?[J]. Journal of Machine Learning Research, 2010, 11(2):625-660.
[17] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780.
[18] Min S, Seo M, Hajishirzi H. Question answering through transfer learning from large fine-grained supervision data[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2:Short Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:510-517.
[19] Severyn A, Moschitti A. UNITN:training deep convolutional neural network for Twitter sentiment classification[C]//Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Stroudsburg, PA, USA:Association for Computational Linguistics, 2015:464-469.
[20] Sennrich R, Haddow B, Birch A. Improving neural machine translation models with monolingual data[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2016:86-96.
[21] Mou Lili, Meng Zhao, Yan Rui, et al. How transferable are neural networks in NLP applications?[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA:Association for Computational Linguistics, 2016:479-489.
[22] Adi Y, Kermany E, Belinkov Y, et al. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks[J]. arXiv preprint arXiv:1608.04207, 2016.
[23] Conneau A, Kruszewski G, Lample G, et al. What you can cram into a single vector:probing sentence embeddings for linguistic properties[J]. arXiv preprint arXiv:1805.01070, 2018.
[24] Linzen T, Dupoux E, Goldberg Y. Assessing the ability of LSTMs to learn syntax-sensitive dependencies[J]. Transactions of the Association for Computational Linguistics, 2016(4):521-535.
[25] Gulordava K, Bojanowski P, Grave E, et al. Colorless green recurrent networks dream hierarchically[C]//Proceedings of NAACL-HLT. New Orleans, Louisiana:[s.n.], 2018:1195-1205.
[26] Kingma D P, Ba J L. Adam:a method for stochastic optimization[C]//The 3rd International Conference on Learning Representations. San Diego, USA:[s.n.], 2015:351-365.
[27] Zhu Yukun, Kiros R, Zemel R, et al. Aligning books and movies:towards story-like visual explanations by watching movies and reading books[C]//2015 IEEE International Conference on Computer Vision (ICCV). New York:IEEE Press, 2015:19-27.
[28] Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks?[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2014:3320-3328.
[29] Maas A L, Daly R E, Pham P T, et al. Learning word vectors for sentiment analysis[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:Human Language Technologies. Portland, Oregon:[s.n.], 2011:142-150.
[30] Zhang X, Zhao J, Lecun Y. Character-level convolutional networks for text classification[C]//Advances in Neural Information Processing Systems. Montréal, Canada:[s.n.], 2015:649-657.
[31] Voorhees E M, Tice D M. The TREC-8 question answering track evaluation[C]//Proceedings of the Eighth Text REtrieval Conference (TREC-8). Gaithersburg, MD:[s.n.], 1999:82.
[32] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.
[33] McCann B, Bradbury J, Xiong C, et al. Learned in translation:contextualized word vectors[C]//Advances in Neural Information Processing Systems. Long Beach, USA:[s.n.], 2017:6294-6305.
[34] Zhang Y, Wallace B. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification[C]//Proceedings of the Eighth International Joint Conference on Natural Language Processing. Taipei:[s.n.], 2017:253-263.
[35] Johnson R, Zhang Tong. Deep pyramid convolutional neural networks for text categorization[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2017:562-570.
[36] Tai Kaisheng, Socher R, Manning C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). Stroudsburg, PA, USA:Association for Computational Linguistics, 2015:1556-1566.