Analysis Algorithm of Reference Record in HTML Page

doi:10.13190/j.jbupt.2017.s.019

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2017, Vol. 40, Issue (s1): 85-88.

ZENG Qing-tao¹,², XIE Kai¹, LI Ye-li¹, WANG Xin-gang³, YE Yu-shan¹, MA Shao-ping²

1. School of Information Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China;
2. Postdoctoral Research Station in Computer Science and Technology, Tsinghua University, Beijing 100084, China;
3. Broadcast and Television Direct Broadcasting Satellite Management Center, The State Administration of Press, Publication, Radio, Film and Television, Beijing 100045, China

Received: 2016-05-26 Online: 2017-09-28 Published: 2017-09-28

Abstract

Abstract: With the rapid development of the Internet, web pages have become a main source of information. To help publishing agencies find the references they need among large numbers of pages in a timely manner, an algorithm is required that extracts reference information from Hypertext Markup Language (HTML) pages. A reference analysis algorithm based on conditional random fields was proposed. First, a document object tree segmentation algorithm was designed: a classifier divided the web page data into separate blocks, each composed of tag and text sequences. These sequences were then used as feature vectors of the conditional random field model to build a reference information labeling model. Finally, a heuristic algorithm was presented to extract the reference information from the output of the labeling model, and the validity of the algorithm was verified by experiments.
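
The first two steps of the abstract (splitting a page into blocks of tag and text sequences, then turning those sequences into feature vectors) might look roughly like the sketch below. The abstract does not give the segmentation rule, the classifier, or the feature set, so everything here is an assumption made for illustration: `BLOCK_TAGS`, `BlockSegmenter`, `segment`, and `token_features` are hypothetical names, a fixed tag list stands in for the classifier, and the features are generic ones often used with conditional random fields. Training the labeling model itself is left out.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries. The paper uses a classifier
# for this decision; a fixed tag set is only a stand-in.
BLOCK_TAGS = {"li", "p", "div", "tr"}
# Void elements never get a closing tag, so they are kept off the tag stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class BlockSegmenter(HTMLParser):
    """Split a page into blocks, each a list of (tag_path, text) pairs."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.blocks = []   # finished blocks
        self.current = []  # pairs collected for the block being built

    def flush(self):
        if self.current:
            self.blocks.append(self.current)
            self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in BLOCK_TAGS:
            self.flush()
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.append(("/".join(self.stack), text))


def segment(html):
    """Return the list of blocks found in an HTML string."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks


def token_features(block):
    """Turn one block into per-token feature dicts (hypothetical features)."""
    features = []
    for path, text in block:
        for token in text.split():
            features.append({
                "token": token.lower(),
                "parent_tag": path.rsplit("/", 1)[-1],
                "is_digit": token.strip(".,:;()[]").isdigit(),
                "is_capitalized": token[:1].isupper(),
            })
    return features


page = (
    "<ul>"
    "<li><i>Zhang Li</i>. Evolution of literature retrieval. 2005.</li>"
    "<li>Lin Lan. Library, 2010.</li>"
    "</ul>"
)
blocks = segment(page)
features = token_features(blocks[1])
```

Each list item becomes one block, and each token in a block carries its enclosing tag as a feature, which is the kind of combined tag and text signal the abstract describes. A sequence labeler would then assign labels such as author, title, or year to the tokens of each block.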

Key words: digital publishing, conditional random field, reference analysis

CLC Number: TP393

ZENG Qing-tao, XIE Kai, LI Ye-li, WANG Xin-gang, YE Yu-shan, MA Shao-ping. Analysis Algorithm of Reference Record in HTML Page[J]. JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2017, 40(s1): 85-88.

