Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

2016, Vol. 39, Issue (5): 61-66. doi: 10.13190/j.jbupt.2016.05.013


Fast and Real-Time Internet Advertisement Traffic Recognition System Applied to Massive Network Dataset

FANG Cheng, ZHAO Xiao-xing, LIU Jun, LEI Zhen-ming

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2016-03-29 Online: 2016-10-28 Published: 2016-12-02
  • About the authors: FANG Cheng (1980-), male, Ph.D. candidate, E-mail: fone@bupt.edu.cn; LEI Zhen-ming (1951-), male, professor, Ph.D. supervisor.
  • Funding:
    Programme of Introducing Talents of Discipline to Universities (111 Project) (B08004)

Abstract: A real-time advertisement traffic recognition system for massive Internet traffic is proposed. The system adopts the Adblock filter lists, currently the most popular ad-blocking rule set, as its base rule set, and combines HashTable-based fast matching with the Aho-Corasick multi-pattern matching algorithm to recognize advertisement traffic quickly and in real time. To cope with massive streaming data, the matching algorithms are deployed on Spark Streaming, a parallel stream-processing framework. The system was evaluated on very large datasets collected both in the laboratory and in real operator networks, and the experiments show that it achieves high precision and high computational efficiency.

Key words: advertisement traffic, real-time, matching algorithm, streaming framework, massive dataset
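The abstract names the two matching stages but not how the Adblock rules are divided between them. The Python sketch below shows one plausible arrangement under our own assumptions: pure domain rules of the form `||host^` go into a hash set, and every other network rule is indexed in an Aho-Corasick automaton by its longest literal fragment, with a regex confirming each keyword hit. Rule options (`$...`), `/regex/` rules and element-hiding rules are simply skipped. The names `AhoCorasick`, `rule_to_regex`, `FilterList` and `build_classifier` are hypothetical and are not the authors' code.

```python
import re
from collections import deque
from urllib.parse import urlsplit

# Sketch only: splitting rules between a hash set and Aho-Corasick is an assumption.
DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")  # pure domain rule, e.g. ||doubleclick.net^
COSMETIC = re.compile(r"#[@?]?#")                     # element-hiding rules: ##, #@#, #?#


class AhoCorasick:
    """Aho-Corasick automaton: reports every keyword occurrence in one pass."""
    def __init__(self, keywords):
        self.goto, self.fail, self.out = [{}], [0], [[]]
        for kid, word in enumerate(keywords):          # 1) build the keyword trie
            state = 0
            for ch in word:
                if ch not in self.goto[state]:
                    self.goto[state][ch] = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                state = self.goto[state][ch]
            self.out[state].append(kid)
        queue = deque(self.goto[0].values())           # 2) BFS sets failure links
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        state = 0                                      # yields the id of each occurrence
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            yield from self.out[state]


def rule_to_regex(rule):
    """Translate ABP anchors (||, |) and wildcards (*, ^) into a Python regex."""
    prefix, suffix = "", ""
    if rule.startswith("||"):                          # domain anchor
        prefix, rule = r"^[a-z][a-z0-9+.-]*://(?:[^/?#]*\.)?", rule[2:]
    elif rule.startswith("|"):                         # start-of-address anchor
        prefix, rule = "^", rule[1:]
    if rule.endswith("|"):                             # end-of-address anchor
        suffix, rule = "$", rule[:-1]
    body = re.escape(rule).replace(r"\*", ".*").replace(r"\^", r"(?:[^\w.%-]|$)")
    return re.compile(prefix + body + suffix)


class FilterList:
    """Assumed split: hash set for pure domain rules, Aho-Corasick + regex for the rest."""
    def __init__(self, rules):
        self.domains, self.regexes, keywords = set(), [], []
        for rule in (r.strip().lower() for r in rules):
            if m := DOMAIN_RULE.match(rule):
                self.domains.add(m.group(1))
                continue
            # Index the rule by its longest literal fragment; hits are confirmed by regex.
            literal = max(re.split(r"[*^|]", rule), key=len)
            if literal:
                keywords.append(literal)
                self.regexes.append(rule_to_regex(rule))
        self.ac = AhoCorasick(keywords)

    def matches(self, url):
        url = url.lower()
        labels = (urlsplit(url).hostname or "").split(".")
        # Hash path: one set lookup for the host and for each parent domain.
        if any(".".join(labels[i:]) in self.domains for i in range(len(labels))):
            return True
        # AC path: regexes run only for rules whose keyword occurs in the URL.
        return any(self.regexes[kid].search(url) for kid in set(self.ac.search(url)))


def build_classifier(lines):
    """Parse an Adblock Plus list into a url -> bool advertisement classifier."""
    block, allow = [], []
    for line in (raw.strip() for raw in lines):
        if not line or line.startswith(("!", "[")) or COSMETIC.search(line):
            continue                                   # blank, comment, header, cosmetic
        is_exception = line.startswith("@@")
        pattern = (line[2:] if is_exception else line).split("$", 1)[0]  # drop $options
        if len(pattern) > 1 and pattern[0] == pattern[-1] == "/":
            continue                                   # /regex/ rules: not handled here
        (allow if is_exception else block).append(pattern)
    blocker, allower = FilterList(block), FilterList(allow)
    return lambda url: blocker.matches(url) and not allower.matches(url)
```

For example, with the rules `||doubleclick.net^`, `||example.com/ads/*.gif` and the exception `@@||example.com/ads/logo.gif`, the classifier flags `https://stats.g.doubleclick.net/collect` through the hash path and `http://cdn.example.com/ads/top.gif` through the automaton, while the exception lets `http://cdn.example.com/ads/logo.gif` through. The hash path costs one set lookup per host label regardless of list size, and the automaton scans each URL in a single pass, so only the few rules whose keyword actually occurs need a regex check. How the work is spread across Spark Streaming executors is not described in the abstract; one plausible arrangement is to build the classifier once per partition and apply it to every record of a micro-batch.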
