Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2006, Vol. 29 ›› Issue (4): 77-80. doi: 10.13190/jbupt.200604.77.zhangl

• Research Report •

A Method for the Selection of Training Samples Based on Boundary Samples

ZHANG Li1,2, GUO Jun1

  1. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. School of Information Technology and Management Engineering, University of International Business and Economics, Beijing 100029, China
  • Received: 2005-03-29 Revised: 1900-01-01 Online: 2006-08-30 Published: 2006-08-30
  • Contact: ZHANG Li

Abstract:

Taking classifier design in an intrusion detection system as an example, this paper studies how to select training samples for a classifier. A training-sample selection method for large data sets is proposed. First, the training data are partitioned into subsets by clustering, which narrows the search range. Samples are then selected according to the within-cluster scatter and the coverage area of each sample: the boundary samples of each cluster are retained and most of the interior samples are discarded. Because typical samples are kept while the number of training samples is reduced, the performance of the classifier is preserved and training becomes more efficient.

Key words: sample selection, scatter, coverage area, boundary samples

CLC number:
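
The two-stage procedure described in the abstract (cluster first, then keep boundary samples and drop interior ones) can be sketched roughly as follows. This is not the authors' implementation: the abstract does not define the scatter measure or the coverage area, so the sketch makes its own assumptions. It uses plain k-means for the clustering step, takes the root-mean-square distance to the centroid as the within-cluster scatter, treats samples at or beyond that distance as boundary samples, and models coverage with a hypothetical `cover_radius` parameter. The function names `kmeans` and `select_boundary_samples` are illustrative only.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def select_boundary_samples(X, k=3, cover_radius=0.0, seed=0):
    """Return sorted indices of the samples kept for training."""
    X = np.asarray(X, dtype=float)
    centroids, labels = kmeans(X, k, seed=seed)
    kept = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = np.linalg.norm(X[idx] - centroids[j], axis=1)
        # Assumed scatter measure: RMS distance of members to the centroid.
        scatter = np.sqrt(np.mean(d ** 2))
        # Assumed boundary rule: samples at or beyond the scatter radius.
        mask = d >= scatter
        boundary = idx[mask]
        # Assumed coverage rule: visit boundary samples from the outside in
        # and skip any sample within cover_radius of one already chosen.
        order = boundary[np.argsort(-d[mask])]
        chosen = []
        for i in order:
            if all(np.linalg.norm(X[i] - X[c]) > cover_radius for c in chosen):
                chosen.append(i)
        kept.extend(chosen)
    return np.array(sorted(kept), dtype=int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic, well-separated groups standing in for two traffic classes.
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(8, 1, (200, 2))])
    kept = select_boundary_samples(X, k=2, cover_radius=0.3)
    print(f"kept {len(kept)} of {len(X)} samples")
```

Raising `cover_radius` thins the retained boundary further, trading training-set size against how densely each cluster's edge is represented. A classifier would then be trained on `X[kept]` only.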