Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2018, Vol. 41, Issue (1): 13-23. doi: 10.13190/j.jbupt.2017-243

• Papers •

A Defect Prediction Method for Android Binary Code

董枫, 刘天铭, 徐国爱, 郭燕慧, 李承泽   

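  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876
  • Received: 2017-11-30 Published: 2018-02-28 Online: 2018-01-04
  • About the authors: DONG Feng (b. 1990), male, PhD candidate, E-mail: dongfeng@bupt.edu.cn; XU Guo-ai (b. 1972), male, professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61401038); 2016 Frontier and Key Technology Innovation Project of the Guangdong Provincial Department of Science and Technology (2016B010110002)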

Defect Prediction Method for Android Binary Files

DONG Feng, LIU Tian-ming, XU Guo-ai, GUO Yan-hui, LI Cheng-ze   

  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2017-11-30 Published: 2018-02-28 Online: 2018-01-04

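Abstract: To address the difficulty of obtaining source code in Android software defect prediction, a defect prediction model for Android binary executable files is proposed, with a deep neural network used for prediction. First, a novel defect feature extraction method for Android executables extracts token features and semantic features to build defect feature vectors. Second, the defect feature vectors are fed into a deep neural network to train and build the defect prediction model. Finally, the prototype tool DefectDroid is applied to a large-scale smali file defect prediction task, and the model's performance is evaluated in terms of within-project defect prediction, cross-project defect prediction, and comparison with traditional machine learning algorithms.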

Keywords: defect prediction, software security, Android binary files, machine learning, deep neural network

Abstract: Software defect prediction is an important method in the field of software security. Most existing defect prediction models are source-oriented and cannot easily be applied to defect prediction for Android binary files (apks). Moreover, the traditional machine learning techniques used in these models have shallow architectures, which limits their capacity to express complex functions between features and defects. A practical defect prediction model for Android binary files based on a deep neural network (DNN) is proposed. A new approach generates features that capture both the token and the semantic characteristics of defective smali files (the decompiled files of apks). The resulting feature vectors are input into the DNN to train and build the defect prediction model. The model is implemented as a prototype tool, DefectDroid, and applied to a large number of Android smali files. The performance of DefectDroid is evaluated in three respects: within-project defect prediction, cross-project defect prediction, and comparison with traditional machine learning algorithms.

Key words: defect prediction, software security, Android binary files, machine learning, deep neural network
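As a rough illustration of what a token-level feature vector for a smali file might look like, the following minimal sketch counts instruction opcodes and normalizes them into a fixed-length vector. It is not the feature extraction method of DefectDroid, whose details the abstract does not give: the opcode vocabulary `OPCODE_VOCAB`, the helper names `smali_opcodes` and `token_feature_vector`, and the sample smali text are all assumptions made for illustration. Semantic features and the DNN itself are omitted.

```python
from collections import Counter

# Hypothetical vocabulary of Dalvik opcodes; the paper's actual
# feature set is not stated in the abstract.
OPCODE_VOCAB = [
    "invoke-virtual", "invoke-static", "invoke-direct",
    "if-eqz", "if-nez", "goto", "new-instance", "return-void",
]

# Small hand-written smali fragment used only as example input.
SAMPLE = """
.method public onCreate()V
    .locals 1
    new-instance v0, Ljava/lang/Object;
    invoke-direct {v0}, Ljava/lang/Object;-><init>()V
    if-eqz v0, :cond_0
    invoke-virtual {v0}, Ljava/lang/Object;->hashCode()I
    :cond_0
    return-void
.end method
"""

def smali_opcodes(smali_text):
    """Return the leading opcode of each instruction line."""
    opcodes = []
    for line in smali_text.splitlines():
        line = line.strip()
        # Skip blank lines, directives (.method), comments (#), labels (:cond_0).
        if not line or line[0] in ".#:":
            continue
        opcodes.append(line.split()[0])
    return opcodes

def token_feature_vector(smali_text, vocab=OPCODE_VOCAB):
    """Relative frequency of each vocabulary opcode among all instructions."""
    counts = Counter(smali_opcodes(smali_text))
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[op] / total for op in vocab]

if __name__ == "__main__":
    print(token_feature_vector(SAMPLE))
```

Vectors of this kind, one per smali file and labeled defective or clean, could then be passed to any classifier; the paper uses a deep neural network for that step.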
