Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2020, Vol. 43, Issue (4): 76-82. doi: 10.13190/j.jbupt.2019-174

• Research Report •

An Algorithm for Differential Gene Analysis of Breast Cancer Based on PPI Network

WANG Xiao-yu, FENG Yang

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Received: 2019-08-29 Published: 2020-08-15
  • About the author: WANG Xiao-yu (1971-), female, professor, E-mail: wangxiaoyu@hrbust.edu.cn.
  • Supported by:
    National Natural Science Foundation of China (60572153, 60972127); Science and Technology Project of the Heilongjiang Provincial Department of Education (12541177)

Abstract: To improve the accuracy of screening differential genes in breast cancer, differentially expressed genes were analyzed at the molecular level by combining copy number and gene expression features, with the aim of studying the pathogenesis of breast cancer and offering new research ideas for its diagnosis and treatment. Copy number and gene expression data for breast cancer were downloaded from The Cancer Genome Atlas (TCGA). Using R, genes with differential copy number were extracted with the chi-square test, and differentially expressed genes were screened with the edgeR differential gene analysis algorithm. The Kolmogorov-Smirnov (KS) test was used to relate the two sets of differential genes and to analyze the relationship between copy number variation (CNV) and gene expression. A protein-protein interaction (PPI) network was constructed with the STRING database to screen core genes, and the results were verified by survival analysis and Gene Ontology (GO) enrichment analysis. With a differential expression fold change greater than 1 and a p value less than 0.05 as the criteria, 10 579 differentially expressed genes were screened, of which 7 543 were up-regulated and 3 036 were down-regulated. Verification showed that eight genes, including ATAD2B, are closely related to the occurrence and development of breast cancer.

Key words: chi-square test, edgeR algorithm, KS test, protein interaction network, survival analysis
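
The screening criterion stated in the abstract (fold change greater than 1, p value less than 0.05) can be illustrated with a minimal sketch. This is not the authors' code: their analysis was carried out in R with edgeR. The abstract does not say on which scale the fold change cutoff is applied, so the sketch assumes it applies to the absolute log2 fold change. The `GeneResult` type, the `screen_genes` function, and the gene names and values are all hypothetical and serve only to show the filtering logic.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str        # gene symbol (hypothetical placeholders below)
    log2_fc: float   # log2 fold change, tumor vs. normal (assumed scale)
    p_value: float   # p value from the differential expression test


def screen_genes(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Split genes passing both cutoffs into up- and down-regulated lists.

    Assumption: the fold change cutoff is applied to |log2 fold change|.
    """
    up, down = [], []
    for r in results:
        # Discard genes that fail either the p value or the fold change cutoff.
        if r.p_value >= p_cutoff or abs(r.log2_fc) <= fc_cutoff:
            continue
        if r.log2_fc > 0:
            up.append(r.gene)
        else:
            down.append(r.gene)
    return up, down


# Toy input with made-up values, not data from the study.
toy = [
    GeneResult("GENE_A", 2.3, 0.001),   # passes, up-regulated
    GeneResult("GENE_B", -1.8, 0.020),  # passes, down-regulated
    GeneResult("GENE_C", 0.4, 0.001),   # fails the fold change cutoff
    GeneResult("GENE_D", 3.1, 0.300),   # fails the p value cutoff
]

up_genes, down_genes = screen_genes(toy)
print(len(up_genes) + len(down_genes), up_genes, down_genes)
```

Applied to the full edgeR output, a filter of this shape would yield the kind of split reported in the abstract, where the up-regulated and down-regulated counts sum to the total number of screened genes (7 543 + 3 036 = 10 579).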

CLC number: