[1] 魏少军, 刘雷波, 尹首一. 可重构计算处理器技术[J]. 中国科学:信息科学, 2012(12):1559-1576. Wei Shaojun, Liu Leibo, Yin Shouyi. Key techniques of reconfigurable computing processor[J]. SCIENCE CHINA:Information Sciences, 2012(12):1559-1576.
[2] 李浩, 谢伦国. 片上多处理器末级Cache优化技术研究[J]. 计算机研究与发展, 2012, 49(S1):172-179. Li Hao, Xie Lunguo. Research development of optimization technology on last level cache in chip multi-processors[J]. Journal of Computer Research and Development, 2012, 49(S1):172-179.
[3] 石嵩, 李宏亮, 朱巍. 阵列众核处理器上的高效归并排序算法[J]. 计算机研究与发展, 2016, 53(2):362-373. Shi Song, Li Hongliang, Zhu Wei. Efficient merge sort algorithms on array-based manycore architectures[J]. Journal of Computer Research and Development, 2016, 53(2):362-373.
[4] Berezecki M, Frachtenberg E, Paleczny M, et al. Power and performance evaluation of memcached on the TILEPro64 architecture[J]. Sustainable Computing:Informatics and Systems, 2012, 2(2):81-90.
[5] Hu Ziang, Cuvillo J D, Zhu Weirong, et al. Optimization of dense matrix multiplication on IBM Cyclops-64:challenges and experiences[C]//Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference. Dresden:[s.n.], 2006:134-144.
[6] 胡向东, 杨剑新, 朱英. 高性能多核处理器申威1600[J]. 中国科学:信息科学, 2015(4):513-522. Hu Xiangdong, Yang Jianxin, Zhu Ying. Shenwei-1600:a high-performance multi-core microprocessor[J]. SCIENCE CHINA:Information Sciences, 2015(4):513-522.
[7] 郑方, 许勇, 李宏亮, 等. 一种面向高性能计算的自主众核处理器结构[J]. 中国科学:信息科学, 2015(4):523-534. Zheng Fang, Xu Yong, Li Hongliang, et al. A homegrown many-core processor architecture for high-performance computing[J]. SCIENCE CHINA:Information Sciences, 2015(4):523-534.
[8] Banakar R, Steinke S, Lee B S, et al. Scratchpad memory:design alternative for cache on-chip memory in embedded systems[C]//Tenth International Symposium on Hardware/Software Codesign. Piscataway:IEEE, 2002:73-78.
[9] 朱小虎, 曹阳, 王力纬. 多级拥塞控制的NOC路由算法[J]. 北京邮电大学学报, 2007, 30(5):91-94. Zhu Xiaohu, Cao Yang, Wang Liwei. A multilevel congestion control routing algorithm for network-on-chip[J]. Journal of Beijing University of Posts and Telecommunications, 2007, 30(5):91-94.
[10] Mullins R, West A, Moore S. The design and implementation of a low-latency on-chip network[C]//2006 Asia and South Pacific Conference on Design Automation. Piscataway:IEEE, 2006:164-169.
[11] Loi I, Benini L. An efficient distributed memory interface for many-core platform with 3D stacked DRAM[C]//2010 Design, Automation & Test in Europe Conference & Exhibition (DATE). Piscataway:IEEE, 2010:99-104.