参考文献

[1]
DESIKAN R, BURGER D, KECKLER S, 等. Sim-alpha: a Validated, Execution-Driven Alpha 21264 Simulator[J]. 2002.
[2]
RADIN G. The 801 minicomputer[J]. ACM SIGPLAN Notices, 1982, 17(4): 39–47.
[3]
WEAVER D L, GERMOND T. The SPARC architecture manual: version 9[M]. Englewood Cliffs: PTR Prentice-Hall, 1994.
[4]
KESSLER R E. The Alpha 21264 microprocessor[J]. IEEE Micro, 1999, 19(2): 24–36.
[5]
GRONOWSKI P E, BOWHILL W J, DONCHIN D R, 等. A 433-MHz 64-b quad-issue RISC microprocessor[J]. IEEE Journal of Solid-State Circuits, 1996, 31(11): 1687–1696.
[6]
The PowerPC architecture: a specification for a new family of RISC processors[M]. MAY C. 第2版. San Francisco: Morgan Kaufman Publishers, 1994.
[7]
YEAGER K C. The Mips R10000 superscalar microprocessor[J]. IEEE Micro, 1996, 16(2): 28–41.
[8]
ARM architecture reference manual[M]. SEAL D. Harlow: Addison-Wesley, 2006.
[9]
THORTON J. Considerations in Computer Design - Leading up to the Control Data 6600[R]. 1963.
[10]
SCHLANSKER M, RAU B R. EPIC: An Architecture for Instruction-Level Parallel Processors[R]. HPL_1999-111, HP Laboratories Palo Alto, 2000.
[11]
ARM. AMBA specifications (Rev 2.0)[J]. 1999.
[12]
AMD. HyperTransport I/O link specification revision 3.10[R]. HyperTransport Technology Consortium, 2010.
[13]
PCI-SIG. PCI Local Bus Specification Revision 2.3[R]. 2002.
[14]
PCI-SIG. PCI Express 2.0 Base Specification Revision 1.0[R]. 2006.
[15]
JEDEC. DDR2 SDRAM SPECIFICATION[R]. 2009.
[16]
ALVERSON R, CALLAHAN D, CUMMINGS D, 等. The Tera computer system[J]. ACM SIGARCH Computer Architecture News, 1990, 18(3b): 1–6.
[17]
ANDERSON T E. The performance of spin lock alternatives for shared-money multiprocessors[J]. IEEE Transactions on Parallel and Distributed Systems, 1990, 1(1): 6–16.
[18]
GRAUNKE G, THAKKAR S. Synchronization algorithms for shared-memory multiprocessors[J]. Computer, 1990, 23(6): 60–69.
[19]
YEW P-C, TZENG N-F, LAWRIE. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors[J]. IEEE Transactions on Computers, 1987, C-36(4): 388–395.
[20]
DALLY W J, TOWLES B P. Principles and Practices of Interconnection Networks[M]. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004.
[21]
陈国良. 并行计算: 结构. 算法. 编程[M]. 高等教育出版社, 2011.
[22]
胡伟武. 共享存储系统结构[M]. 高等教育出版社, 2001.
[23]
胡伟武, 唐志敏. 龙芯1号处理器结构设计[J]. 计算机学报, 2003, 26(004): 385–396.
[24]
HU W W, ZHANG F X, LI Z S. Microarchitecture of the Godson-2 Processor[J]. 计算机科学技术学报(英文版), 2005, 20(2): 243–249.
[25]
HU W W, WANG J, GAO X, 等. Godson-3: A Scalable Multicore RISC Processor with x86 Emulation[J]. IEEE Micro, 2009, 29(2): 17–29.
[26]
HU W W, WANG R, CHEN Y J, 等. Godson-3B: A 1GHz 40W 8-core 128GFLOPS processor in 65nm CMOS[C]//IEEE International Solid-State Circuits Conference, ISSCC 2011, Digest of Technical Papers, San Francisco, CA, USA, 20-24 February, 2011. San Francisco, CA: 2011.
[27]
HU W W, ZHANG Y, YANG L, 等. Godson-3B1500: A 32nm 1.35GHz 40W 172.8GFLOPS 8-core processor[C]//Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. 2013.
[28]
HU W, YANG L, FAN B, 等. An 8-Core MIPS-Compatible Processor in 32/28 nm Bulk CMOS[J]. IEEE Journal of Solid-State Circuits, 2013, 49(1): 41–49.
[29]
吴瑞阳, 汪文祥, 王焕东, 等. 龙芯GS464E处理器核架构设计[J]. 中国科学:信息科学, 2015, 45(4): 480–500.
[30]
ROTEM E, NAVEH A, ANANTHAKRISHNAN A, 等. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge[J]. IEEE Micro, 2012, 32(2): 20–27.
[31]
MELLOR-CRUMMEY J M, SCOTT M L. Algorithms for scalable synchronization on shared-memory multiprocessors[J]. ACM Transactions on Computer Systems, 1991, 9(1): 21–65.
[32]
AGARWAL A, BIANCHINI R, CHAIKEN D, 等. The MIT Alewife machine: architecture and performance[C]//Proceedings 22nd Annual International Symposium on Computer Architecture. 1995: 2–13.
[33]
NVIDIA. NVIDIA’s Next Generation CUDA Computer Architecture[R]. 2009.
[34]
STROHMAIER E, DONGARRA J, SIMON H, 等. TOP500 list[J].
[35]
BIENIA C, KUMAR S, SINGH J P, 等. The PARSEC benchmark suite: Characterization and architectural implications[C]//2008 International Conference on Parallel Architectures and Compilation Techniques (PACT). 2008: 72–81.
[36]
BOSE P, CONTE T, AUSTIN T. Challenges in processor modeling and validation[J]. IEEE Micro, 1999, 19: 9–14.
[37]
BIRD S, PHANSALKAR A, JOHN L, 等. Performance characterization of SPEC CPU benchmarks on intel’s core microarchitecture based processor[J]. 2007.
[38]
SRINIVAS M S, SINHAROY B, EICKEMEYER R, 等. IBM POWER7 performance modeling, verification, and evaluation[J]. Journal of Reproduction and Development, 2011, 55.
[39]
ANDERSON J M, BERC L M, DEAN J, 等. Continuous profiling: where have all the cycles gone?[J]. ACM Transactions on Computer Systems, 1997, 15(4): 357–390.
[40]
MOUDGILL M, WELLMAN J-D, MORENO J H. Environment for PowerPC microarchitecture exploration[J]. IEEE Micro, 1999, 19(3): 15–25.
[41]
GILADI R, AHITAV N. SPEC as a performance evaluation measure[J]. Computer, 1995, 28(8): 33–42.
[42]
INTEL. Intel® 64 and IA-32 Architectures Software Developer’s Manual[J]. 2016.
[43]
LI K. IVY: A Shared Virtual Memory System for Parallel Computing[C]//Proceedings of the International Conference on Parallel Processing, ICPP ’88, The Pennsylvania State University, University Park, PA, USA, August 1988. Volume 2: Software. Pennsylvania State University Press, 1988: 94–101.
[44]
BINKERT N, BECKMANN B, BLACK G, 等. The gem5 simulator[J]. ACM SIGARCH Computer Architecture News, 2011, 39(2): 1–7.