TY - GEN
T1 - Fast pipelined storage for high-performance energy-efficient computing with superconductor technology
AU - Dorojevets, Mikhail
AU - Chen, Zuoting
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/11/25
Y1 - 2015/11/25
N2 - New superconductor single flux quantum (SFQ) technology, such as Reciprocal Quantum Logic (RQL), is currently considered one of the promising candidates for highperformance energy-efficient computing. This paper presents our work on the design and detailed energy efficiency analysis of three types of 32- and 64-bit RQL multi-ported pipelined local storage structures (13 total), namely 1) random access memory (RAM) and register files, 2) direct-mapped write-through and write-back caches, and 3) first-in-first-out (FIFO) buffers. Our layout-aware cell-level design process uses a VHDL RQL cell library developed at the Ultra High Speed Computing Laboratory at Stony Brook University (SBU). The SBU VHDL RQL cell library specifies the dynamic and standby energy consumption, gate delays, a number of Josephson junctions (JJs) per cell, and approximate sizes of individual cells based on the parameters of the 248 nm 100 μA/μm2 10 Nb metal layer SFQ fabrication process currently under development at the MIT Lincoln Laboratory. Gate and wire delays as well as clock skew are taken into account during digital circuit simulation done with Mentor Graphics CAD tools. After completing a physical chip layout, the circuit models need to be updated and re-simulated to include the effects of parasitic inductances and actual wire lengths on signal propagation delays. To meet both performance and energy efficiency targets, the RQL storage structures were designed with RQL non-destructive read-out single-bit storage cells. We chose a relatively moderate clock frequency of 8.5 GHz for all storage units to keep their read latencies in the range of 1- 3 cycles. The most complex design in terms of JJs is a tripleported 4 Kbit 64x64-bit register file with 253,918 JJs and its read access latency of 338 ps. The highest energy consumption in terms of energy/operation/bit (∼9.5 aJ at 4.2 K) is for a write hit in a 2 Kbit 32-bit wide write-back cache. The average energy consumption of the RQL storage designs varies from ∼1.6 aJ/operation/bit for a small 4x32-bit FIFO to 7.3 aJ/operation/bit for the 2 Kbit write-back cache at 4.2 K. Given the cryocooler efficiency of 0.1%, this means the energy consumption of ∼1.6-7.3 fJ/operation/bit at room temperature. The physical implementation of the RQL storage units will become feasible upon the development of the target MIT fabrication process and CAD tools for VLSI RQL chip design in 2015-2016.
AB - New superconductor single flux quantum (SFQ) technology, such as Reciprocal Quantum Logic (RQL), is currently considered one of the promising candidates for highperformance energy-efficient computing. This paper presents our work on the design and detailed energy efficiency analysis of three types of 32- and 64-bit RQL multi-ported pipelined local storage structures (13 total), namely 1) random access memory (RAM) and register files, 2) direct-mapped write-through and write-back caches, and 3) first-in-first-out (FIFO) buffers. Our layout-aware cell-level design process uses a VHDL RQL cell library developed at the Ultra High Speed Computing Laboratory at Stony Brook University (SBU). The SBU VHDL RQL cell library specifies the dynamic and standby energy consumption, gate delays, a number of Josephson junctions (JJs) per cell, and approximate sizes of individual cells based on the parameters of the 248 nm 100 μA/μm2 10 Nb metal layer SFQ fabrication process currently under development at the MIT Lincoln Laboratory. Gate and wire delays as well as clock skew are taken into account during digital circuit simulation done with Mentor Graphics CAD tools. After completing a physical chip layout, the circuit models need to be updated and re-simulated to include the effects of parasitic inductances and actual wire lengths on signal propagation delays. To meet both performance and energy efficiency targets, the RQL storage structures were designed with RQL non-destructive read-out single-bit storage cells. We chose a relatively moderate clock frequency of 8.5 GHz for all storage units to keep their read latencies in the range of 1- 3 cycles. The most complex design in terms of JJs is a tripleported 4 Kbit 64x64-bit register file with 253,918 JJs and its read access latency of 338 ps. The highest energy consumption in terms of energy/operation/bit (∼9.5 aJ at 4.2 K) is for a write hit in a 2 Kbit 32-bit wide write-back cache. The average energy consumption of the RQL storage designs varies from ∼1.6 aJ/operation/bit for a small 4x32-bit FIFO to 7.3 aJ/operation/bit for the 2 Kbit write-back cache at 4.2 K. Given the cryocooler efficiency of 0.1%, this means the energy consumption of ∼1.6-7.3 fJ/operation/bit at room temperature. The physical implementation of the RQL storage units will become feasible upon the development of the target MIT fabrication process and CAD tools for VLSI RQL chip design in 2015-2016.
KW - energy efficiency
KW - high performance computing
KW - high-speed memory
KW - Josephson junctions
KW - superconductor processors
UR - https://www.scopus.com/pages/publications/84958817359
U2 - 10.1109/CEWIT.2015.7338159
DO - 10.1109/CEWIT.2015.7338159
M3 - Conference contribution
AN - SCOPUS:84958817359
T3 - 2015 12th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2015
BT - 2015 12th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2015
Y2 - 19 October 2015 through 20 October 2015
ER -