TY - GEN
T1 - Proactive instruction fetch
AU - Ferdman, Michael
AU - Kaynak, Cansu
AU - Falsafi, Babak
PY - 2011
Y1 - 2011
N2 - Fast access requirements preclude building L1 instruction caches large enough to capture the working set of server workloads. Efforts exist to mitigate limited L1 instruction cache capacity by relying on the stability and repetitiveness of the instruction stream to predict and prefetch future instruction blocks prior to their use. However, dynamic variation in cache miss sequences prevents correct and timely prediction, leaving many instruction-fetch stalls exposed, resulting in a key performance bottleneck for servers. We observe that, while the vast majority of application instruction references are amenable to prediction, even minor control-flow variations are amplified by microarchitectural components, resulting in a major source of instability and randomness that significantly limits prefetcher utility. Control-flow variation disturbs the L1 instruction cache replacement order and branch predictor state, causing the L1 instruction cache to randomly filter the instruction stream while the branch predictor and spontaneous hardware interrupts inject the stream with unpredictable noise. Based on this observation, we show that an instruction prefetcher, previously plagued by microarchitectural instability, becomes nearly perfect when modified to operate on the correct-path, retire-order instruction stream. We propose Proactive Instruction Fetch, an instruction prefetch mechanism that achieves higher than 99.5% instruction-cache hit rate, improving server throughput by 27% and nearly matching the performance of a perfect L1 instruction cache that never misses.
AB - Fast access requirements preclude building L1 instruction caches large enough to capture the working set of server workloads. Efforts exist to mitigate limited L1 instruction cache capacity by relying on the stability and repetitiveness of the instruction stream to predict and prefetch future instruction blocks prior to their use. However, dynamic variation in cache miss sequences prevents correct and timely prediction, leaving many instruction-fetch stalls exposed, resulting in a key performance bottleneck for servers. We observe that, while the vast majority of application instruction references are amenable to prediction, even minor control-flow variations are amplified by microarchitectural components, resulting in a major source of instability and randomness that significantly limits prefetcher utility. Control-flow variation disturbs the L1 instruction cache replacement order and branch predictor state, causing the L1 instruction cache to randomly filter the instruction stream while the branch predictor and spontaneous hardware interrupts inject the stream with unpredictable noise. Based on this observation, we show that an instruction prefetcher, previously plagued by microarchitectural instability, becomes nearly perfect when modified to operate on the correct-path, retire-order instruction stream. We propose Proactive Instruction Fetch, an instruction prefetch mechanism that achieves higher than 99.5% instruction-cache hit rate, improving server throughput by 27% and nearly matching the performance of a perfect L1 instruction cache that never misses.
KW - branch prediction
KW - caching
KW - instruction streaming
KW - prefetching
UR - https://www.scopus.com/pages/publications/84858759165
U2 - 10.1145/2155620.2155638
DO - 10.1145/2155620.2155638
M3 - Conference contribution
AN - SCOPUS:84858759165
SN - 9781450310536
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 152
EP - 162
BT - MICRO 44 - Proceedings of the 44th Annual IEEE/ACM Symposium on Microarchitecture
T2 - 44th Annual IEEE/ACM Symposium on Microarchitecture, MICRO 44
Y2 - 4 December 2011 through 7 December 2011
ER -