The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
Article No.: 26
Classification of data into private and shared has proven to be a catalyst for techniques to reduce coherence cost, since private data can be taken out of coherence and resources can be concentrated on providing coherence for shared data. In this...
Fault-Tolerant Voltage-Scalable (FTVS) SRAM cache architectures are a promising approach to improve energy efficiency of memories in the presence of nanoscale process variation. Complex FTVS schemes are commonly proposed to achieve very low...
Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters
Pierre Michaud, Andrea Mondelli, André Seznec
Article No.: 28
During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this...
Leveraging Transactional Execution for Memory Consistency Model Emulation
Ragavendra Natarajan, Antonia Zhai
Article No.: 29
System emulation is widely used in today’s computer systems. This technology opens new opportunities for resource sharing as well as enhancing system security and reliability. System emulation across different instruction set architectures...
CAFFEINE: A Utility-Driven Prefetcher Aggressiveness Engine for Multicores
Biswabandan Panda, Shankar Balachandran
Article No.: 30
Aggressive prefetching improves system performance by hiding and tolerating off-chip memory latency. However, on a multicore system, prefetchers of different cores contend for shared resources and aggressive prefetching can degrade the overall...
Buri: Scaling Big-Memory Computing with Hardware-Based Memory Expansion
Jishen Zhao, Sheng Li, Jichuan Chang, John L. Byrne, Laura L. Ramirez, Kevin Lim, Yuan Xie, Paolo Faraboschi
Article No.: 31
Motivated by the challenges of scaling up memory capacity and fully exploiting the benefits of memory compression, we propose Buri, a hardware-based memory compression scheme, which simultaneously achieves cost efficiency, high performance, and...
Spatiotemporal SIMT and Scalarization for Improving GPU Efficiency
Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink
Article No.: 32
Temporal SIMT (TSIMT) has been suggested as an alternative to conventional (spatial) SIMT for improving GPU performance on branch-intensive code. Although TSIMT has been briefly mentioned before, it was not evaluated. We present a complete design...