ACM Transactions on Architecture and Code Optimization (TACO), Volume 12 Issue 3, October 2015

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
Article No.: 26
DOI: 10.1145/2790301

Classification of data into private and shared has proven to be a catalyst for techniques to reduce coherence cost, since private data can be taken out of coherence and resources can be concentrated on providing coherence for shared data. In this...

DPCS: Dynamic Power/Capacity Scaling for SRAM Caches in the Nanoscale Era
Mark Gottscho, Abbas BanaiyanMofrad, Nikil Dutt, Alex Nicolau, Puneet Gupta
Article No.: 27
DOI: 10.1145/2792982

Fault-Tolerant Voltage-Scalable (FTVS) SRAM cache architectures are a promising approach to improve energy efficiency of memories in the presence of nanoscale process variation. Complex FTVS schemes are commonly proposed to achieve very low...

Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters
Pierre Michaud, Andrea Mondelli, André Seznec
Article No.: 28
DOI: 10.1145/2800787

During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this...

Leveraging Transactional Execution for Memory Consistency Model Emulation
Ragavendra Natarajan, Antonia Zhai
Article No.: 29
DOI: 10.1145/2786980

System emulation is widely used in today’s computer systems. This technology opens new opportunities for resource sharing as well as enhancing system security and reliability. System emulation across different instruction set architectures...

CAFFEINE: A Utility-Driven Prefetcher Aggressiveness Engine for Multicores
Biswabandan Panda, Shankar Balachandran
Article No.: 30
DOI: 10.1145/2806891

Aggressive prefetching improves system performance by hiding and tolerating off-chip memory latency. However, on a multicore system, prefetchers of different cores contend for shared resources and aggressive prefetching can degrade the overall...

Buri: Scaling Big-Memory Computing with Hardware-Based Memory Expansion
Jishen Zhao, Sheng Li, Jichuan Chang, John L. Byrne, Laura L. Ramirez, Kevin Lim, Yuan Xie, Paolo Faraboschi
Article No.: 31
DOI: 10.1145/2808233

Motivated by the challenges of scaling up memory capacity and fully exploiting the benefits of memory compression, we propose Buri, a hardware-based memory compression scheme, which simultaneously achieves cost efficiency, high performance, and...

Spatiotemporal SIMT and Scalarization for Improving GPU Efficiency
Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink
Article No.: 32
DOI: 10.1145/2811402

Temporal SIMT (TSIMT) has been suggested as an alternative to conventional (spatial) SIMT for improving GPU performance on branch-intensive code. Although TSIMT has been briefly mentioned before, it was not evaluated. We present a complete design...