ACM Transactions on Architecture and Code Optimization (TACO), Volume 7 Issue 3, December 2010

PiPA: Pipelined profiling and analysis on multicore systems
Qin Zhao, Ioana Cutcutache, Weng-Fai Wong
Article No.: 13
DOI: 10.1145/1880037.1880038

Profiling and online analysis are important tasks in program understanding and feedback-directed optimization. However, fine-grained profiling and online analysis tend to seriously slow down the application. To cope with the slowdown, one may have...

Quality of service shared cache management in chip multiprocessor architecture
Fei Guo, Yan Solihin, Li Zhao, Ravishankar Iyer
Article No.: 14
DOI: 10.1145/1880037.1880039

The trends in enterprise IT toward service-oriented computing, server consolidation, and virtual computing point to a future in which workloads are becoming increasingly diverse in terms of performance, reliability, and availability requirements....

Design exploration of hybrid caches with disparate memory technologies
Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, Yuan Xie
Article No.: 15
DOI: 10.1145/1880037.1880040

Traditional multilevel SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. New advancements in...

Exploiting compression opportunities to improve SpMxV performance on shared memory systems
Kornilios Kourtis, Georgios Goumas, Nectarios Koziris
Article No.: 16
DOI: 10.1145/1880037.1880041

The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared memory systems, due to the streaming nature of its data access pattern. To decrease memory contention and improve kernel performance we propose two compression...