ACM Transactions on Architecture and Code Optimization (TACO), Volume 10 Issue 1, April 2013

Deterministic Replay Using Global Clock
Yunji Chen, Tianshi Chen, Ling Li, Ruiyang Wu, Daofu Liu, Weiwu Hu
Article No.: 1
DOI: 10.1145/2445572.2445573

Debugging parallel programs is well known to be difficult. A promising way to facilitate it is to use hardware support to achieve deterministic replay on a Chip Multi-Processor (CMP). As a Design-For-Debug (DFD)...

TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs
Daniel Lustig, Abhishek Bhattacharjee, Margaret Martonosi
Article No.: 2
DOI: 10.1145/2445572.2445574

Translation Lookaside Buffers (TLBs) are critical to overall system performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as Chip MultiProcessors (CMPs) become ubiquitous, TLB design and...

Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
Rong Chen, Haibo Chen
Article No.: 3
DOI: 10.1145/2445572.2445575

The prevalence of chip multiprocessors opens opportunities for running data-parallel applications, originally deployed on clusters, on a single machine with many cores. MapReduce, a simple and elegant programming model for large-scale clusters, has...

A-DFA: A Time- and Space-Efficient DFA Compression Algorithm for Fast Regular Expression Evaluation
Michela Becchi, Patrick Crowley
Article No.: 4
DOI: 10.1145/2445572.2445576

Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While Deterministic Finite Automata (DFAs) allow this operation to be...

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing
Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, Norman P. Jouppi
Article No.: 5
DOI: 10.1145/2445572.2445577

This article introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90 nm to 22 nm and beyond. At...