Architecture and Code Optimization (TACO)


ACM Transactions on Architecture and Code Optimization (TACO), Volume 14 Issue 2, July 2017

Dirty-Block Tracking in a Direct-Mapped DRAM Cache with Self-Balancing Dispatch
Dongwoo Lee, Sangheon Lee, Soojung Ryu, Kiyoung Choi
Article No.: 11
DOI: 10.1145/3068460

Recently, processors have begun integrating 3D stacked DRAMs with the cores on the same package, and there have been several approaches to effectively utilizing the on-package DRAMs as caches. This article presents an approach that combines the...

Significance-Aware Program Execution on Unreliable Hardware
Konstantinos Parasyris, Vassilis Vassiliadis, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas
Article No.: 12
DOI: 10.1145/3058980

This article introduces a significance-centric programming model and runtime support that sets the supply voltage in a multicore CPU to sub-nominal values to reduce the energy footprint and provide mechanisms to control output quality. The...

DawnCC: Automatic Annotation for Data Parallelism and Offloading
Gleison Mendonça, Breno Guimarães, Péricles Alves, Márcio Pereira, Guido Araújo, Fernando Magno Quintão Pereira
Article No.: 13
DOI: 10.1145/3084540

Directive-based programming models, such as OpenACC and OpenMP, allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone...

CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas
Article No.: 14
DOI: 10.1145/3085572

Historically, server designers have opted for simple memory systems by picking one of a few commoditized DDR memory products. We are already witnessing a major upheaval in the off-chip memory hierarchy, with the introduction of many new memory...

Scratchpad Sharing in GPUs
Vishwesh Jatala, Jayvant Anantpur, Amey Karkare
Article No.: 15
DOI: 10.1145/3075619

General-Purpose Graphics Processing Unit (GPGPU) applications exploit on-chip scratchpad memory available in the Graphics Processing Units (GPUs) to improve performance. The amount of thread level parallelism (TLP) present in the GPU is limited by...

Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures
Tae Jun Ham, Juan L. Aragón, Margaret Martonosi
Article No.: 16
DOI: 10.1145/3075620

In today’s computers, heterogeneous processing is used to meet performance targets at manageable power. In adopting increased compute specialization, however, the relative amount of time spent on communication increases. System and software...

An Integrated Vector-Scalar Design on an In-Order ARM Core
Milan Stanic, Oscar Palomar, Timothy Hayes, Ivan Ratkovic, Adrian Cristal, Osman Unsal, Mateo Valero
Article No.: 17
DOI: 10.1145/3075618

In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase...

On the Interactions Between Value Prediction and Compiler Optimizations in the Context of EOLE
Fernando A. Endo, Arthur Perais, André Seznec
Article No.: 18
DOI: 10.1145/3090634

Increasing instruction-level parallelism is regaining attractiveness within the microprocessor industry. The {Early | Out-of-order | Late} Execution (EOLE) microarchitecture and Differential Value TAgged GEometric...

Band-Pass Prefetching: An Effective Prefetch Management Mechanism Using Prefetch-Fraction Metric in Multi-Core Systems
Aswinkumar Sridharan, Biswabandan Panda, André Seznec
Article No.: 19
DOI: 10.1145/3090635

In multi-core systems, an application’s prefetcher can interfere with the memory requests of other applications using the shared resources, such as last level cache and memory bandwidth. In order to minimize prefetcher-caused interference,...

Symmetry in Software Synthesis
Andrés Goens, Sergio Siccha, Jeronimo Castrillon
Article No.: 20
DOI: 10.1145/3095747

With the surge of multi- and many-core systems, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly...