Architecture and Code Optimization (TACO)


ACM Transactions on Architecture and Code Optimization (TACO), Volume 13 Issue 1, April 2016

Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Konstantinos Koukos, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
Article No.: 1
DOI: 10.1145/2889488

This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices....

Dynamic Memory Balancing for Virtualization
Zhigang Wang, Xiaolin Wang, Fang Hou, Yingwei Luo, Zhenlin Wang
Article No.: 2
DOI: 10.1145/2851501

Allocating memory dynamically for virtual machines (VMs) according to their demands provides significant benefits as well as great challenges. Efficient memory resource management requires knowledge of the memory demands of applications or systems...

Hardware Performance Counter-Based Malware Identification and Detection with Adaptive Compressive Sensing
Xueyang Wang, Sek Chai, Michael Isnardi, Sehoon Lim, Ramesh Karri
Article No.: 3
DOI: 10.1145/2857055

Hardware Performance Counter-based (HPC) runtime checking is an effective way to identify malicious behaviors of malware and detect malicious modifications to a legitimate program’s control flow. To reduce the overhead in the monitored...

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout
Article No.: 4
DOI: 10.1145/2875424

While hardware is evolving toward heterogeneous multicore architectures, modern software applications are increasingly written in managed languages. Heterogeneity was born of a need to improve energy efficiency; however, we want the performance of...

Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication
Buse Yilmaz, Bariş Aktemur, María J. Garzarán, Sam Kamin, Furkan Kiraç
Article No.: 5
DOI: 10.1145/2851500

Runtime specialization is used for optimizing programs based on partial information available only at runtime. In this paper we apply autotuning on runtime specialization of Sparse Matrix-Vector Multiplication to predict a best specialization...

Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations
Mingzhou Zhou, Bo Wu, Xipeng Shen, Yaoqing Gao, Graham Yiu
Article No.: 6
DOI: 10.1145/2851502

Feedback-driven optimization (FDO) is an important component in mainstream compilers. By allowing the compiler to reoptimize the program based on some profiles of the program's dynamic behaviors, it often enhances the quality of the generated code...

Optimizing Indirect Branches in Dynamic Binary Translators
Amanieu d'Antras, Cosmin Gorgovan, Jim Garside, Mikel Luján
Article No.: 7
DOI: 10.1145/2866573

Dynamic binary translation is a technology for transparently translating and modifying a program at the machine code level as it is running. A significant factor in the performance of a dynamic binary translator is its handling of indirect...

Clustering-Based Selection for the Exploration of Compiler Optimization Sequences
Luiz G. A. Martins, Ricardo Nobre, João M. P. Cardoso, Alexandre C. B. Delbem, Eduardo Marques
Article No.: 8
DOI: 10.1145/2883614

A large number of compiler optimizations are nowadays available to users. These optimizations interact with each other and with the input code in several and complex ways. The sequence of application of optimization passes can have a significant...

Power Efficient Hardware Transactional Memory: Dynamic Issue of Transactions
Sang Wook Stephen Do, Michel Dubois
Article No.: 9
DOI: 10.1145/2875425

Transactional Memory (TM) is no longer just an academic interest as industry has started to adopt the idea in its commercial products. In this paper, we propose Dynamic Transaction Issue (DTI), a new scheme that can be easily implemented on top of...

Understanding and Mitigating Covert Channels Through Branch Predictors
Dmitry Evtyushkin, Dmitry Ponomarev, Nael Abu-Ghazaleh
Article No.: 10
DOI: 10.1145/2870636

Covert channels through shared processor resources provide secret communication between two malicious processes: the trojan and the spy. In this article, we classify, analyze, and compare covert channels through dynamic branch prediction units in...

A Compiler Approach for Exploiting Partial SIMD Parallelism
Hao Zhou, Jingling Xue
Article No.: 11
DOI: 10.1145/2886101

Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be...

R-GPU: A Reconfigurable GPU Architecture
Gert-Jan van den Braak, Henk Corporaal
Article No.: 12
DOI: 10.1145/2890506

Over the last decade, Graphics Processing Unit (GPU) architectures have evolved from a fixed-function graphics pipeline to a programmable, energy-efficient compute accelerator for massively parallel applications. The compute power arises from the...

Thread-Aware Adaptive Prefetcher on Multicore Systems: Improving the Performance for Multithreaded Workloads
Peng Liu, Jiyang Yu, Michael C. Huang
Article No.: 13
DOI: 10.1145/2890505

Most processors employ hardware data prefetching techniques to hide memory access latencies. However, the prefetching requests from different threads on a multicore processor can cause severe interference with prefetching and/or demand requests of...

MAMBO: A Low-Overhead Dynamic Binary Modification Tool for ARM
Cosmin Gorgovan, Amanieu d'Antras, Mikel Luján
Article No.: 14
DOI: 10.1145/2896451

As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM...