Architecture and Code Optimization (TACO)



ACM Transactions on Architecture and Code Optimization (TACO), Volume 9 Issue 3, September 2012

Dynamically dispatching speculative threads to improve sequential execution
Yangchun Luo, Antonia Zhai
Article No.: 13
DOI: 10.1145/2355585.2355586

Efficiently utilizing multicore processors to realize their performance potential demands extracting thread-level parallelism from applications. Various novel and sophisticated execution models have been proposed to extract thread-level...

Extendable pattern-oriented optimization directives
Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan
Article No.: 14
DOI: 10.1145/2355585.2355587

Algorithm-specific, that is, semantic-specific optimizations have been observed to bring significant performance gains, especially for a diverse set of multi/many-core architectures. However, current programming models and compiler technologies...

Runtime energy consumption estimation for server workloads based on chaotic time-series approximation
Adam Wade Lewis, Nian-Feng Tzeng, Soumik Ghosh
Article No.: 15
DOI: 10.1145/2355585.2355588

This article proposes a runtime model that relates server energy consumption to its overall thermal envelope, using hardware performance counters and experimental measurements. While previous studies have attempted system-wide modeling of server...

Combining recency of information with selective random and a victim cache in last-level caches
Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato
Article No.: 16
DOI: 10.1145/2355585.2355589

Memory latency has become an important performance bottleneck in current microprocessors. This problem worsens as the number of cores sharing the same memory controller increases. To mitigate this problem, a common solution is to implement...

Dynamic QoS management for chip multiprocessors
Bin Li, Li-Shiuan Peh, Li Zhao, Ravi Iyer
Article No.: 17
DOI: 10.1145/2355585.2355590

With the continued scaling of semiconductor technologies, the chip multiprocessor (CMP) has become the de facto design for modern high-performance computer architectures. It is expected that more and more applications with diverse requirements will...

Mixed speculative multithreaded execution models
Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra
Article No.: 18
DOI: 10.1145/2355585.2355591

The current trend toward multicore architectures has placed great pressure on programmers and compilers to generate thread-parallel programs. Improved execution performance can no longer be obtained via traditional single-thread instruction level...

Disjoint out-of-order execution processor
Mageda Sharafeddine, Komal Jothi, Haitham Akkary
Article No.: 19
DOI: 10.1145/2355585.2355592

High-performance superscalar architectures used to exploit instruction-level parallelism in single-thread applications have become too complex and power hungry for the multicore processor era. We propose a new architecture that uses multiple...

Static analysis of the worst-case memory performance for irregular codes with indirections
Diego Andrade, Basilio B. Fraguela, Ramón Doallo
Article No.: 20
DOI: 10.1145/2355585.2355593

Real-time systems are subject to timing constraints, whose upper bound is given by the Worst-Case Execution Time (WCET). Cache memory behavior is difficult to predict analytically and estimating a safe and precise worst-case value is even more...

Deconstructing iterative optimization
Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam, Chengyong Wu
Article No.: 21
DOI: 10.1145/2355585.2355594

Iterative optimization is a popular compiler optimization approach that has been studied extensively over the past decade. In this article, we deconstruct iterative optimization by evaluating whether it works across datasets and by analyzing why...

Memory optimization of dynamic binary translators for embedded systems
Apala Guha, Kim Hazelwood, Mary Lou Soffa
Article No.: 22
DOI: 10.1145/2355585.2355595

Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. DBT-based services are valuable for all types of platforms. However, the high memory demands of DBTs present an obstacle for embedded...

A transpose-free in-place SIMD optimized FFT
James R. Geraci, Sharon M. Sacco
Article No.: 23
DOI: 10.1145/2355585.2355596

A transpose-free in-place SIMD optimized algorithm for the computation of large FFTs is introduced and implemented on the Cell Broadband Engine. Six different FFT implementations of the algorithm using six different data movement methods are...