Architecture and Code Optimization (TACO)


ACM Transactions on Architecture and Code Optimization (TACO), Volume 10 Issue 3, September 2013

TACO Reviewers 2012

Article No.: 9
DOI: 10.1145/2509420.2509421

Low-latency adaptive mode transitions and hierarchical power management in asymmetric clustered cores
Eran Shifer, Shlomo Weiss
Article No.: 10
DOI: 10.1145/2499901

Recently, engineering solutions that include asymmetric multicores have been fabricated for low form-factor computing devices, indicating a potential direction for future evolution of processors. In this article we propose an asymmetric clustered...

Hybrid type legalization for a sparse SIMD instruction set
Yosi Ben Asher, Nadav Rotem
Article No.: 11
DOI: 10.1145/2509420.2509422

SIMD vector units implement only a subset of the operations used by vectorizing compilers, and there are multiple conflicting techniques to legalize arbitrary vector types into register-sized data types. Traditionally, type legalization is...

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions
Yuanwu Lei, Yong Dou, Lei Guo, Jinbo Xu, Jie Zhou, Yazhuo Dong, Hongjian Li
Article No.: 12
DOI: 10.1145/2512430

In this article, a unified VLIW coprocessor, based on a common group of atomic operation units, for Quad arithmetic and elementary functions (QP_VELP) is presented. The explicitly parallel scheme of VLIW instruction and Estrin's evaluation scheme...

Idiom recognition framework using topological embedding
Motohiro Kawahito, Hideaki Komatsu, Takao Moriyama, Hiroshi Inoue, Toshio Nakatani
Article No.: 13
DOI: 10.1145/2512431

Modern processors support hardware-assist instructions (such as TRT and TROT instructions on the IBM System z) to accelerate certain functions such as delimiter search and character conversion. Such special instructions are often used in...

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach
Ghassan Shobaki, Maxim Shawabkeh, Najm Eldeen Abu Rmaileh
Article No.: 14
DOI: 10.1145/2512432

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic...

An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA
Dongrui She, Yifan He, Henk Corporaal
Article No.: 15
DOI: 10.1145/2509420.2509426

In application-specific processor design, a common approach to improve performance and efficiency is to use special instructions that execute complex operation patterns. However, in a generic embedded processor with compact Instruction Set...

Improved bitwidth-aware variable packing
V. Krishna Nandivada, Rajkishore Barik
Article No.: 16
DOI: 10.1145/2509420.2509427

Bitwidth-aware register allocation has caught the attention of researchers aiming to effectively reduce the number of variables spilled into memory. For general-purpose processors, this improves the execution time performance and reduces runtime...

Scalable high-radix router microarchitecture using a network switch organization
Jung Ho Ahn, Young Hoon Son, John Kim
Article No.: 17
DOI: 10.1145/2512433

As the system size of supercomputers and datacenters increases, cost-efficient networks become critical in achieving good scalability on those systems. High-radix routers reduce network cost by lowering the network diameter while providing...

Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
Libo Huang, Zhiying Wang, Nong Xiao, Yongwen Wang, Qiang Dou
Article No.: 18
DOI: 10.1145/2512434

Multicore designs have emerged as the dominant organization for future high-performance microprocessors. Communication in such designs is often enabled by Networks-on-Chip (NoCs). A new trend in such architectures is to fit a Message Passing...

Orchestrating stream graphs using model checking
Avinash Malik, David Gregg
Article No.: 19
DOI: 10.1145/2512435

In this article we use model checking to statically distribute and schedule Synchronous DataFlow (SDF) graphs on heterogeneous execution architectures. We show that model checking is capable of providing an optimal solution and it...

Using machine learning to partition streaming programs
Zheng Wang, Michael F. P. O'Boyle
Article No.: 20
DOI: 10.1145/2512436

Stream-based parallel languages are a popular way to express parallelism in modern applications. The efficient mapping of streaming parallelism to today's multicore systems is, however, highly dependent on the program and underlying architecture....

Designing on-chip networks for throughput accelerators
Ali Bakhoda, John Kim, Tor M. Aamodt
Article No.: 21
DOI: 10.1145/2512429

As the number of cores and threads in throughput accelerators such as Graphics Processing Units (GPUs) increases, so does the importance of on-chip interconnection network design. This article explores throughput-effective Networks-on-Chip (NoCs)...