TACO Reviewers 2012
Article No.: 9
Low-latency adaptive mode transitions and hierarchical power management in asymmetric clustered cores
Eran Shifer, Shlomo Weiss
Article No.: 10
Recently, engineering solutions that include asymmetric multicores have been fabricated for low form-factor computing devices, indicating a potential direction for future evolution of processors. In this article we propose an asymmetric clustered...
Hybrid type legalization for a sparse SIMD instruction set
Yosi Ben Asher, Nadav Rotem
Article No.: 11
SIMD vector units implement only a subset of the operations used by vectorizing compilers, and there are multiple conflicting techniques to legalize arbitrary vector types into register-sized data types. Traditionally, type legalization is...
In this article, a unified VLIW coprocessor, based on a common group of atomic operation units, for Quad arithmetic and elementary functions (QP_VELP) is presented. The explicitly parallel scheme of VLIW instruction and Estrin's evaluation scheme...
Modern processors support hardware-assist instructions (such as TRT and TROT instructions on the IBM System z) to accelerate certain functions such as delimiter search and character conversion. Such special instructions are often used in...
Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach
Ghassan Shobaki, Maxim Shawabkeh, Najm Eldeen Abu Rmaileh
Article No.: 14
Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic...
An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA
Dongrui She, Yifan He, Henk Corporaal
Article No.: 15
In application-specific processor design, a common approach to improve performance and efficiency is to use special instructions that execute complex operation patterns. However, in a generic embedded processor with compact Instruction Set...
Bitwidth-aware register allocation has caught the attention of researchers aiming to effectively reduce the number of variables spilled into memory. For general-purpose processors, this improves the execution time performance and reduces runtime...
Scalable high-radix router microarchitecture using a network switch organization
Jung Ho Ahn, Young Hoon Son, John Kim
Article No.: 17
As the system size of supercomputers and datacenters increases, cost-efficient networks become critical in achieving good scalability on those systems. High-radix routers reduce network cost by lowering the network diameter while providing...
Multicore designs have emerged as the dominant organization for future high-performance microprocessors. Communication in such designs is often enabled by Networks-on-Chip (NoCs). A new trend in such architectures is to fit a Message Passing...
In this article we use model checking to statically distribute and schedule Synchronous DataFlow (SDF) graphs on heterogeneous execution architectures. We show that model checking is capable of providing an optimal solution and it...
Using machine learning to partition streaming programs
Zheng Wang, Michael F. P. O'Boyle
Article No.: 20
Stream-based parallel languages are a popular way to express parallelism in modern applications. The efficient mapping of streaming parallelism to today's multicore systems is, however, highly dependent on the program and underlying architecture....
Designing on-chip networks for throughput accelerators
Ali Bakhoda, John Kim, Tor M. Aamodt
Article No.: 21
As the number of cores and threads in throughput accelerators such as Graphics Processing Units (GPUs) increases, so does the importance of on-chip interconnection network design. This article explores throughput-effective Networks-on-Chip (NoCs)...