Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer...
Programmers can no longer depend on new processors to have significantly improved single-thread performance. Instead, gains have to come from other sources such as the compiler and its optimization passes. Advanced passes make use of information...
The polyhedron model is a powerful model for systematically identifying and applying loop transformations that improve data locality (e.g., via tiling) and enable parallelization. In the polyhedron model, a loop transformation is, essentially,...
HAP: Hybrid-Memory-Aware Partition in Shared Last-Level Cache
Wei Wei, Dejun Jiang, Jin Xiong, Mingyu Chen
Article No.: 24
Data-center servers benefit from large-capacity memory systems to run multiple processes simultaneously. Hybrid DRAM-NVM memory is attractive for increasing memory capacity by exploiting the scalability of Non-Volatile Memory (NVM). However,...
Providing Predictable Performance via a Slowdown Estimation Model
Dongliang Xiong, Kai Huang, Xiaowen Jiang, Xiaolang Yan
Article No.: 25
Interapplication interference at shared main memory slows down different applications differently. A few slowdown estimation models have been proposed to provide predictable performance by quantifying memory interference, but they have relatively...
Programming Heterogeneous Systems from an Image Processing DSL
Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz
Article No.: 26
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating,...
Efficient Generation of Compact Execution Traces for Multicore Architectural Simulations
Ayman Hroub, M. E. S. Elrabaa, M. F. Mudawar, A. Khayyat
Article No.: 27
Because it requires no functional simulation, trace-driven simulation has the potential to achieve faster simulation speeds than execution-driven simulation of multicore architectures. An efficient, on-the-fly, high-fidelity trace generation method for...
Optimal code performance is (besides correctness and accuracy) the most important objective in compute-intensive applications. In many of these applications, Graphics Processing Units (GPUs) are used because of their high compute power....
MiCOMP: Mitigating the Compiler Phase-Ordering Problem Using Optimization Sub-Sequences and Machine Learning
Amir H. Ashouri, Andrea Bignoli, Gianluca Palermo, Cristina Silvano, Sameer Kulkarni, John Cavazos
Article No.: 29
Recent compilers offer a vast number of multilayered optimizations targeting different code segments of an application. Choosing among these optimizations can significantly impact the performance of the code being optimized. The selection of the...
To increase the performance of data-intensive applications, we present an extension to a CPU architecture that enables arbitrary near-data processing capabilities close to the main memory. This is realized by introducing a component attached to...
SWITCHES: A Lightweight Runtime for Dataflow Execution of Tasks on Many-Cores
Andreas Diavastos, Pedro Trancoso
Article No.: 31
SWITCHES is a task-based dataflow runtime that implements a lightweight distributed triggering system for runtime dependence resolution and uses static scheduling and compile-time assignment policies to reduce runtime overheads. Unlike other...