Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Konstantinos Koukos, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
Article No.: 1
This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices....
Allocating memory dynamically for virtual machines (VMs) according to their demands provides significant benefits as well as great challenges. Efficient memory resource management requires knowledge of the memory demands of applications or systems...
Hardware Performance Counter-Based Malware Identification and Detection with Adaptive Compressive Sensing
Xueyang Wang, Sek Chai, Michael Isnardi, Sehoon Lim, Ramesh Karri
Article No.: 3
Hardware Performance Counter-based (HPC) runtime checking is an effective way to identify malicious behaviors of malware and detect malicious modifications to a legitimate program’s control flow. To reduce the overhead in the monitored...
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout
Article No.: 4
While hardware is evolving toward heterogeneous multicore architectures, modern software applications are increasingly written in managed languages. Heterogeneity was born of a need to improve energy efficiency; however, we want the performance of...
Runtime specialization is used for optimizing programs based on partial information available only at runtime. In this paper we apply autotuning on runtime specialization of Sparse Matrix-Vector Multiplication to predict a best specialization...
Feedback-driven optimization (FDO) is an important component in mainstream compilers. By allowing the compiler to reoptimize the program based on some profiles of the program's dynamic behaviors, it often enhances the quality of the generated code...
Optimizing Indirect Branches in Dynamic Binary Translators
Amanieu d'Antras, Cosmin Gorgovan, Jim Garside, Mikel Luján
Article No.: 7
Dynamic binary translation is a technology for transparently translating and modifying a program at the machine code level as it is running. A significant factor in the performance of a dynamic binary translator is its handling of indirect...
Clustering-Based Selection for the Exploration of Compiler Optimization Sequences
Luiz G. A. Martins, Ricardo Nobre, João M. P. Cardoso, Alexandre C. B. Delbem, Eduardo Marques
Article No.: 8
A large number of compiler optimizations are nowadays available to users. These optimizations interact with each other and with the input code in several and complex ways. The sequence of application of optimization passes can have a significant...
Power Efficient Hardware Transactional Memory: Dynamic Issue of Transactions
Sang Wook Stephen Do, Michel Dubois
Article No.: 9
Transactional Memory (TM) is no longer just an academic interest as industry has started to adopt the idea in its commercial products. In this paper, we propose Dynamic Transaction Issue (DTI), a new scheme that can be easily implemented on top of...
Understanding and Mitigating Covert Channels Through Branch Predictors
Dmitry Evtyushkin, Dmitry Ponomarev, Nael Abu-Ghazaleh
Article No.: 10
Covert channels through shared processor resources provide secret communication between two malicious processes: the trojan and the spy. In this article, we classify, analyze, and compare covert channels through dynamic branch prediction units in...
A Compiler Approach for Exploiting Partial SIMD Parallelism
Hao Zhou, Jingling Xue
Article No.: 11
Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be...
Over the last decade, Graphics Processing Unit (GPU) architectures have evolved from a fixed-function graphics pipeline to a programmable, energy-efficient compute accelerator for massively parallel applications. The compute power arises from the...
Thread-Aware Adaptive Prefetcher on Multicore Systems: Improving the Performance for Multithreaded Workloads
Peng Liu, Jiyang Yu, Michael C. Huang
Article No.: 13
Most processors employ hardware data prefetching techniques to hide memory access latencies. However, the prefetching requests from different threads on a multicore processor can cause severe interference with prefetching and/or demand requests of...
MAMBO: A Low-Overhead Dynamic Binary Modification Tool for ARM
Cosmin Gorgovan, Amanieu d'Antras, Mikel Luján
Article No.: 14
As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM...