enter search term and/or author name
A Joint SW/HW Approach for Reducing Register File Vulnerability
Hamed Tabkhi, Gunar Schirner
Article No.: 9
The Register File (RF) is a particularly vulnerable component within processor core and at the same time a hotspot with high power density. To reduce RF vulnerability, conventional HW-only approaches such as Error Correction Codes (ECCs) or...
Reliable Integrity Checking in Multicore Processors
Arun Kanuparthi, Ramesh Karri
Article No.: 10
Security and reliability have become important concerns in the design of computer systems. On one hand, microarchitectural enhancements for security (such as for dynamic integrity checking of code at runtime) have been proposed. On the other hand,...
Current high-performance computer systems utilize a memory hierarchy of on-chip cache, main memory, and secondary storage due to differences in device characteristics. Limiting the amount of main memory causes page swap operations and duplicates...
Dynamic Shared SPM Reuse for Real-Time Multicore Embedded Systems
Morteza Mohajjel Kafshdooz, Alireza Ejlali
Article No.: 12
Allocating the scratchpad memory (SPM) space to tasks is a challenging problem in real-time multicore embedded systems that use shared SPM. Proper SPM space allocation is important, as it considerably influences the application worst-case...
GPU Performance and Power Tuning Using Regression Trees
Wenhao Jia, Elba Garza, Kelly A. Shaw, Margaret Martonosi
Article No.: 13
GPU performance and power tuning is difficult, requiring extensive user expertise and time-consuming trial and error. To accelerate design tuning, statistical design space exploration methods have been proposed. This article presents Starchart, a...
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
Irshad Pananilath, Aravind Acharya, Vinay Vasista, Uday Bondhugula
Article No.: 14
The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver...
Iterative optimization is a simple but powerful approach that searches the best possible combination of compiler optimizations for a given workload. However, iterative optimization is plagued by several practical issues that prevent it from being...
A modern general-purpose graphics processing unit (GPGPU) usually consists of multiple streaming multiprocessors (SMs), each having a pipeline that incorporates a group of threads executing a common instruction flow. Although SMs are designed to...
EECache: A Comprehensive Study on the Architectural Design for Energy-Efficient Last-Level Caches in Chip Multiprocessors
Hsiang-Yun Cheng, Matt Poremba, Narges Shahidi, Ivan Stalev, Mary Jane Irwin, Mahmut Kandemir, Jack Sampson, Yuan Xie
Article No.: 17
Power management for large last-level caches (LLCs) is important in chip multiprocessors (CMPs), as the leakage power of LLCs accounts for a significant fraction of the limited on-chip power budget. Since not all workloads running on CMPs need the...
Intercepting Functions for Memoization: A Case Study Using Transcendental Functions
Arjun Suresh, Bharath Narasimha Swamy, Erven Rohou, André Seznec
Article No.: 18
Memoization is the technique of saving the results of executions so that future executions can be omitted when the input set repeats. Memoization has been proposed in previous literature at the instruction, basic block, and function levels using...
SECRET: A Selective Error Correction Framework for Refresh Energy Reduction in DRAMs
Chung-Hsiang Lin, De-Yu Shen, Yi-Jung Chen, Chia-Lin Yang, Cheng-Yuan Michael Wang
Article No.: 19
DRAMs are used as the main memory in most computing systems today. Studies show that DRAMs contribute to a significant part of overall system power consumption. One of the main challenges in low-power DRAM design is the inevitable refresh process....
When building a compiler for a high-level language, certain intrinsic features of the language must be expressed in terms of the resulting low-level operations. Complex features are often expressed by explicitly weaving together bits of low-level...
Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU
Raghuraman Balasubramanian, Vinay Gangadhar, Ziliang Guo, Chen-Han Ho, Cherin Joseph, Jaikrishnan Menon, Mario Paulo Drumond, Robin Paul, Sharath Prasad, Pradip Valathol, Karthikeyan Sankaralingam
Article No.: 21
Graphic processing unit (GPU)-based general-purpose computing is developing as a viable alternative to CPU-based computing in many domains. Today’s tools for GPU analysis include simulators like GPGPU-Sim, Multi2Sim, and Barra. While useful...
Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures
Quan Chen, Minyi Guo
Article No.: 22
Modern mainstream powerful computers adopt multisocket multicore CPU architecture and NUMA-based memory architecture. While traditional work-stealing schedulers are designed for single-socket architectures, they incur severe shared cache misses...
Section-Based Program Analysis to Reduce Overhead of Detecting Unsynchronized Thread Communication
Madan Das, Gabriel Southern, Jose Renau
Article No.: 23
Most systems that test and verify parallel programs, such as deterministic execution engines, data race detectors, and software transactional memory systems, require instrumenting loads and stores in an application. This can cause a very...
General-purpose graphic processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI)...
Contech: Efficiently Generating Dynamic Task Graphs for Arbitrary Parallel Programs
Brian P. Railing, Eric R. Hein, Thomas M. Conte
Article No.: 25
Parallel programs can be characterized by task graphs encoding instructions, memory accesses, and the parallel work’s dependencies, while representing any threading library and architecture. This article presents Contech, a high performance...