Preventing STT-RAM Last-Level Caches from Port Obstruction
Jue Wang, Xiangyu Dong, Yuan Xie
Article No.: 23
Many new nonvolatile memory (NVM) technologies have been heavily studied to replace the power-hungry SRAM/DRAM-based memory hierarchy in today's computers. Among various emerging NVM technologies, Spin-Transfer Torque RAM (STT-RAM) has many...
Effective Transactional Memory Execution Management for Improved Concurrency
M. A. Gonzalez-Mesa, Eladio Gutierrez, Emilio L. Zapata, Oscar Plata
Article No.: 24
This article describes a transactional memory execution model intended to exploit maximum parallelism from sequential and multithreaded programs. A program code section is partitioned into chunks that will be mapped onto threads and executed...
Efficient Power Gating of SIMD Accelerators Through Dynamic Selective Devectorization in an HW/SW Codesigned Environment
Rakesh Kumar, Alejandro Martínez, Antonio González
Article No.: 25
Leakage energy is a growing concern in current and future microprocessors. Functional units of microprocessors are responsible for a major fraction of this energy. Therefore, reducing functional unit leakage has received much attention in recent...
FLARES: An Aging Aware Algorithm to Autonomously Adapt the Error Correction Capability in NAND Flash Memories
Stefano Di Carlo, Salvatore Galfano, Marco Indaco, Paolo Prinetto, Davide Bertozzi, Piero Olivo, Cristian Zambelli
Article No.: 26
With the advent of solid-state storage systems, NAND flash memories are becoming a key storage technology. However, they suffer from serious reliability and endurance issues during the operating lifetime that can be handled by the use of...
Automated Fine-Grained CPU Provisioning for Virtual Machines
Davide B. Bartolini, Filippo Sironi, Donatella Sciuto, Marco D. Santambrogio
Article No.: 27
Ideally, the pay-as-you-go model of Infrastructure as a Service (IaaS) clouds should enable users to rent just enough resources (e.g., CPU or memory bandwidth) to fulfill their service level objectives (SLOs). Achieving this goal is hard on...
Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems....
NUCA-L1: A Non-Uniform Access Latency Level-1 Cache Architecture for Multicores Operating at Near-Threshold Voltages
Farrukh Hijaz, Omer Khan
Article No.: 29
Research has shown that operating in the near-threshold region is expected to provide up to 10× energy efficiency for future processors. However, reliable operation below a minimum voltage (Vccmin) cannot be guaranteed due to process...
Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages
Andi Drebes, Karine Heydemann, Nathalie Drach, Antoniu Pop, Albert Cohen
Article No.: 30
We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory...
EFGR: An Enhanced Fine Granularity Refresh Feature for High-Performance DDR4 DRAM Devices
Venkata Kalyan Tavva, Ravi Kasha, Madhu Mutyam
Article No.: 31
High-density DRAM devices spend significant time refreshing the DRAM cells, leading to a performance drop. The JEDEC DDR4 standard provides a Fine Granularity Refresh (FGR) feature to tackle refresh. Motivated by the observation that in FGR mode,...
Fault tolerance has become a fundamental concern in computer design, in addition to performance and power. Although several error detection schemes have been proposed to discover a faulty core in the system, these proposals could waste the whole...
Hardware Fault Recovery for I/O Intensive Applications
Pradeep Ramachandran, Siva Kumar Sastry Hari, Manlap Li, Sarita V. Adve
Article No.: 33
With continued process scaling, the rate of hardware failures in commodity systems is increasing. Because these commodity systems are highly sensitive to cost, traditional solutions that employ heavy redundancy to handle such failures are no...
Multiprogram Throughput Metrics: A Systematic Approach
Stijn Eyerman, Pierre Michaud, Wouter Rogiest
Article No.: 34
Running multiple programs on a processor aims at increasing the throughput of that processor. However, defining meaningful throughput metrics in a simulation environment is not as straightforward as reporting execution time. This has led to an...