Paper on Hardware-Accelerated Memory Operations published at ADMS@VLDB 2017
As large-scale multiprocessor systems with a non-uniform memory access (NUMA) architecture grow in complexity, they face two problems related to their network topology: (1) reduced bandwidth and increased latency for remote memory accesses, and (2) severely limited scalability of atomic memory operations (e.g., latches) as a consequence of the comprehensive cache coherence protocol. Besides software-based approaches to these issues, hardware vendors are developing new features that address them directly. For example, the HARP ASICs in recent HPE SGI UV systems provide a proprietary API, called GRU (Global Reference Unit), to accelerate or offload memory operations.
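Problem (2) can be made concrete with a small experiment. The sketch below is our own illustration, not code from the paper: `NUM_THREADS` threads either increment one shared atomic counter (a simple stand-in for a contended latch) or each increment a private counter padded to its own cache line, a common software-side mitigation. In the shared variant, every atomic increment needs exclusive ownership of the same cache line, so on a multi-socket NUMA machine that line keeps migrating between sockets; the partitioned variant avoids this at the cost of a more expensive read of the total. The thread count, increment count, 64-byte cache-line size, and the function name `count_increments` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 8                  /* assumption: illustrative thread count */
#define INCREMENTS_PER_THREAD 100000L  /* assumption: illustrative workload */
#define CACHE_LINE 64                  /* assumption: 64-byte cache lines */

/* Contended variant: every thread updates the same cache line. */
static atomic_long shared_counter;

/* Partitioned variant: one counter per thread, each on its own cache line. */
typedef struct {
    _Alignas(CACHE_LINE) atomic_long value;
} padded_counter;

static padded_counter slots[NUM_THREADS];

static void *shared_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < INCREMENTS_PER_THREAD; i++)
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    return NULL;
}

static void *partitioned_worker(void *arg) {
    padded_counter *mine = arg;
    for (long i = 0; i < INCREMENTS_PER_THREAD; i++)
        atomic_fetch_add_explicit(&mine->value, 1, memory_order_relaxed);
    return NULL;
}

/* Runs NUM_THREADS workers and returns the total number of increments.
 * partitioned == 0 selects the shared counter, otherwise per-thread slots. */
long count_increments(int partitioned) {
    pthread_t threads[NUM_THREADS];
    atomic_store(&shared_counter, 0);
    for (int t = 0; t < NUM_THREADS; t++)
        atomic_store(&slots[t].value, 0);

    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = partitioned
            ? pthread_create(&threads[t], NULL, partitioned_worker, &slots[t])
            : pthread_create(&threads[t], NULL, shared_worker, NULL);
        assert(rc == 0); /* sketch-level error handling only */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    if (!partitioned)
        return atomic_load(&shared_counter);
    long total = 0; /* the reader pays by aggregating all slots */
    for (int t = 0; t < NUM_THREADS; t++)
        total += atomic_load(&slots[t].value);
    return total;
}
```

Both variants return the same total; timing `count_increments(0)` against `count_increments(1)` on a multi-socket machine (for example with `clock_gettime`) is one way to observe the coherence cost. Compile with `cc -std=c11 -pthread`.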
The figure shows a block diagram of a single rack unit of an HPE SGI UV300. The HARPs are responsible for maintaining cache coherence and providing a common address space across CPUs. In addition, the GRU can asynchronously copy memory between CPUs and accelerate atomic memory operations. In this joint paper with the Hasso Plattner group at the Hasso Plattner Institute in Potsdam, we used an HPE SGI UV and its GRU as a playground to investigate the overall potential of hardware-accelerated main memory operations for in-memory database systems. Specifically, we investigated how the GRU can speed up database operations and used it to remove bottlenecks in a typical in-memory database system.
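The GRU API is proprietary, and we do not reproduce it here. The sketch below only illustrates the programming pattern that an asynchronous copy engine enables: issue a copy, keep doing useful work, and wait for completion only when the destination is needed. All names (`async_copy`, `async_copy_start`, `async_copy_wait`, `copy_and_sum`) are hypothetical, and the copy is emulated in software by a helper thread calling `memcpy`, so a CPU core still performs the transfer; the point of a hardware unit such as the GRU is to carry it out without occupying a core.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle for one in-flight copy. This is NOT the GRU API:
 * the copy is emulated by a helper thread instead of the HARP ASIC. */
typedef struct {
    pthread_t worker;
    void *dst;
    const void *src;
    size_t len;
} async_copy;

static void *copy_worker(void *arg) {
    async_copy *c = arg;
    memcpy(c->dst, c->src, c->len);
    return NULL;
}

/* Issue the copy and return immediately. Returns 0 on success. */
int async_copy_start(async_copy *c, void *dst, const void *src, size_t len) {
    assert(c != NULL && dst != NULL && src != NULL);
    c->dst = dst;
    c->src = src;
    c->len = len;
    return pthread_create(&c->worker, NULL, copy_worker, c);
}

/* Block until the copy has finished. Returns 0 on success. */
int async_copy_wait(async_copy *c) {
    return pthread_join(c->worker, NULL);
}

/* Usage: overlap a copy from a (possibly remote) buffer with local work.
 * Stores the sum of local[0..n) in *sum_out; returns 0 on success. */
int copy_and_sum(long *dst, const long *src, const long *local, size_t n,
                 long *sum_out) {
    async_copy c;
    if (async_copy_start(&c, dst, src, n * sizeof *dst) != 0)
        return -1;
    long sum = 0; /* this loop runs while the copy is in flight */
    for (size_t i = 0; i < n; i++)
        sum += local[i];
    *sum_out = sum;
    /* dst must not be read before the wait returns */
    return async_copy_wait(&c);
}
```

The split into a start call and a wait call is what allows the overlap; a synchronous `memcpy` would stall the caller for the full remote-access latency described above. Compile with `cc -std=c11 -pthread`.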
Markus Dreseler, Thomas Kissinger, Timo Djürken, Eric Lübke, Matthias Uflacker, Dirk Habich, Hasso Plattner, and Wolfgang Lehner: Hardware-Accelerated Memory Operations on Large-Scale NUMA Systems. In Proceedings of the Eighth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS, co-located with VLDB), 2017.