Recent Achievements

Paper on Hardware-Accelerated Memory Operations published at ADMS@VLDB 2017


As large-scale multiprocessor systems with a non-uniform memory access (NUMA) architecture grow in complexity, such systems face network-topology-related problems: (1) reduced bandwidth and increased latency for remote memory accesses and (2) the strongly limited scalability of atomic memory operations (e.g., latches) as a consequence of the comprehensive cache coherence protocol. Aside from software-based approaches to cope with these issues, hardware vendors are also developing new features to address these problems efficiently. For example, the HARP ASICs in recent HPE SGI UV systems provide a proprietary API called GRU (Global Reference Unit) to accelerate or offload memory operations.

The figure shows a block diagram of an individual rack unit of an HPE SGI UV300. The HARPs are responsible for maintaining cache coherency and providing a common address space across CPUs. In addition, the GRU provides functionality to asynchronously copy memory between CPUs and to accelerate atomic memory operations. In this joint paper with the group of Hasso Plattner at the Hasso Plattner Institute in Potsdam, we treated an HPE SGI UV and its GRU as a playground to investigate the overall potential of hardware-accelerated main memory operations for in-memory database systems [1]. In detail, we investigated the capabilities of the GRU to speed up database operations and to remove bottlenecks in a typical in-memory database system.
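The key benefit of such an offload is that the copy proceeds while the CPU keeps computing. The following Python sketch illustrates only this overlap idea using a background thread; it is a conceptual analogy, not the GRU API, and all names in it are our own invention:

```python
# Conceptual sketch (NOT the GRU API): offloading a memory copy so it
# overlaps with ongoing computation, as the GRU does in hardware.
from concurrent.futures import ThreadPoolExecutor

def offloaded_copy(executor, dst, src):
    """Start the copy asynchronously and return a handle (a future)."""
    return executor.submit(lambda: dst.__setitem__(slice(None), src))

def process(block):
    return sum(block)

src = list(range(1_000_000))
dst = [0] * len(src)

with ThreadPoolExecutor(max_workers=1) as ex:
    handle = offloaded_copy(ex, dst, src)   # copy runs in the background
    partial = process(src[:1000])           # CPU keeps computing meanwhile
    handle.result()                         # wait for the copy to complete
```

In the hardware case, the "executor" is the HARP's GRU rather than a software thread, so the copy consumes no CPU cycles at all.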

[1] Markus Dreseler, Thomas Kissinger, Timo Djürken, Eric Lübke, Matthias Uflacker, Dirk Habich, Hasso Plattner, and Wolfgang Lehner: Hardware-Accelerated Memory Operations on Large-Scale NUMA Systems. In Proceedings of the Eighth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (co-located with VLDB), 2017

Orchestration overview article accepted at IEEE TMSCS


The article “A Hardware/Software Stack for Heterogeneous Systems” has been accepted at IEEE Transactions on Multi-Scale Computing Systems [doi]. Involving many PhD projects of the Orchestration path, the article summarizes our joint research on a hardware/software stack for future heterogeneous computer systems. This includes tiled heterogeneous multicores, capability-based operating systems, adaptive application runtimes, dataflow programming models, and probabilistic model checking.

IEEE Rebooting Computing Week 2017 - Presentation by Prof. Jeronimo Castrillon


Last week, the IEEE Rebooting Computing 2017 Industry Summit on the Future of Computing was held in Washington, DC. The cfaed Strategic Professor for Compiler Construction, Jeronimo Castrillon, was invited to give a talk within the Innovation and Ideas Panel. See his presentation as well as the talks of Bing Liu (University of Illinois at Chicago), Robert Voigt (Northrop Grumman Corporation), and Dario Gil (IBM Research), which are followed by a panel discussion.
cfaed supported "Rebooting Computing Week 2017" as a Silver Patron.

Best Paper Award: Norman Rink, Automotive 2017



Norman Rink, from the Chair for Compiler Construction, received the Best Paper Award at Automotive 2017 for his paper entitled “Extending a Compiler Backend for Complete Memory Error Detection”. The conference took place in Stuttgart from May 30 to 31. The Automotive conference is a focused venue with top automotive players from Germany, where software reliability issues are addressed from different perspectives. Norman’s work shows how many potential errors are left unprotected when error-detecting software transformations are applied at the source-code level or at the level of the compiler intermediate representation. He corrects this by adding error protection in the backend phase of the compiler.


Prof. Castrillon: Selected for the ACM Future of Computing Academy


On April 21, 2017, Prof. Castrillon was selected as a member of the ACM Future of Computing Academy (FCA), a new initiative by ACM to support and foster the next generation of computing professionals. Over 300 candidates from academia and industry applied to become members of this inaugural class of the ACM FCA. A selected group of 45 members will meet for the first time in San Francisco, USA, on June 25, after attending ACM’s celebration of 50 years of the ACM Turing Award on June 23-24 at the Westin St. Francis. To the ACM FCA, Prof. Castrillon will contribute his vision of the future of computing systems from a cfaed perspective. We are very proud to be part of the ACM FCA and are looking forward to seeing how it develops.


Orchestration paper accepted at VLDB


"Adaptive Work Placement for Query Processing on Heterogeneous Computing Resources"

The hardware landscape is currently changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with their own characteristics. This trend is a great opportunity for database systems to increase overall performance, provided the heterogeneous resources can be utilized efficiently. To achieve this, the main challenge is to place the right work on the right computing unit. Current approaches tackling this placement for query processing in database systems assume that data cardinalities of intermediate results can be correctly estimated. However, this assumption does not hold for complex analytical queries.

To overcome this problem, we developed an adaptive placement approach that is independent of cardinality estimation of intermediate results. Our adaptive approach takes a physical query execution plan as input and divides the plan into disjoint execution islands at compile-time. The execution islands are determined such that the cardinalities of intermediate results within each island are known or can be precisely calculated. Placement optimization and execution are then performed separately per island at query runtime, with the islands processed successively following their data dependencies. With our novel adaptive approach, we can use heterogeneous computing resources more efficiently for query processing. The corresponding paper [1], which describes and evaluates our overall approach, has been accepted as a full paper at the 43rd International Conference on Very Large Data Bases (VLDB). Generally, VLDB is the premier annual international forum for data management and database research.
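To make the island idea concrete, here is a minimal Python sketch of the compile-time partitioning step for a linear plan. The operator names and the per-operator "uncertain output cardinality" flag are illustrative assumptions; the actual criteria in the paper are more involved:

```python
# Minimal sketch of the compile-time island partitioning described above.
# Each plan entry is (operator_name, uncertain_output): the flag marks
# operators whose output cardinality cannot be precisely calculated.

def split_into_islands(plan):
    """Cut the plan after every operator with an uncertain output
    cardinality, so that all intermediate cardinalities *within* each
    island are known once the preceding islands have executed."""
    islands, current = [], []
    for op, uncertain_output in plan:
        current.append(op)
        if uncertain_output:        # island boundary: cardinality only known at runtime
            islands.append(current)
            current = []
    if current:
        islands.append(current)
    return islands

plan = [("scan", False), ("filter", True),    # selectivity unknown -> cut
        ("project", False), ("join", True),   # join cardinality unknown -> cut
        ("aggregate", False)]
print(split_into_islands(plan))
# -> [['scan', 'filter'], ['project', 'join'], ['aggregate']]
```

At runtime, each island is placed on a computing unit only after its inputs have materialized, so the placement decision can rely on exact rather than estimated cardinalities.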

[1] Tomas Karnagel, Dirk Habich, Wolfgang Lehner: Adaptive Work Placement for Query Processing on Heterogeneous Computing Resources: Accepted at 43rd International Conference on Very Large Data Bases (VLDB, August 28 – September 1, Munich, Germany), 2017

Christian Menard receives Hermann-Willkomm prize


Christian Menard received the Hermann-Willkomm prize for his Diploma Thesis entitled “Mapping KPN-based Applications to the NoC-based Tomahawk Architecture”. The prize is awarded to the best thesis in the area of Information Systems Engineering (Informationssystemtechnik). Christian’s work was supervised by Andrés Goens, a researcher of the Chair for Compiler Construction. The techniques described in the thesis are part of a compiler developed in the context of the Orchestration Path of the Excellence Cluster cfaed. The compiler targets the Tomahawk architecture developed at the Vodafone Chair, headed by Prof. Gerhard Fettweis.

Orchestration position paper accepted at PMES


The position paper of the Orchestration path [1] has been accepted at the 1st International Workshop on Post-Moore's Era Supercomputing (PMES'16) and will be presented in Salt Lake City, USA on Monday, Nov 14.

[1] Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Penaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, and Axel Voigt: The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware. In Proceedings of the 1st Workshop on Post-Moore's Era Supercomputing (PMES), 2016. Accepted for publication.

Best Demonstration Award@SIGMOD 2016


From June 26 to July 1, the annual ACM SIGMOD Conference took place in San Francisco, USA. At this international conference, we presented our demo “Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE” [1]. We are happy to announce that this work received the Best Demonstration Award at SIGMOD 2016. In this demo, we showed our novel energy-control loop, which addresses software-controlled hardware reconfiguration at runtime for data management systems running on a single heterogeneous server system. The work was developed in cooperation between the Orchestration and HAEC paths.

Abstract: Energy awareness of database systems has emerged as a critical research topic, since energy consumption is becoming a major limiter for their scalability. Recent energy-related hardware developments trend towards offering more and more configuration opportunities for the software to control its own energy consumption. Existing research so far mainly focused on leveraging this configuration spectrum to find the most energy-efficient configuration for specific operators or entire queries. In this demo, we introduce the concept of energy elasticity and propose the energy-control loop as an implementation of this concept. Energy elasticity refers to the ability of software to behave energy-proportional and energy-efficient at the same time while maintaining a certain quality of service. Thus, our system does not draw the least energy possible but the least energy necessary to still perform reasonably. We demonstrate our overall approach using a rich interactive GUI to give attendees the opportunity to learn more about our concept.
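The control-loop idea can be sketched in a few lines of Python. The configuration list, the headroom threshold, and the scaling policy below are simplified assumptions for illustration, not the demonstrated system:

```python
# Toy sketch of the energy-control-loop idea: choose the cheapest hardware
# configuration that still meets the latency target ("the least energy
# necessary"). Config names and thresholds are made-up assumptions.

def control_step(configs, latency_ms, slo_ms, current, headroom=0.7):
    """Return the next configuration. `configs` is ordered from least to
    most power-hungry; `latency_ms` was observed under `current`."""
    idx = configs.index(current)
    if latency_ms > slo_ms and idx + 1 < len(configs):
        return configs[idx + 1]            # SLO violated: spend more energy
    if latency_ms < headroom * slo_ms and idx > 0:
        return configs[idx - 1]            # large headroom: save energy
    return current                         # least energy necessary: hold

configs = ["4 cores @ 1.2 GHz", "8 cores @ 1.2 GHz", "8 cores @ 2.4 GHz"]
control_step(configs, latency_ms=120, slo_ms=100, current=configs[0])  # scale up
control_step(configs, latency_ms=40,  slo_ms=100, current=configs[2])  # scale down
control_step(configs, latency_ms=90,  slo_ms=100, current=configs[1])  # hold
```

Run periodically against live latency measurements, such a loop keeps the system energy-proportional while respecting the quality-of-service target.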

[1] Annett Ungethüm, Thomas Kissinger, Willi-Wolfram Mentzel, Dirk Habich, Wolfgang Lehner: Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE. SIGMOD Conference 2016: 2173-2176


Limitations of Intra-Operator Parallelism using Heterogeneous Computing Resources


In recent years, hardware changes have shaped database system architecture, moving from sequential execution to parallel multi-core execution and from disk-centric systems to in-memory systems. At the moment, the hardware is changing again, from homogeneous CPU systems towards heterogeneous systems with many different computing units (CUs). Generally, heterogeneous systems combine different CUs, such as CPUs, GPUs, or Xeon Phis, with different architectures, memory hierarchies, and interconnects, leading to different execution behaviors. The current challenge for the database community is therefore to find ways to utilize these systems efficiently. One opportunity is to execute a single query operator on all available CUs, which is usually called intra-operator parallelism. In homogeneous multi-core systems, this can easily be achieved by uniformly partitioning the data across all cores, and there such intra-operator parallelism is beneficial. In our current research work, we have investigated the same approach on heterogeneous systems. The corresponding paper [1] has been accepted for oral presentation and full publication at the 20th East-European Conference on Advances in Databases and Information Systems (ADBIS 2016).

Figure 1: Operator Execution on Two Computing Units.


Figure 1 shows our approach using two different computing units. For heterogeneous systems, we have to find the ideal data partitioning according to the different execution performances of the given CUs. Afterwards, each CU executes the operator on its partition, computing a partial result. Finally, the partial results of all CUs have to be merged. In our paper [1], we analyze this approach for two operators, selection and sorting, on two different heterogeneous systems. We present performance insights as well as the limitations of intra-operator parallelism in heterogeneous environments. As a result, we show that the actual potential for improvement is small, while the limitations and overheads can be significant, sometimes leading to even worse performance than single-CU execution. Therefore, our findings can help system developers decide whether intra-operator parallelism can be applied effectively or whether it should be avoided.
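The partitioning and merge steps of Figure 1 can be sketched as follows. The throughput numbers and the merge cost below are made-up assumptions; they merely illustrate why the merge overhead can outweigh the parallel speedup:

```python
# Sketch of the data-partitioning step: split the input among computing
# units in proportion to their measured throughput, then compare the
# resulting runtime (including a merge step) against single-CU execution.

def partition_sizes(total_tuples, throughputs):
    """Assign each CU a share proportional to its tuples-per-second rate."""
    rate_sum = sum(throughputs.values())
    return {cu: round(total_tuples * r / rate_sum)
            for cu, r in throughputs.items()}

def parallel_runtime(total_tuples, throughputs, merge_cost):
    parts = partition_sizes(total_tuples, throughputs)
    slowest = max(parts[cu] / throughputs[cu] for cu in throughputs)
    return slowest + merge_cost   # merging partial results is pure overhead

throughputs = {"CPU": 400e6, "GPU": 600e6}     # tuples per second (assumed)
n = 1_000_000_000
single_cu = n / throughputs["GPU"]             # fastest single CU alone
both = parallel_runtime(n, throughputs, merge_cost=0.8)
# with a heavy merge step, using both CUs can be slower than the GPU alone
```

With balanced partitions, both CUs finish their share in 1.0 s here, but the 0.8 s merge pushes the total past the ~1.67 s of GPU-only execution, mirroring the paper's observation that overheads can dominate.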


[1] Tomas Karnagel, Dirk Habich, Wolfgang Lehner: Limitations of Intra-Operator Parallelism using Heterogeneous Computing Resources: Accepted at 20th East-European Conference on Advances in Databases and Information Systems (ADBIS, August 28-31, Prague, Czech Republic), 2016