Parallel programming of heterogeneous platforms

Dataflow

Current hardware architectures are composed of many different cores. In the future, the amount of cores and their heterogeneity is predicted to increase. To benefit from these architectures, programs need to execute in parallel. Parallel programming with primitives such as threads is known to be difficult: Races on shared data can be prevented with locks at the risk of introducing deadlocks.

Abstractions are the key to make programming for these architectures easy and scalable for the future more complicated hardware designs. Abstracting away parallel constructs makes algorithms concise and enables compilers to handle parallel aspects for different architectures.

At the chair for compiler construction we investigate dataflow as a promising abstraction to tame parallel architectures. A dataflow application is modeled as a graph where nodes represent computation and edges represent communication. This explicit parallel representation allows the compiler to reason about execution order and exploit various types of parallelism.

Example dataflow graph.

Projects

Ohua - Implicit Parallel Programming for Systems

Ohua is a domain specific language that allows to program systems that implicitly execute concurrently. It is embedded into Clojure, a Lisp dialect on the JVM, and compiles algorithms down to dataflow graphs. Nodes of the graphs are implemented as normal functions in Java or Clojure that may have access to their own state. In the process of mapping these graphs to multi-core architectures, we investigate different runtime optimizations to efficiently execute programs.

Example Ohua dataflow graph for a key-value store application.

Meanwhile we rewrote the Ohua compiler in Haskell/Eta to bring our concepts to other languages. To follow the current development please visit us on github: https://github.com/ohua-dev

 

Ÿauhau

While implicit parallel programming remains the main goal of Ohua, we also explored its use in the construction of I/O-efficient micro-services:

Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services.
While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its latency. In this context, writing clear and concise programs which execute I/O efficiently is especially challenging.

In Ÿauhau, we introduce a novel solution based on dataflow graph rewrites. It incorporates batching and caching, while concurrency comes for free from the dataflow execution model. Our implementation, builds on top of Ohua, an embedded domain-specific language for implicit parallel programming in Clojure that translates programs into dataflow graphs. We describe a tool for generating complex synthetic micro-services to compare to existing approaches. Experiments show that Ÿauhau's batching is more efficient than state-of-the-art approaches. Additionally, we show that this efficiency is independent of the code style and fully unblocks computation from I/O while enabling concise, modular designs.

You can find more information on the project's website: Ÿauhau

 

KPN

Kahn Process Networks (KPN) are a particular model of dataflow which is characterized by its flexibility. Research has used it extensively in the field of embedded systems. At the Chair for Compiler Construction, we work on methods for efficiently and automatically mapping KPN applications to heterogeneous hardware.

Research directions include strategies for understanding and handling dynamic behavior in applications, as well as multiple-application scenarios with resource constraints. We work in cooperation with Silexica, a company that commercializes, among others, KPN-based programming tools.

An example of a KPN in a Multi-Processor System-on-Chip

Design Space Exploration and Mapping

Embedded systems are often designed as complex architectures with numerous processing elements.
Effectively programming these systems requires parallel programming models such as task-based or dataflow-based models. With these types of models, the mapping of the abstract application model to the existing hardware architecture plays a decisive role and is usually optimized to achieve an ideal resource footprint or a near-minimal execution time. To find such a mapping, it is usually required to predict and evaluate several mapping configurations, in what is called a design space exploration. To efficiently explore the design space, we leverage symmetries [1] found in mappings due to the application or the target architecture. To do this, we exploit algorithmic aspects of the mathematical theory of groups and inverse semigroups.

mapping symmetries

In the figure, the first two (leftmost) mappings are equivalent due to symmetries of the architecture, the third (rightmost) one is not.


When mapping several independent programs to the same platform, resource conflicts can arise.
This can be circumvented by remapping some of the tasks of an application, which in turn affect its timing behavior, possibly leading to constraint violations.

In this context we evaluated different approaches that allow for remapping of applications without violating existing constraints:

  • Transitive Efficient Template Run-time System (TETRiS) [2] leverages compile-time information to map and migrate tasks in a fashion that preserves the performance of the static mapping. The approach identifies symmetries of mappings by using a mathematical description derived form group theory.

  • Design Centering [3] is used to identify robust mappings rather than optimal mappings. Such mappings are robust against small perturbations of the processing elements used. The underlying method is based on a bio-inspired design centering algorithm.

 

References

 

Sebastian Ertel, Christof Fetzer, Pascal Felber, "Ohua: Implicit Dataflow Programming for Concurrent Systems", Proceedings of the Principles and Practices of Programming on The Java Platform, ACM, pp. 51–64, New York, NY, USA, 2015.

Jeronimo Castrillon, Rainer Leupers, "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap", Springer, pp. 258, 2014

[1] Goens, Andrés et al., “Symmetry in software synthesis,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 14, no. 2, pp. 20:1–20:26, 2017.

[2] Goens, Andrés, et al. "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings." Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems. ACM, 2017.

[3] Hempel, Gerald, et al. "Robust mapping of process networks to many-core systems using bio-inspired design centering." Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES’17). 2017.

 

 

Go back