Paper on Load Balancing accepted at Euro-Par


With more and more parallel architectures being used, the problem to optimally distributing work and thereby balancing load gains importance. To tackle this challenge, graph partitioning algorithms have been successfully applied in various application domains. However, there is a mismatch between solutions found by classical graph partitioning and the behavior of many real hardware systems. Classical graph partitioning assumes that individual vertex weights add up to partition weights (by us referred to as linear graph partitioning). In the context of parallel or heterogeneous systems, the assumption implies that performance scales linearly with the number of tasks. In reality however, performance does usually not scale linearly with the amount of work due to contention on hardware, operating system, or application resources. We address this mismatch with penalized graph partitioning, a special case of non-linear graph-partitioning, in this paper [1]. The result is a novel load balancing algorithm that shares the advantages of classical graph partitioning while at the same time considering the non-linear performance of real-systems.


To demonstrate the potential of our penalized graph partitioning in presence of non-linear resources, we perform a synthetic partitioning experiment. To run the experiment, we generate a workload graph that contains 1,000 heterogeneous tasks with weights following a Zipf distribution. Each task in the workload graph is communicating with 0 to 10 other tasks (again Zipf distributed). To model a system, we use an exponential penalty function and assume that the underlying resources can execute 16 parallel tasks before the penalty grows with the square of the cardinality due to contention (Figure 1a). The workload in this experiment is partitioned into 32 balanced partitions using a standard graph partitioning library. Afterward, to estimate the actual load for each node, the penalty function is applied to each partition based on the partition cardinality (Figure 1b). The resulting partition weights are compared to a second partitioning of the graph that was generated by our novel penalized graph partitioning algorithm (Figure 1c). The unmodified partitioning algorithm (Figure b), which is unaware of the contention, tries to balance the load. The resulting relative weights show that the node with the highest partition weight receives 3.1 times the load of the node with the lowest partition weight. In contrast, our penalized partitioning algorithm leads to partition weights, and hence node utilizations, that are balanced within a tolerance of 3%.

[1] Tim Kiefer, Dirk Habich, Wolfgang Lehner: Penalized Graph Partitioning for Static and Dynamic Load Balancing. To appears in the proceedings of the 22nd International European Conference on Parallel and Distributed Computing (Euro-Par) in Grenoble, France, August 22-26.


Go back