Invited Talk of Tal Ben-Nun in cfaed Seminar Series
Memory Access Patterns: The Missing Piece of the GPU Programming Puzzle
Published in ORCHESTRATION (PATH ACTIVITIES)
In April 2016, the Orchestration Path invited Tal Ben-Nun from the Hebrew University of Jerusalem as a visitor. In his talk, he reported on his recent and ongoing work on memory access patterns.
Abstract: GPUs play an increasingly important role in high-performance computing, accelerating applications ranging from machine learning and computer vision to quantitative finance. In spite of their popularity, GPU programming remains a challenging task. Manual management of the multi-level memory hierarchy, inter-device synchronization, and various development constraints often generate lengthy, error-prone, and architecture-specific code. Therefore, it is imperative to develop new programming paradigms for efficient utilization of GPU and multi-GPU systems, without sacrificing code simplicity and portability. The talk presents Memory-Oriented Programming (MOP), a data-centric programming model organized around memory access patterns. MOP categorizes parallel algorithms by their input and output parameters, where each parameter is characterized by a single pattern. Using these patterns, which cover most existing GPU applications, we show that it is possible to write short and intelligible code that attains high performance on a variety of GPU architectures and multi-GPU nodes. Furthermore, we demonstrate that the model can be used to increase the resilience of multi-GPU nodes by providing transparent failure-protection mechanisms. We show that the resulting code is on par with, and in some cases outperforms, existing manually optimized production-grade libraries, exhibiting near-linear speedup on a single node with multiple GPUs. Performance is measured on fundamental computational operations, as well as real-world applications in deep learning and non-negative matrix factorization.
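To give a flavor of the idea, here is a minimal sketch (not the actual MOP API; the pattern names, function, and data layout are illustrative assumptions) of how tagging each kernel parameter with a single access pattern lets a runtime decide, without kernel-specific logic, how to partition data across multiple GPUs:

```python
# Hypothetical sketch, NOT the MOP/MAPS API: each parameter is tagged
# with an access pattern, and the runtime derives the multi-GPU data
# partitioning from the pattern alone.

def partition(data, num_devices, pattern, radius=0):
    """Split `data` across `num_devices` according to its access pattern.

    - "injective": each element is written exactly once, so the array
      splits into disjoint contiguous chunks.
    - "window": each element reads a neighborhood of `radius`, so each
      chunk is extended with halo elements from its neighbors.
    """
    n = len(data)
    chunk = (n + num_devices - 1) // num_devices  # ceil division
    parts = []
    for d in range(num_devices):
        lo, hi = d * chunk, min((d + 1) * chunk, n)
        if pattern == "window":
            # Overlap chunk boundaries by `radius` halo elements.
            lo, hi = max(0, lo - radius), min(n, hi + radius)
        parts.append(data[lo:hi])
    return parts

# A 1-D 3-point stencil: the input parameter carries a "window" pattern
# (radius 1), the output an "injective" one (disjoint writes).
xs = list(range(8))
in_parts = partition(xs, 2, "window", radius=1)   # overlapping halos
out_parts = partition(xs, 2, "injective")         # disjoint chunks
```

The point of the sketch is the division of labor: the programmer states only *what* each parameter's access pattern is, and partitioning, halo exchange, and scaling across devices become generic runtime concerns.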
Tal Ben-Nun, Ely Levy, Amnon Barak, and Eri Rubin. 2015. Memory access patterns: the missing piece of the multi-GPU puzzle. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). ACM, New York, NY, USA, Article 19, 12 pages. DOI: http://dx.doi.org/10.1145/2807591.2807611