Dr.-Ing. Gerald Hempel
2023
- Stephanie Soldavini, Karl F. A. Friebel, Mattia Tibaldi, Gerald Hempel, Jeronimo Castrillon, Christian Pilato, "Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics", In ACM Transactions on Reconfigurable Technology and Systems (TRETS), Association for Computing Machinery, vol. 16, no. 2, New York, NY, USA, Mar 2023. [doi] [Bibtex & Downloads]
Abstract
Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively-parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25× more energy efficient than expert-crafted Intel CPU implementations.
Bibtex
@Article{friebel_trets23,
author = {Stephanie Soldavini and Karl F. A. Friebel and Mattia Tibaldi and Gerald Hempel and Jeronimo Castrillon and Christian Pilato},
title = {Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics},
doi = {10.1145/3563553},
issn = {1936-7406},
number = {2},
url = {https://doi.org/10.1145/3563553},
volume = {16},
abstract = {Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively-parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25 \texttimes{} more energy efficient than expert-crafted Intel CPU implementations.},
address = {New York, NY, USA},
articleno = {21},
journal = {ACM Transactions on Reconfigurable Technology and Systems (TRETS)},
month = mar,
numpages = {34},
publisher = {Association for Computing Machinery},
year = {2023},
}
2022
- Asif Ali Khan, Sebastien Ollivier, Stephen Longofono, Gerald Hempel, Jeronimo Castrillon, Alex K. Jones, "Brain-inspired Cognition in Next Generation Racetrack Memories", In ACM Transactions on Embedded Computing Systems (TECS), Association for Computing Machinery, vol. 21, no. 6, pp. 79:1–79:28, New York, NY, USA, Mar 2022. [doi] [Bibtex & Downloads]
Abstract
Hyperdimensional computing (HDC) is an emerging computational framework inspired by the brain that operates on vectors with thousands of dimensions to emulate cognition. Unlike conventional computational frameworks that operate on numbers, HDC, like the brain, uses high dimensional random vectors and is capable of one-shot learning. HDC is based on a well-defined set of arithmetic operations and is highly error-resilient. The core operations of HDC manipulate HD vectors in bulk bit-wise fashion, offering many opportunities to leverage parallelism. Unfortunately, on conventional von Neumann architectures, the continuous movement of HD vectors among the processor and the memory can make the cognition task prohibitively slow and energy-intensive. Hardware accelerators only marginally improve related metrics. In contrast, even partial implementations of an HDC framework inside memory can provide considerable performance/energy gains as demonstrated in prior work using memristors. This paper presents an architecture based on racetrack memory (RTM) to conduct and accelerate the entire HDC framework within memory. The proposed solution requires minimal additional CMOS circuitry by leveraging a read operation across multiple domains in RTMs called transverse read (TR) to realize exclusive-or (XOR) and addition operations. To minimize the CMOS circuitry overhead, an RTM nanowire-based counting mechanism is proposed. Using language recognition as the example workload, the proposed RTM HDC system reduces the energy consumption by 8.6x compared to the state-of-the-art in-memory implementation. Compared to dedicated hardware design realized with an FPGA, RTM-based HDC processing demonstrates 7.8x and 5.3x improvements in the overall runtime and energy consumption, respectively.
Bibtex
@Article{khan_tecs22,
author = {Asif Ali Khan and Sebastien Ollivier and Stephen Longofono and Gerald Hempel and Jeronimo Castrillon and Alex K. Jones},
title = {Brain-inspired Cognition in Next Generation Racetrack Memories},
abstract = {Hyperdimensional computing (HDC) is an emerging computational framework inspired by the brain that operates on vectors with thousands of dimensions to emulate cognition. Unlike conventional computational frameworks that operate on numbers, HDC, like the brain, uses high dimensional random vectors and is capable of one-shot learning. HDC is based on a well-defined set of arithmetic operations and is highly error-resilient. The core operations of HDC manipulate HD vectors in bulk bit-wise fashion, offering many opportunities to leverage parallelism. Unfortunately, on conventional von Neumann architectures, the continuous movement of HD vectors among the processor and the memory can make the cognition task prohibitively slow and energy-intensive. Hardware accelerators only marginally improve related metrics. In contrast, even partial implementations of an HDC framework inside memory can provide considerable performance/energy gains as demonstrated in prior work using memristors. This paper presents an architecture based on racetrack memory (RTM) to conduct and accelerate the entire HDC framework within memory. The proposed solution requires minimal additional CMOS circuitry by leveraging a read operation across multiple domains in RTMs called transverse read (TR) to realize exclusive-or (XOR) and addition operations. To minimize the CMOS circuitry overhead, an RTM nanowire-based counting mechanism is proposed. Using language recognition as the example workload, the proposed RTM HDC system reduces the energy consumption by 8.6x compared to the state-of-the-art in-memory implementation. Compared to dedicated hardware design realized with an FPGA, RTM-based HDC processing demonstrates 7.8x and 5.3x improvements in the overall runtime and energy consumption, respectively.},
address = {New York, NY, USA},
journal = {ACM Transactions on Embedded Computing Systems (TECS)},
month = mar,
numpages = {28},
publisher = {Association for Computing Machinery},
year = {2022},
doi = {10.1145/3524071},
issn = {1539-9087},
url = {https://doi.org/10.1145/3524071},
volume = {21},
number = {6},
articleno = {79},
pages = {79:1--79:28},
}
Downloads
2203_Khan_TECS [PDF]
2021
- Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jeronimo Castrillon, "From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics", Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER) — FPGA for HPC Workshop, pp. 759–766, Sep 2021. [doi] [Bibtex & Downloads]
Bibtex
@InProceedings{friebel_fpga4hpc21,
author = {Karl F. A. Friebel and Stephanie Soldavini and Gerald Hempel and Christian Pilato and Jeronimo Castrillon},
booktitle = {Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER) --- FPGA for HPC Workshop},
title = {From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics},
doi = {10.1109/Cluster48925.2021.00112},
location = {Portland (virtual), OR, USA},
pages = {759--766},
url = {https://ieeexplore.ieee.org/document/9556064},
month = sep,
numpages = {8},
year = {2021},
}
Downloads
2109_Friebel_fpga4hpc [PDF]
- Jeronimo Castrillon, Felix Wittwer, Karl Friebel, Gerald Hempel, Burkhard Ringlein, Stephanie Soldavini, Christian Pilato, Mattia Tibaldi, Fabrizio Ferrandi, Stanislav Böhm, Francesco Regazzoni, Kartik Nayak, "EVEREST: Definition of the compilation framework", Technical report, EVEREST consortium, Jul 2021. [Bibtex & Downloads]
Bibtex
@Report{castrillon_everestD4.1_2021,
author = {Jeronimo Castrillon and Felix Wittwer and Karl Friebel and Gerald Hempel and Burkhard Ringlein and Stephanie Soldavini and Christian Pilato and Mattia Tibaldi and Fabrizio Ferrandi and Stanislav B{\"o}hm and Francesco Regazzoni and Kartik Nayak},
institution = {EVEREST consortium},
title = {{EVEREST}: Definition of the compilation framework},
type = {techreport},
url = {https://drive.switch.ch/index.php/s/3lloP4p1ukGUdJx},
month = jul,
year = {2021},
}
Downloads
2107_Castrillon-Everest-D4 [1]
- Jeronimo Castrillon, Felix Wittwer, Karl Friebel, Gerald Hempel, Jan Martinovic, Stanislav Böhm, Martin Surkovsky, Michele Paolino, Fabrizio Ferrandi, Serena Curzel, Michele Fiorito, Christian Pilato, Stephanie Soldavini, Gianluca Palermo, Dionysios Diamantopoulos, "EVEREST: Definition of Language Requirements", Technical report, EVEREST consortium, Apr 2021. [Bibtex & Downloads]
Bibtex
@Report{castrillon_everestD2.2_2021,
author = {Jeronimo Castrillon and Felix Wittwer and Karl Friebel and Gerald Hempel and Jan Martinovic and Stanislav B{\"o}hm and Martin Surkovsky and Michele Paolino and Fabrizio Ferrandi and Serena Curzel and Michele Fiorito and Christian Pilato and Stephanie Soldavini and Gianluca Palermo and Dionysios Diamantopoulos},
institution = {EVEREST consortium},
title = {{EVEREST}: Definition of Language Requirements},
type = {techreport},
url = {https://drive.switch.ch/index.php/s/ddn1yGnHavgzXpB},
month = apr,
year = {2021},
}
Downloads
2104_Castrillon-Everest-D2 [2]
- Christian Menard, Andrés Goens, Gerald Hempel, Robert Khasanov, Julian Robledo, Felix Teweleitt, Jeronimo Castrillon, "Mocasin—Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores", Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Association for Computing Machinery, pp. 66–73, New York, NY, USA, Jan 2021. (Video Presentation) [doi] [Bibtex & Downloads]
Abstract
We present Mocasin, an open-source rapid prototyping framework for researching, implementing and validating new algorithms and solutions in the field of mapping software to heterogeneous multi-cores. In contrast to the many existing tools that often specialize for a particular use-case, Mocasin is an open, flexible and generic research environment that abstracts over the approaches taken by other tools. Mocasin is designed to support a wide range of models of computation and input formats, implements manifold mapping strategies and provides an adjustable high-level simulator for performance estimation. This infrastructure serves as a flexible vehicle for exploring new approaches and as a blueprint for building customized tools. We highlight the key design aspects of Mocasin that enable its flexibility and illustrate its capabilities in a case-study showing how Mocasin can be used for building a customized tool for researching runtime mapping strategies in an LTE uplink receiver.
Bibtex
@InProceedings{menard_rapido21,
author = {Christian Menard and Andr\'{e}s Goens and Gerald Hempel and Robert Khasanov and Julian Robledo and Felix Teweleitt and Jeronimo Castrillon},
title = {Mocasin---Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores},
booktitle = {Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2021},
address = {New York, NY, USA},
month = jan,
doi = {10.1145/3444950.3447285},
isbn = {9781450389525},
location = {Budapest, Hungary},
pages = {66--73},
publisher = {Association for Computing Machinery},
series = {DroneSE and RAPIDO '21},
url = {https://doi.org/10.1145/3444950.3447285},
abstract = {We present Mocasin, an open-source rapid prototyping framework for researching, implementing and validating new algorithms and solutions in the field of mapping software to heterogeneous multi-cores. In contrast to the many existing tools that often specialize for a particular use-case, Mocasin is an open, flexible and generic research environment that abstracts over the approaches taken by other tools. Mocasin is designed to support a wide range of models of computation and input formats, implements manifold mapping strategies and provides an adjustable high-level simulator for performance estimation. This infrastructure serves as a flexible vehicle for exploring new approaches and as a blueprint for building customized tools. We highlight the key design aspects of Mocasin that enable its flexibility and illustrate its capabilities in a case-study showing how Mocasin can be used for building a customized tool for researching runtime mapping strategies in an LTE uplink receiver.},
numpages = {8},
}
Downloads
2101_Menard_RAPIDO [PDF]
2017
- Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17), ACM, pp. 21–30, New York, NY, USA, Jun 2017. [doi] [Bibtex & Downloads]
Bibtex
@InProceedings{hempel_scopes17,
author = {Gerald Hempel and Andr\'{e}s Goens and Josefine Asmus and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17)},
year = {2017},
series = {SCOPES '17},
pages = {21--30},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
acmid = {3078667},
doi = {10.1145/3078659.3078667},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078667}
}
Downloads
1706_Hempel_SCOPES [PDF]
Related Paths
Biological Systems Path, Orchestration Path
2015
- Markus Vogt, Gerald Hempel, Jeronimo Castrillon, Christian Hochberger, "GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs", Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP), Sep 2015. ([link]) [Bibtex & Downloads]
Abstract
In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.
Bibtex
@InProceedings{vogt15,
Title={GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs},
Author={Vogt, Markus and Hempel, Gerald and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP)},
Year={2015},
Month=sep,
Series={FSP 2015},
archivePrefix={arXiv},
arxivId={1509.00025},
eprint={1509.00025},
abstract={In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.},
}
Downloads
1509_Vogt_FSP [PDF]
- Gerald Hempel, Markus Vogt, Jeronimo Castrillon, Christian Hochberger, "Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs", Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session (Grosspietsch, Erwin and Klöckner, Konrad), SEA-Publications: SEA-SR-44, Funchal, Madeira (Portugal), August 2015. [Bibtex & Downloads]
Bibtex
@InProceedings{hempeldsd15,
Title={Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs},
Author={Hempel, Gerald and Vogt, Markus and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session},
Year={2015},
Address={Funchal, Madeira (Portugal)},
Editor={Grosspietsch, Erwin and Kl{\"o}ckner, Konrad},
Month=aug,
Publisher={SEA-Publications: SEA-SR-44},
Series={DSD/SEAA 2015},
ISBN={978-3-902457-44-8}
}
Downloads
1508_Hempel_DSD [PDF]