Prof. Dr.-Ing. Jeronimo Castrillon
Email: jeronimo.castrillon@tu-dresden.de
Phone: +49 (0)351 463 42716
Fax: +49 (0)351 463 39995
Visitor's Address: Chair for Compiler Construction
Jerónimo Castrillón received the Electronics Engineering degree with honors from the Pontificia Bolivariana University in Colombia in 2004, the master's degree from the ALaRI Institute in Switzerland in 2006, and the Ph.D. degree (Dr.-Ing.) in Electrical Engineering and Information Technology with honors from the RWTH Aachen University in Germany in 2013. From early 2009 to April 2013, Dr. Castrillón was the chief engineer of the Chair for Software for Systems on Silicon at the RWTH Aachen University, where he had been a member of the research staff since late 2006. From April 2013 to April 2014, Dr. Castrillón was senior scientific staff at the same institution.
In June 2014, Dr. Castrillón joined the department of computer science of the TU Dresden as professor for compiler construction in the context of the German excellence cluster “Center for Advancing Electronics Dresden” (cfaed). His research interests lie in methodologies, languages, tools and algorithms for programming complex computing systems.
Prof. Castrillón has several international publications and has served as program chair and technical program committee member for international conferences and workshops (e.g., LCTES, CASES, DAC, DATE, CODES-ISSS, CGO, Computing Frontiers, FPL, ICCS and MCSoC), as well as a reviewer for ACM and IEEE journals, among others. Prof. Castrillón is the recipient of numerous awards, including the Swiss Excellence Government Scholarship in 2005 and the Intel Doctoral Award in 2012. In 2014 he co-founded Silexica GmbH, a company that provides programming tools for embedded multicore architectures. From 2017 to 2019, Prof. Castrillón was a founding member of the executive committee of the ACM “Future of Computing Academy”.
2021
- Christian Pilato, Stanislav Bohm, Fabien Brocheton, Jeronimo Castrillon, Riccardo Cevasco, Vojtech Cima, Radim Cmar, Dionysios Diamantopoulos, Fabrizio Ferrandi, Jan Martinovic, Gianluca Palermo, Michele Paolino, Antonio Parodi, Lorenzo Pittaluga, Daniel Raho, Francesco Regazzoni, Katerina Slaninova, Christoph Hagleitner, "EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms" (to appear), Proceedings of the 2021 Design, Automation and Test in Europe Conference (DATE), Feb 2021.
Bibtex
@InProceedings{pilato_date21,
author = {Christian Pilato and Stanislav Bohm and Fabien Brocheton and Jeronimo Castrillon and Riccardo Cevasco and Vojtech Cima and Radim Cmar and Dionysios Diamantopoulos and Fabrizio Ferrandi and Jan Martinovic and Gianluca Palermo and Michele Paolino and Antonio Parodi and Lorenzo Pittaluga and Daniel Raho and Francesco Regazzoni and Katerina Slaninova and Christoph Hagleitner},
booktitle = {Proceedings of the 2021 Design, Automation and Test in Europe Conference (DATE)},
title = {{EVEREST}: A design environment for extreme-scale big data analytics on heterogeneous platforms},
location = {Virtual Conference},
series = {DATE'21},
month = feb,
year = {2021},
}
- Christian Menard, Andrés Goens, Gerald Hempel, Robert Khasanov, Julian Robledo, Felix Teweleitt, Jeronimo Castrillon, "Mocasin – Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores" (to appear), Proceedings of the 13th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, New York, NY, USA, Jan 2021.
Bibtex
@InProceedings{menard_rapido21,
author = {Christian Menard and Andr\'{e}s Goens and Gerald Hempel and Robert Khasanov and Julian Robledo and Felix Teweleitt and Jeronimo Castrillon},
title = {Mocasin -- Rapid Prototyping of Rapid Prototyping Tools: A Framework for Exploring New Approaches in Mapping Software to Heterogeneous Multi-cores},
booktitle = {Proceedings of the 13th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2021},
series = {RAPIDO '21},
address = {New York, NY, USA},
month = jan,
publisher = {ACM},
location = {Budapest, Hungary},
numpages = {8},
}
- Hasna Bouraoui, Chadlia Jerad, Jeronimo Castrillon, "Towards Adaptive multi-Alternative Process Network", Proceedings of the 12th Workshop and 10th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'21), co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany, Jan 2021.
Bibtex
@InProceedings{bouraoui_parma21,
author = {Hasna Bouraoui and Chadlia Jerad and Jeronimo Castrillon},
title = {Towards Adaptive multi-Alternative Process Network},
booktitle = {Proceedings of the 12th Workshop and 10th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'21), co-located with 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2021},
series = {PARMA-DITAM 2021},
address = {Germany},
month = jan,
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum für Informatik, Dagstuhl Publishing},
location = {Budapest, Hungary},
numpages = {10},
}
Downloads
2101_Bouraoui_PARMA [PDF]
2020
- Lars Schütze, Jeronimo Castrillon, "Efficient Dispatch of Multi-Object Polymorphic Call Sites in Contextual Role-Oriented Programming Languages", Proceedings of the 17th International Conference on Managed Programming Languages & Runtimes (MPLR'20), Association for Computing Machinery, pp. 52–62, New York, NY, USA, Nov 2020. [doi]
Abstract
Adaptive software becomes more and more important as computing is increasingly context-dependent. Runtime adaptability can be achieved by dynamically selecting and applying context-specific code. Role-oriented programming has been proposed as a paradigm to enable runtime adaptive software by design. Roles change the objects’ behavior at runtime, thus adapting the software to a given context. The cost of adaptivity is however a high runtime overhead stemming from executing compositions of behavior-modifying code. It has been shown that the overhead can be reduced by optimizing dispatch plans at runtime for static cases, but no method exists to reduce the overhead in cases with high variability. This paper presents a novel approach to implement polymorphic role dispatch, taking advantage of dependent types and using run-time information to effectively guard abstractions and enable reuse. The concept of polymorphic inline caches is extended to role invocations. We evaluate the implementation with a benchmark for role-oriented programming languages achieving a geometric mean speedup of 4.0$\times$ (3.8$\times$ up to 4.5$\times$) in the static case, and close to no overhead in the dynamic case over the current implementation of contextual roles in Object Teams.
Bibtex
@InProceedings{schuetze_mplr20,
author = {Lars Sch{\"u}tze and Jeronimo Castrillon},
booktitle = {Proceedings of the 17th International Conference on Managed Programming Languages \& Runtimes (MPLR'20)},
title = {Efficient Dispatch of Multi-Object Polymorphic Call Sites in Contextual Role-Oriented Programming Languages},
location = {Virtual, UK},
pages = {52--62},
numpages = {11},
series = {MPLR'20},
abstract = {Adaptive software becomes more and more important as computing is increasingly context-dependent. Runtime adaptability can be achieved by dynamically selecting and applying context-specific code. Role-oriented programming has been proposed as a paradigm to enable runtime adaptive software by design. Roles change the objects’ behavior at runtime, thus adapting the software to a given context. The cost of adaptivity is however a high runtime overhead stemming from executing compositions of behavior-modifying code. It has been shown that the overhead can be reduced by optimizing dispatch plans at runtime for static cases, but no method exists to reduce the overhead in cases with high variability. This paper presents a novel approach to implement polymorphic role dispatch, taking advantage of dependent types and using run-time information to effectively guard abstractions and enable reuse. The concept of polymorphic inline caches is extended to role invocations. We evaluate the implementation with a benchmark for role-oriented programming languages achieving a geometric mean speedup of 4.0$\times$ (3.8$\times$ up to 4.5$\times$) in the static case, and close to no overhead in the dynamic case over the current implementation of contextual roles in Object Teams.},
year = {2020},
month = nov,
isbn = {9781450388535},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3426182.3426186},
doi = {10.1145/3426182.3426186},
}
Downloads
2010_Schuetze_MPLR [PDF]
- Robert Wittig, Andrés Goens, Christian Menard, Emil Matus, Gerhard P. Fettweis, Jeronimo Castrillon, "Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach", Proceedings of the 27th International Conference on Telecommunications (ICT), pp. 1-5, Oct 2020. [doi]
Abstract
In the era of 5G and beyond, adaptive workloads and the need for energy efficiency drive are becoming increasingly vital. Changes in parameters of the physical layer algorithm can cascade throughout the algorithm, requiring additional changes to keep a correct functionality within the timing bounds. These factors drive the process of designing systems for mobile communication towards reconfigurability. In this paper we analyze the trade-offs involved in changing algorithmic parameters and show how reconfigurable systems can be used to produce energy-efficient systems. We argue that we ought to resort to formal models to tame this reconfigurability and examine where existing formal models fall short.
Bibtex
@InProceedings{goens_ict20,
author = {Robert Wittig and Andr{\'e}s Goens and Christian Menard and Emil Matus and Gerhard P. Fettweis and Jeronimo Castrillon},
booktitle = {Proceedings of the 27th International Conference on Telecommunications (ICT)},
title = {Modem Design in the Era of 5G and Beyond: The Need for a Formal Approach},
location = {Virtual. Bali, Indonesia},
month = oct,
abstract = {In the era of 5G and beyond, adaptive workloads and the need for energy efficiency drive are becoming increasingly vital. Changes in parameters of the physical layer algorithm can cascade throughout the algorithm, requiring additional changes to keep a correct functionality within the timing bounds. These factors drive the process of designing systems for mobile communication towards reconfigurability. In this paper we analyze the trade-offs involved in changing algorithmic parameters and show how reconfigurable systems can be used to produce energy-efficient systems. We argue that we ought to resort to formal models to tame this reconfigurability and examine where existing formal models fall short.},
year = {2020},
pages = {1--5},
doi = {10.1109/ICT49546.2020.9239539},
url = {https://ieeexplore.ieee.org/document/9239539},
}
Downloads
2010_Wittig_ICT [PDF]
- Asif Ali Khan, Hauke Mewes, Tobias Grosser, Torsten Hoefler, Jeronimo Castrillon, "Polyhedral Compilation for Racetrack Memories", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD). Special issue on Compilers, Architecture, and Synthesis of Embedded Systems (CASES'20), IEEE Press, vol. 39, no. 11, pp. 3968-3980, Oct 2020. [doi]
Abstract
Traditional memory hierarchy designs, primarily based on SRAM and DRAM, become increasingly unsuitable to meet the performance, energy, bandwidth and area requirements of modern embedded and high-performance computer systems. Racetrack Memory (RTM), an emerging non-volatile memory technology, promises to meet these conflicting demands by offering simultaneously high speed, higher density, and non-volatility. RTM provides these efficiency gains by not providing immediate access to all storage locations, but by instead storing data sequentially in the equivalent to nanoscale tapes called tracks. Before any data can be accessed, explicit shift operations must be issued that cost energy and increase access latency. The result is a fundamental change in memory performance behavior: the address distance between subsequent memory accesses now has a linear effect on memory performance. While there are first techniques to optimize programs for linear-latency memories such as RTM, existing automatic solutions treat only scalar memory accesses. This work presents the first automatic compilation framework that optimizes static loop programs over arrays for linear-latency memories. We extend the polyhedral compilation framework Polly to generate code that maximizes accesses to the same or consecutive locations, thereby minimizing the number of shifts. Our experimental results show that the optimized code incurs up to 85% fewer shifts (average 41%), improving both performance and energy consumption by an average of 17.9% and 39.8%, respectively. Our results show that automatic techniques make it possible to effectively program linear-latency memory architectures such as RTM.
Bibtex
@Article{khan_cases20,
author = {Asif Ali Khan and Hauke Mewes and Tobias Grosser and Torsten Hoefler and Jeronimo Castrillon},
title = {Polyhedral Compilation for Racetrack Memories},
journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD). Special issue on Compilers, Architecture, and Synthesis of Embedded Systems (CASES'20)},
year = {2020},
series = {CASES '20},
month = oct,
doi = {10.1109/TCAD.2020.3012266},
url = {https://ieeexplore.ieee.org/document/9216560},
volume = {39},
number = {11},
pages = {3968--3980},
issn = {1937-4151},
abstract = {Traditional memory hierarchy designs, primarily based on SRAM and DRAM, become increasingly unsuitable to meet the performance, energy, bandwidth and area requirements of modern embedded and high-performance computer systems. Racetrack Memory (RTM), an emerging non-volatile memory technology, promises to meet these conflicting demands by offering simultaneously high speed, higher density, and non-volatility. RTM provides these efficiency gains by not providing immediate access to all storage locations, but by instead storing data sequentially in the equivalent to nanoscale tapes called tracks. Before any data can be accessed, explicit shift operations must be issued that cost energy and increase access latency. The result is a fundamental change in memory performance behavior: the address distance between subsequent memory accesses now has a linear effect on memory performance. While there are first techniques to optimize programs for linear-latency memories such as RTM, existing automatic solutions treat only scalar memory accesses. This work presents the first automatic compilation framework that optimizes static loop programs over arrays for linear-latency memories. We extend the polyhedral compilation framework Polly to generate code that maximizes accesses to the same or consecutive locations, thereby minimizing the number of shifts. Our experimental results show that the optimized code incurs up to 85\% fewer shifts (average 41\%), improving both performance and energy consumption by an average of 17.9\% and 39.8\%, respectively. Our results show that automatic techniques make it possible to effectively program linear-latency memory architectures such as RTM.},
booktitle = {Proceedings of the 2020 International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES)},
location = {Virtual conference},
numpages = {12},
publisher = {IEEE Press},
}
Downloads
2009_Khan_CASES [PDF]
- Fazal Hameed, Asif Ali Khan, Jeronimo Castrillon, "Improving the Performance of Block-based DRAM Caches via Tag-Data Decoupling", In IEEE Transactions on Computers, Oct 2020. [doi]
Abstract
In-package DRAM-based Last-Level-Caches (LLCs) that cache data in small chunks (i.e., blocks) are promising for improving system performance due to their efficient main memory bandwidth utilization. However, in these high-capacity DRAM caches, managing metadata (i.e., tags) at low cost is challenging. Storing the tags in SRAM has the advantage of quick tag access but is impractical due to a large area overhead. Storing the tags in DRAM reduces the area overhead but incurs tag serialization latency for an associative LLC design, which is inevitable for achieving high cache hit rate. To address the area and latency overhead problem, we propose a block-based DRAM LLC design that decouples tag and data into two regions in DRAM. Our design stores the tags in a latency-optimized DRAM region as the tags are accessed more often than the data. In contrast, we optimize the data region for area efficiency and map spatially-adjacent cache blocks to the same DRAM row to exploit spatial locality. Our design mitigates the tag serialization latency of existing associative DRAM LLCs via selective in-DRAM tag comparison, which overlaps the latency of tag and data accesses. This efficiently enables LLC bypassing via a novel DRAM Absence Table (DAT) that not only provides fast LLC miss detection but also reduces in-package bandwidth requirements. Our evaluation using SPEC2006 benchmarks shows that our tag-data decoupled LLC improves system performance by 11.7% compared to a state-of-the-art direct-mapped LLC design and by 7.2% compared to an existing associative LLC design.
Bibtex
@Article{hameed_tc20,
author = {Fazal Hameed and Asif Ali Khan and Jeronimo Castrillon},
title = {Improving the Performance of Block-based DRAM Caches via Tag-Data Decoupling},
journal = {IEEE Transactions on Computers},
year = {2020},
month = oct,
abstract = {In-package DRAM-based Last-Level-Caches (LLCs) that cache data in small chunks (i.e., blocks) are promising for improving system performance due to their efficient main memory bandwidth utilization. However, in these high-capacity DRAM caches, managing metadata (i.e., tags) at low cost is challenging. Storing the tags in SRAM has the advantage of quick tag access but is impractical due to a large area overhead. Storing the tags in DRAM reduces the area overhead but incurs tag serialization latency for an associative LLC design, which is inevitable for achieving high cache hit rate. To address the area and latency overhead problem, we propose a block-based DRAM LLC design that decouples tag and data into two regions in DRAM. Our design stores the tags in a latency-optimized DRAM region as the tags are accessed more often than the data. In contrast, we optimize the data region for area efficiency and map spatially-adjacent cache blocks to the same DRAM row to exploit spatial locality. Our design mitigates the tag serialization latency of existing associative DRAM LLCs via selective in-DRAM tag comparison, which overlaps the latency of tag and data accesses. This efficiently enables LLC bypassing via a novel DRAM Absence Table (DAT) that not only provides fast LLC miss detection but also reduces in-package bandwidth requirements. Our evaluation using SPEC2006 benchmarks shows that our tag-data decoupled LLC improves system performance by 11.7\% compared to a state-of-the-art direct-mapped LLC design and by 7.2\% compared to an existing associative LLC design.},
doi = {10.1109/TC.2020.3029615},
url = {https://ieeexplore.ieee.org/document/9220805},
issn = {0018-9340},
numpages = {14},
}
Downloads
2010_Hameed_TC [PDF]
- Jeronimo Castrillon, "The role of domain-specific languages for cyber-physical systems", In Seminar series: Design and Programming Cyber-Physical Systems and IoT Applications (invited talk), Oct 2020.
Abstract
Embedded and cyber-physical systems (CPS) are heterogeneous interconnected computing systems with an ever increasing complexity. As CPSs become more widespread, the developer community widens, exposing the complexity to mainstream programmers. In this talk, I will talk about domain-specific languages (DSLs) as a promising avenue to handle complexity without compromising on efficiency. The talk will provide background on programming languages and go over sample DSLs from different communities. An in-depth example will serve to grasp the power that lies in DSLs for efficiency, correctness and ease to target complex emerging systems.
Bibtex
@Misc{castrillon_ensi20,
author = {Castrillon, Jeronimo},
title = {The role of domain-specific languages for cyber-physical systems},
year = {2020},
howpublished = {Seminar series: Design and Programming Cyber-Physical Systems and IoT Applications (invited talk)},
location = {Tunis, Tunisia (Virtual)},
month = oct,
abstract = {Embedded and cyber-physical systems (CPS) are heterogeneous interconnected computing systems with an ever increasing complexity. As CPSs become more widespread, the developer community widens, exposing the complexity to mainstream programmers. In this talk, I will talk about domain-specific languages (DSLs) as a promising avenue to handle complexity without compromising on efficiency. The talk will provide background on programming languages and go over sample DSLs from different communities. An in-depth example will serve to grasp the power that lies in DSLs for efficiency, correctness and ease to target complex emerging systems. },
url = {https://sites.google.com/ensi-uma.tn/seminar-series-on-cps-n-iot/home}
}
Downloads
201006_castrill_ENSI-compressed [PDF]
- Asif Ali Khan, Norman A. Rink, Fazal Hameed, Jeronimo Castrillon, "Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories", In ACM Transactions on Embedded Computing Systems (TECS), Association for Computing Machinery, vol. 19, no. 6, New York, NY, USA, Sep 2020. [doi]
Abstract
Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip/off-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM) and DRAM-based off-chip memory. Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Optimizations for off-chip memory such as memory access order, data mapping and the choice of a suitable memory access granularity are employed to reduce the contention in the off-chip memory. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 32% and 73% respectively compared to an iso-capacity SRAM. The overall DRAM dynamic energy consumption improvements due to memory optimizations amount to 80%.
Bibtex
@Article{khan_tecs20,
author = {Asif Ali Khan and Norman A. Rink and Fazal Hameed and Jeronimo Castrillon},
title = {Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories},
journal = {ACM Transactions on Embedded Computing Systems (TECS)},
year = {2020},
month = sep,
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {19},
number = {6},
issn = {1539-9087},
url = {https://doi.org/10.1145/3396235},
doi = {10.1145/3396235},
articleno = {44},
numpages = {26},
abstract = {Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip/off-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM) and DRAM-based off-chip memory. Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Optimizations for off-chip memory such as memory access order, data mapping and the choice of a suitable memory access granularity are employed to reduce the contention in the off-chip memory. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 32\% and 73\% respectively compared to an iso-capacity SRAM. The overall DRAM dynamic energy consumption improvements due to memory optimizations amount to 80\%.},
}
Downloads
2009_Khan_TECS [PDF]
- Alexander Brauckmann, Andrés Goens, Jeronimo Castrillon, "ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers", In Proceedings of the 2020 Forum for Specification and Design Languages (FDL), pp. 1-4, Sep 2020. [doi]
Abstract
Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.
Bibtex
@InProceedings{brauckmann_fdl20,
author = {Alexander Brauckmann and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers},
booktitle = {2020 Forum for Specification and Design Languages (FDL)},
year = {2020},
location = {Kiel, Germany},
month = sep,
pages={1-4},
doi={10.1109/FDL50818.2020.9232946},
url = {https://ieeexplore.ieee.org/document/9232946},
abstract = {Deep Learning methods have not only been shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.},
}
- Marten Lohstroh, Christian Menard, Alexander Schulz-Rosengarten, Matthew Weber, Jeronimo Castrillon, Edward A. Lee, "A Language for Deterministic Coordination Across Multiple Timelines", In Proceeding: 2020 Forum for Specification and Design Languages (FDL), pp. 1-8, Sep 2020. (Best paper award candidate)
Abstract
We discuss a novel approach for constructing deterministic reactive systems that revolves around a temporal model which incorporates a multiplicity of timelines. This model is central to LINGUA FRANCA (LF), a polyglot coordination language and compiler toolchain we are developing for the definition and composition of concurrent components called Reactors, which are objects that react to and emit discrete events. What sets LF apart from other languages that treat time as a first-class citizen is that it confronts the issue that in any reactive system there are at least two distinct timelines involved: a logical one and a physical one, and possibly multiple of each kind. LF provides a mechanism for relating events across timelines, and guarantees deterministic program behavior under quantifiable assumptions.
Bibtex
@InProceedings{lohstroh_fdl20,
author = {Marten Lohstroh and Christian Menard and Alexander Schulz-Rosengarten and Matthew Weber and Jeronimo Castrillon and Edward A. Lee},
title = {A Language for Deterministic Coordination Across Multiple Timelines},
booktitle = {2020 Forum for Specification and Design Languages (FDL)},
year = {2020},
location = {Kiel, Germany},
month = sep,
abstract = {We discuss a novel approach for constructing deterministic reactive systems that revolves around a temporal model which incorporates a multiplicity of timelines. This model is central to LINGUA FRANCA (LF), a polyglot coordination language and compiler toolchain we are developing for the definition and composition of concurrent components called Reactors, which are objects that react to and emit discrete events. What sets LF apart from other languages that treat time as a first-class citizen is that it confronts the issue that in any reactive system there are at least two distinct timelines involved: a logical one and a physical one, and possibly multiple of each kind. LF provides a mechanism for relating events across timelines, and guarantees deterministic program behavior under quantifiable assumptions.},
pages={1-8},
doi={10.1109/FDL50818.2020.9232939},
url = {https://ieeexplore.ieee.org/document/9232939},
}
- Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, Dibakar Gope, Thomas Grass, Bagus Hanindhito, Andreas Hansson, Swapnil Haria, Austin Harris, Timothy Hayes, Adrian Herrera, Matthew Horsnell, Syed Ali Raza Jafri, Radhika Jagtap, Hanhwi Jang, Reiley Jeyapaul, Timothy M. Jones, Matthias Jung, Subash Kannoth, Hamidreza Khaleghzadeh, Yuetsu Kodama, Tushar Krishna, Tommaso Marinelli, Christian Menard, Andrea Mondelli, Tiago Mück, Omar Naji, Krishnendra Nathella, Hoa Nguyen, Nikos Nikoleris, Lena E. Olson, Marc Orr, Binh Pham, Pablo Prieto, Trivikram Reddy, Alec Roelke, Mahyar Samani, Andreas Sandberg, Javier Setoain, Boris Shingarov, Matthew D. Sinclair, Tuan Ta, Rahul Thakur, Giacomo Travaglini, Michael Upton, Nilay Vaish, Ilias Vougioukas, Zhengrong Wang, Norbert Wehn, Christian Weis, David A. Wood, Hongil Yoon, Éder F. Zulian, "The gem5 Simulator: Version 20.0+", In arXiv preprint arXiv:2007.03152, Jul 2020.
Abstract
The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.
Bibtex
@article{lowe-power_gem5_2020,
author = {Lowe-Power, Jason and Ahmad, Abdul Mutaal and Akram, Ayaz and Alian, Mohammad and Amslinger, Rico and Andreozzi, Matteo and Armejach, Adri{\`a} and Asmussen, Nils and Bharadwaj, Srikant and Black, Gabe and Bloom, Gedare and Bruce, Bobby R. and Rodrigues Carvalho, Daniel and Jeronimo Castrillon and Chen, Lizhong and Derumigny, Nicolas and Diestelhorst, Stephan and Elsasser, Wendy and Fariborz, Marjan and Farmahini-Farahani, Amin and Fotouhi, Pouya and Gambord, Ryan and Gandhi, Jayneel and Gope, Dibakar and Grass, Thomas and Hanindhito, Bagus and Hansson, Andreas and Haria, Swapnil and Harris, Austin and Hayes, Timothy and Herrera, Adrian and Horsnell, Matthew and Jafri, Syed Ali Raza and Jagtap, Radhika and Jang, Hanhwi and Jeyapaul, Reiley and Jones, Timothy M. and Jung, Matthias and Kannoth, Subash and Khaleghzadeh, Hamidreza and Kodama, Yuetsu and Krishna, Tushar and Marinelli, Tommaso and Christian Menard and Mondelli, Andrea and M{\"u}ck, Tiago and Naji, Omar and Nathella, Krishnendra and Nguyen, Hoa and Nikoleris, Nikos and Olson, Lena E. and Orr, Marc and Pham, Binh and Prieto, Pablo and Reddy, Trivikram and Roelke, Alec and Samani, Mahyar and Sandberg, Andreas and Setoain, Javier and Shingarov, Boris and
Sinclair, Matthew D. and Ta, Tuan and Thakur, Rahul and Travaglini, Giacomo and Upton, Michael and Vaish, Nilay and Vougioukas, Ilias and Wang, Zhengrong and Wehn, Norbert and Weis, Christian
and Wood, David A. and Yoon, Hongil and Zulian, {\'E}der F.},
title = {The gem5 Simulator: Version 20.0+},
journal = {arXiv preprint arXiv:2007.03152},
url = {https://arxiv.org/abs/2007.03152},
year = {2020},
month = jul,
abstract = {The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.},
}
- Christian Menard, Andrés Goens, Marten Lohstroh, Jeronimo Castrillon, "Achieving Determinism in Adaptive AUTOSAR", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 822–827, Mar 2020. (Best paper award candidate A-Track, Video Presentation)
Abstract
AUTOSAR AP is an emerging industry standard that tackles the challenges of modern automotive software design, but does not provide adequate mechanisms to enforce deterministic execution. This poses profound challenges to testing and maintenance of the application software, which is particularly problematic for safety-critical applications. In this paper, we analyze the problem of nondeterminism in AP and propose a framework for the design of deterministic automotive software that transparently integrates with the AP communication mechanisms. We illustrate our approach in a case study based on the brake assistant demonstrator application that is provided by the AUTOSAR consortium. We show that the original implementation is nondeterministic and discuss a deterministic solution based on our framework.
Bibtex
@InProceedings{menard_date20,
author = {Christian Menard and Andr{\'e}s Goens and Marten Lohstroh and Jeronimo Castrillon},
title = {Achieving Determinism in Adaptive AUTOSAR},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
month = mar,
publisher = {IEEE},
location = {Grenoble, France},
abstract = {AUTOSAR AP is an emerging industry standard that tackles the challenges of modern automotive software design, but does not provide adequate mechanisms to enforce deterministic execution. This poses profound challenges to testing and maintenance of the application software, which is particularly problematic for safety-critical applications. In this paper, we analyze the problem of nondeterminism in AP and propose a framework for the design of deterministic automotive software that transparently integrates with the AP communication mechanisms. We illustrate our approach in a case study based on the brake assistant demonstrator application that is provided by the AUTOSAR consortium. We show that the original implementation is nondeterministic and discuss a deterministic solution based on our framework.},
isbn = {978-3-9819263-4-7},
pages = {822--827},
doi = {10.23919/DATE48585.2020.9116430},
url = {https://ieeexplore.ieee.org/abstract/document/9116430},
}
- Robert Khasanov, Jeronimo Castrillon, "Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 909–914, Mar 2020. (Best paper award candidate E-Track, Video Presentation)
Abstract
Modern embedded computing platforms consist of a large number of heterogeneous resources, which allows executing multiple applications on a single device. The number of running applications on the system varies with time and so does the amount of available resources. This has considerably increased the complexity of analysis and optimization algorithms for runtime mapping of firm real-time applications. To reduce the runtime overhead, researchers have proposed to pre-compute partial mappings at compile time and have the runtime efficiently compute the final mapping. However, most existing solutions only compute a fixed mapping for a given set of running applications, and the mapping is defined for the entire duration of the workload execution. In this work we allow applications to adapt to the amount of available resources by using mapping segments. This way, applications may switch between different configurations with varying degrees of parallelism. We present a runtime manager for firm real-time applications that generates such mapping segments based on partial solutions and aims at minimizing the overall energy consumption without deadline violations. The proposed algorithm outperforms the state-of-the-art approaches on the overall energy consumption by up to 13% while incurring an order of magnitude less scheduling overhead.
Bibtex
@InProceedings{khasanov_date20,
author = {Robert Khasanov and Jeronimo Castrillon},
title = {Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
month = mar,
publisher = {IEEE},
location = {Grenoble, France},
isbn = {978-3-9819263-4-7},
pages = {909--914},
doi = {10.23919/DATE48585.2020.9116381},
url = {https://ieeexplore.ieee.org/document/9116381},
abstract = {Modern embedded computing platforms consist of a large number of heterogeneous resources, which allows executing multiple applications on a single device. The number of running applications on the system varies with time and so does the amount of available resources. This has considerably increased the complexity of analysis and optimization algorithms for runtime mapping of firm real-time applications. To reduce the runtime overhead, researchers have proposed to pre-compute partial mappings at compile time and have the runtime efficiently compute the final mapping. However, most existing solutions only compute a fixed mapping for a given set of running applications, and the mapping is defined for the entire duration of the workload execution. In this work we allow applications to adapt to the amount of available resources by using mapping segments. This way, applications may switch between different configurations with varying degrees of parallelism. We present a runtime manager for firm real-time applications that generates such mapping segments based on partial solutions and aims at minimizing the overall energy consumption without deadline violations. The proposed algorithm outperforms the state-of-the-art approaches on the overall energy consumption by up to 13\% while incurring an order of magnitude less scheduling overhead.},
}
- Asif Ali Khan, Andrés Goens, Fazal Hameed, Jeronimo Castrillon, "Generalized Data Placement Strategies for Racetrack Memories", Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1502–1507, Mar 2020. (Video Presentation)
Abstract
Ultra-dense non-volatile racetrack memories (RTMs) have been investigated at various levels in the memory hierarchy for improved performance and reduced energy consumption. However, the innate shift operations in RTMs hinder their applicability to replace low-latency on-chip memories. Recent research has demonstrated that intelligent placement of memory objects in RTMs can significantly reduce the number of shifts with no hardware overhead, albeit for specific system setups. However, existing placement strategies may lead to sub-optimal performance when applied to different architectures. In this paper we look at generalized data placement mechanisms that improve upon existing ones by taking into account the underlying memory architecture and the timing and liveness information of memory objects. We propose a novel heuristic and a formulation using genetic algorithms that optimize key performance parameters. We show that, on average, our generalized approach improves the number of shifts, performance and energy consumption by 4.3x, 46% and 55% respectively compared to the state-of-the-art.
Bibtex
@InProceedings{khan_date20,
author = {Asif Ali Khan and Andr{\'e}s Goens and Fazal Hameed and Jeronimo Castrillon},
title = {Generalized Data Placement Strategies for Racetrack Memories},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
publisher = {IEEE},
location = {Grenoble, France},
month = mar,
isbn = {978-3-9819263-4-7},
pages = {1502--1507},
doi = {10.23919/DATE48585.2020.9116245},
url = {https://ieeexplore.ieee.org/document/9116245},
abstract = {Ultra-dense non-volatile racetrack memories (RTMs) have been investigated at various levels in the memory hierarchy for improved performance and reduced energy consumption. However, the innate shift operations in RTMs hinder their applicability to replace low-latency on-chip memories. Recent research has demonstrated that intelligent placement of memory objects in RTMs can significantly reduce the number of shifts with no hardware overhead, albeit for specific system setups. However, existing placement strategies may lead to sub-optimal performance when applied to different architectures. In this paper we look at generalized data placement mechanisms that improve upon existing ones by taking into account the underlying memory architecture and the timing and liveness information of memory objects. We propose a novel heuristic and a formulation using genetic algorithms that optimize key performance parameters. We show that, on average, our generalized approach improves the number of shifts, performance and energy consumption by 4.3x, 46\% and 55\% respectively compared to the state-of-the-art.},
}
- Robin Bläsing, Asif Ali Khan, Panagiotis Ch. Filippou, Chirag Garg, Fazal Hameed, Jeronimo Castrillon, Stuart S. P. Parkin, "Magnetic Racetrack Memory: From Physics to the Cusp of Applications within a Decade", In Proceedings of the IEEE, vol. 108, no. 8, pp. 1303-1321, Mar 2020.
Bibtex
@Article{khan_pieee20,
author = {Robin Bl{\"a}sing and Asif Ali Khan and Panagiotis Ch. Filippou and Chirag Garg and Fazal Hameed and Jeronimo Castrillon and Stuart S. P. Parkin},
title = {Magnetic Racetrack Memory: From Physics to the Cusp of Applications within a Decade},
journal = {Proceedings of the IEEE},
year = {2020},
month = mar,
volume={108},
number={8},
pages={1303-1321},
doi = {10.1109/JPROC.2020.2975719},
url = {https://ieeexplore.ieee.org/document/9045991},
}
- Marten Lohstroh, Íñigo Íncer Romero, Andrés Goens, Patricia Derler, Jeronimo Castrillon, Edward A. Lee, Alberto Sangiovanni-Vincentelli, "Reactors: A Deterministic Model for Composable Reactive Systems", Cyber Physical Systems. Model-Based Design – Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019) (Chamberlain, Roger and Edin Grimheden, Martin and Taha, Walid), Springer International Publishing, pp. 59–85, Cham, Feb 2020.
Abstract
This paper describes a component-based concurrent model of computation for reactive systems. The components in this model, featuring ports and hierarchy, are called reactors. The model leverages a semantic notion of time, an event scheduler, and a synchronous-reactive style of communication to achieve determinism. Reactors enable a programming model that ensures determinism, unless explicitly abandoned by the programmer. We show how the coordination of reactors can safely and transparently exploit parallelism, both in shared-memory and distributed systems.
Bibtex
@InProceedings{Lohstroh_cyphy19,
author = {Marten Lohstroh and {\'I}{\~n}igo {\'I}ncer Romero and Andr\'{e}s Goens and Patricia Derler and Jeronimo Castrillon and Edward A. Lee and Alberto Sangiovanni-Vincentelli},
title = {Reactors: A Deterministic Model for Composable Reactive Systems},
editor= {Chamberlain, Roger and Edin Grimheden, Martin and Taha, Walid},
booktitle={Cyber Physical Systems. Model-Based Design -- Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019)},
year = {2020},
location = {New York City, NY, USA},
month = feb,
publisher={Springer International Publishing},
address={Cham},
pages={59--85},
abstract={This paper describes a component-based concurrent model of computation for reactive systems. The components in this model, featuring ports and hierarchy, are called reactors. The model leverages a semantic notion of time, an event scheduler, and a synchronous-reactive style of communication to achieve determinism. Reactors enable a programming model that ensures determinism, unless explicitly abandoned by the programmer. We show how the coordination of reactors can safely and transparently exploit parallelism, both in shared-memory and distributed systems.},
isbn={978-3-030-41131-2},
url = {https://link.springer.com/chapter/10.1007/978-3-030-41131-2_4},
doi = {10.1007/978-3-030-41131-2_4},
numpages = {27pp},
}
- Alexander Brauckmann, Andrés Goens, Sebastian Ertel, Jeronimo Castrillon, "Compiler-Based Graph Representations for Deep Learning Models of Code", Proceedings of the 29th ACM SIGPLAN International Conference on Compiler Construction (CC 2020), Association for Computing Machinery, pp. 201–211, New York, NY, USA, Feb 2020.
Bibtex
@InProceedings{brauckmann_cc20,
author = {Alexander Brauckmann and Andr\'{e}s Goens and Sebastian Ertel and Jeronimo Castrillon},
title = {Compiler-Based Graph Representations for Deep Learning Models of Code},
booktitle = {Proceedings of the 29th ACM SIGPLAN International Conference on Compiler Construction (CC 2020)},
year = {2020},
isbn = {9781450371209},
url = {https://doi.org/10.1145/3377555.3377894},
doi = {10.1145/3377555.3377894},
series = {CC 2020},
pages = {201--211},
numpages = {11},
publisher = {Association for Computing Machinery},
location = {San Diego, CA, USA},
month = feb,
address = {New York, NY, USA},
keywords = {conf},
}
2019
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart S. P. Parkin, Jeronimo Castrillon, "ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0", In ACM Transactions on Architecture and Code Optimization (TACO), ACM, vol. 16, no. 4, pp. 56:1–56:23, New York, NY, USA, Dec 2019.
Abstract
Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This paper presents data placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.
Bibtex
@Article{khan_taco19,
author = {Asif Ali Khan and Fazal Hameed and Robin Bl{\"a}sing and Stuart S. P. Parkin and Jeronimo Castrillon},
title = {ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0},
journal = {ACM Transactions on Architecture and Code Optimization (TACO)},
issue_date = {December 2019},
volume = {16},
number = {4},
month = dec,
year = {2019},
issn = {1544-3566},
pages = {56:1--56:23},
articleno = {56},
numpages = {23},
url = {http://doi.acm.org/10.1145/3372489},
doi = {10.1145/3372489},
acmid = {3372489},
publisher = {ACM},
address = {New York, NY, USA},
abstract = {Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This paper presents data placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5\%, outperforming the state of the art by up to 16.1\%.},
}
- Fazal Hameed, Jeronimo Castrillon, "A Novel Hybrid DRAM/STT-RAM Last-Level-Cache Architecture for Performance, Energy and Endurance Enhancement", In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 27, no. 10, pp. 2375-2386, Oct 2019.
Abstract
High capacity L4 architectures as Last-Level-Cache (LLC) have been recently introduced between L3-SRAM and off-chip memory. These LLC architectures have either employed DRAM or Spin-Transfer-Torque (STT-RAM) memory technologies. It is a known fact that DRAM LLCs feature a higher energy consumption while STT-RAM LLCs feature a lower write endurance compared to their counterparts. This paper proposes an efficient hybrid DRAM/STT-RAM LLC architecture that exploits the best characteristics offered by the individual memory technologies while mitigating their drawbacks. More precisely, we introduce a novel mechanism for the storage and management of the hybrid LLC tags, and a proactive L3-SRAM writeback policy that combines multiple dirty blocks that are mapped to the same LLC row. Our hybrid architecture reduces LLC interference by having fewer writeback accesses and row fetches. The endurance is improved by reducing the number of STT-RAM block writes. We show that our LLC architecture reduces the total number of STT-RAM block writes by 78% and improves the average performance by 13% compared to a recently proposed STT-RAM LLC. Compared to the state-of-the-art DRAM LLC, we report an average energy and performance improvement of 24% and 17.1% respectively.
Bibtex
@Article{hameed_tvlsi19,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {A Novel Hybrid {DRAM}/{STT-RAM} {L}ast-{L}evel-{C}ache Architecture for Performance, Energy and Endurance Enhancement},
journal = {IEEE Transactions on Very Large Scale Integration Systems (TVLSI)},
year = {2019},
month = oct,
abstract = {High capacity L4 architectures as Last-Level-Cache (LLC) have been recently introduced between L3-SRAM and off-chip memory. These LLC architectures have either employed DRAM or Spin-Transfer-Torque (STT-RAM) memory technologies. It is a known fact that DRAM LLCs feature a higher energy consumption while STT-RAM LLCs feature a lower write endurance compared to their counterparts. This paper proposes an efficient hybrid DRAM/STT-RAM LLC architecture that exploits the best characteristics offered by the individual memory technologies while mitigating their drawbacks. More precisely, we introduce a novel mechanism for the storage and management of the hybrid LLC tags, and a proactive L3-SRAM writeback policy that combines multiple dirty blocks that are mapped to the same LLC row. Our hybrid architecture reduces LLC interference by having less writeback accesses and row fetches. The endurance is improved by reducing the number of STT-RAM block writes. We show that our LLC architecture reduces the total number of STT-RAM block writes by 78\% and improves the average performance by 13\% compared to a recently proposed STT-RAM LLC. Compared to the state-of-the-art DRAM LLC, we report an average energy and performance improvement of 24\% and 17.1\% respectively.},
volume = {27},
number = {10},
pages = {2375-2386},
numpages = {12},
doi = {10.1109/TVLSI.2019.2918385},
url = {https://ieeexplore.ieee.org/document/8734763},
}
Downloads
1905_Hameed_TVLSI [PDF]
- Lars Schütze, Jeronimo Castrillon, "Efficient Late Binding of Dynamic Function Compositions" , Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, ACM, pp. 141–151, New York, NY, USA, Oct 2019. [doi] [Bibtex & Downloads]
Efficient Late Binding of Dynamic Function Compositions
Reference
Lars Schütze, Jeronimo Castrillon, "Efficient Late Binding of Dynamic Function Compositions" , Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, ACM, pp. 141–151, New York, NY, USA, Oct 2019. [doi]
Bibtex
@InProceedings{schuetze_sle19,
author = {Lars Sch{\"u}tze and Jeronimo Castrillon},
title = {Efficient Late Binding of Dynamic Function Compositions},
booktitle = {Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering},
year = {2019},
series = {SLE 2019},
address = {New York, NY, USA},
month = oct,
publisher = {ACM},
keywords = {conf},
location = {Athens, Greece},
isbn = {978-1-4503-6981-7},
pages = {141--151},
numpages = {11},
url = {http://doi.acm.org/10.1145/3357766.3359543},
doi = {10.1145/3357766.3359543},
acmid = {3359543},
}
Downloads
1910_Schuetze_SLE [PDF]
- Tobias Reiher, Alexander Senier, Jeronimo Castrillon, Thorsten Strufe, "RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers" , In Proceeding: International Conference on Formal Aspects of Component Software (Arbab, Farhad and Jongmans, Sung-Shik) , Springer International Publishing, pp. 170–190, Cham, Oct 2019. [doi] [Bibtex & Downloads]
RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers
Reference
Tobias Reiher, Alexander Senier, Jeronimo Castrillon, Thorsten Strufe, "RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers" , In Proceeding: International Conference on Formal Aspects of Component Software (Arbab, Farhad and Jongmans, Sung-Shik) , Springer International Publishing, pp. 170–190, Cham, Oct 2019. [doi]
Abstract
Various vulnerabilities have been found in message parsers of protocol implementations in the past. Even highly sensitive software components like TLS libraries are affected regularly. Resulting issues range from denial-of-service attacks to the extraction of sensitive information. The complexity of protocols and imprecise specifications in natural language are the core reasons for subtle bugs in implementations, which are hard to find. The lack of precise specifications impedes formal verification.
Bibtex
@InProceedings{reiher_facs19,
author = {Tobias Reiher and Alexander Senier and Jeronimo Castrillon and Thorsten Strufe},
title = {RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers},
booktitle = {International Conference on Formal Aspects of Component Software},
year = {2019},
editor = {Arbab, Farhad and Jongmans, Sung-Shik},
organization = {Springer},
publisher = {Springer International Publishing},
location = {Amsterdam, The Netherlands},
address = {Cham},
month = oct,
pages = {170--190},
numpages = {21},
abstract = {Various vulnerabilities have been found in message parsers of protocol implementations in the past. Even highly sensitive software components like TLS libraries are affected regularly. Resulting issues range from denial-of-service attacks to the extraction of sensitive information. The complexity of protocols and imprecise specifications in natural language are the core reasons for subtle bugs in implementations, which are hard to find. The lack of precise specifications impedes formal verification.},
isbn = {978-3-030-40914-2},
doi = {10.1007/978-3-030-40914-2_9},
url = {https://link.springer.com/chapter/10.1007/978-3-030-40914-2_9},
}
Downloads
1910_Reiher_FACS [PDF]
- Jeronimo Castrillon, "Embedded manycore programming: From auto-parallelization to domain specific languages" , In IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote), Oct 2019. [Bibtex & Downloads]
Embedded manycore programming: From auto-parallelization to domain specific languages
Reference
Jeronimo Castrillon, "Embedded manycore programming: From auto-parallelization to domain specific languages" , In IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote), Oct 2019.
Abstract
Programming manycores remains a daunting task, especially in the presence of the heterogeneity and application constraints typical in the embedded domain. This talk reviews efforts to cope with this complexity from the last 10+ years of research. It starts with the challenges faced by auto-parallelizing compilers, discussing how far they have made it since the start of the multi-core era. The talk also reviews explicit parallel programming and associated programming methodologies, with focus on recent advances that aim at increasing the adaptivity and robustness of dataflow applications. The talk then advocates for even higher-level programming abstractions in the form of domain specific languages, particularly important to deal with the increased complexity brought by emerging computing paradigms.
Bibtex
@Misc{castrillon_mcsoc2019,
author = {Castrillon, Jeronimo},
title = {Embedded manycore programming: From auto-parallelization to domain specific languages},
howpublished = {IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote)},
month = oct,
year = {2019},
abstract = {Programming manycores remains a daunting task, especially in the presence of the heterogeneity and application constraints typical in the embedded domain. This talk reviews efforts to cope with this complexity from the last 10+ years of research. It starts with the challenges faced by auto-parallelizing compilers, discussing how far they have made it since the start of the multi-core era. The talk also reviews explicit parallel programming and associated programming methodologies, with focus on recent advances that aim at increasing the adaptivity and robustness of dataflow applications. The talk then advocates for even higher-level programming abstractions in the form of domain specific languages, particularly important to deal with the increased complexity brought by emerging computing paradigms.},
url = {http://mcsoc-forum.org/m2019/wp-content/uploads/2019/10/191002_castrillon_mcsoc-opt.pdf},
keywords = {invitedtalk},
location = {Singapore},
}
Downloads
191002_castrillon_mcsoc-opt [PDF]
- Jeronimo Castrillon, "Dataflow and higher level abstractions for parallel programming" , In CPS Summer School 2019: Designing Cyber-Physical Systems - From concepts to implementation (keynote), Sep 2019. [Bibtex & Downloads]
Dataflow and higher level abstractions for parallel programming
Reference
Jeronimo Castrillon, "Dataflow and higher level abstractions for parallel programming" , In CPS Summer School 2019: Designing Cyber-Physical Systems - From concepts to implementation (keynote), Sep 2019.
Abstract
Computing systems continue to increase in complexity, today including multiple cores, complex memory hierarchies and domain-specific accelerators, and soon with components built with emerging hardware technologies. This complexity calls for advances in a variety of domains, like programming and modeling languages, models of hardware, system simulators, design exploration methodologies and hardware architectures. From the standpoint of programming languages and compilers, this lecture discusses the challenges in mainstream sequential programming to motivate higher-level abstractions. It then provides an introduction to dataflow programming methodologies as a promising solution for embedded applications. We will review the fundamentals of dataflow models of computation, basic programming methodologies and look at current research to account for the adaptivity that new applications require, especially in the context of cyber physical systems. The lecture closes with an outlook on higher level programming abstractions and challenges posed by emerging computing architectures.
Bibtex
@Misc{castrillon_cpss19,
author = {Castrillon, Jeronimo},
title = {Dataflow and higher level abstractions for parallel programming},
howpublished = {CPS Summer School 2019: Designing Cyber-Physical Systems - From concepts to implementation (keynote)},
month = sep,
year = {2019},
abstract = {Computing systems continue to increase in complexity, today including multiple cores, complex memory hierarchies and domain-specific accelerators, and soon with components built with emerging hardware technologies. This complexity calls for advances in a variety of domains, like programming and modeling languages, models of hardware, system simulators, design exploration methodologies and hardware architectures. From the standpoint of programming languages and compilers, this lecture discusses the challenges in mainstream sequential programming to motivate higher-level abstractions. It then provides an introduction to dataflow programming methodologies as a promising solution for embedded applications. We will review the fundamentals of dataflow models of computation, basic programming methodologies and look at current research to account for the adaptivity that new applications require, especially in the context of cyber physical systems. The lecture closes with an outlook on higher level programming abstractions and challenges posed by emerging computing architectures.},
keywords = {invitedtalk},
location = {Alghero, Sardinia, Italy},
project = {cfaed, haec},
url = {http://www.cpsschool.eu/dataflow-and-higher-level-abstractions-for-parallel-programming/}
}
Downloads
No Downloads available for this publication
- Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi] [Bibtex & Downloads]
STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism
Reference
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi]
Abstract
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.
Bibtex
@InProceedings{ertel_haskell19,
author = {Ertel, Sebastian and Adam, Justus and Rink, Norman A. and Goens, Andr{\'e}s and Castrillon, Jeronimo},
title = {{STCLang}: State Thread Composition as a Foundation for Monadic Dataflow Parallelism},
booktitle = {Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell},
year = {2019},
series = {Haskell 2019},
pages = {146--161},
address = {New York, NY, USA},
month = aug,
publisher = {ACM},
abstract = {Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.},
acmid = {3342600},
doi = {10.1145/3331545.3342600},
isbn = {978-1-4503-6813-1},
keywords = {conf},
location = {Berlin, Germany},
numpages = {16},
url = {http://doi.acm.org/10.1145/3331545.3342600}
}
Downloads
1908_Ertel_Haskell [PDF]
- Joonas Multanen, Asif Ali Khan, Pekka Jääskeläinen, Fazal Hameed, Jeronimo Castrillon, "SHRIMP: Efficient Instruction Delivery with Domain Wall Memory" , Proceedings of the International Symposium on Low Power Electronics and Design, ACM, pp. 6pp, New York, NY, USA, Jul 2019. [doi] [Bibtex & Downloads]
SHRIMP: Efficient Instruction Delivery with Domain Wall Memory
Reference
Joonas Multanen, Asif Ali Khan, Pekka Jääskeläinen, Fazal Hameed, Jeronimo Castrillon, "SHRIMP: Efficient Instruction Delivery with Domain Wall Memory" , Proceedings of the International Symposium on Low Power Electronics and Design, ACM, pp. 6pp, New York, NY, USA, Jul 2019. [doi]
Bibtex
@InProceedings{multanen_islped19,
author = {Joonas Multanen and Asif Ali Khan and Pekka J{\"a}{\"a}skel{\"a}inen and Fazal Hameed and Jeronimo Castrillon},
title = {{SHRIMP}: Efficient Instruction Delivery with Domain Wall Memory},
booktitle = {Proceedings of the International Symposium on Low Power Electronics and Design},
year = {2019},
month = jul,
series = {ISLPED '19},
location = {Lausanne, Switzerland},
pages = {6pp},
numpages = {6},
publisher = {ACM},
address = {New York, NY, USA},
doi = {10.1109/ISLPED.2019.8824954},
url = {https://ieeexplore.ieee.org/document/8824954},
}
Downloads
1907_Multanen_ISLPED [PDF]
- Andrés Goens, Christian Menard, Jeronimo Castrillon, "On Compact Mappings for Multicore Systems" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung) , Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi] [Bibtex & Downloads]
On Compact Mappings for Multicore Systems
Reference
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On Compact Mappings for Multicore Systems" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung) , Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi]
Bibtex
@InProceedings{goens_samos19,
author = {Andr{\'e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On Compact Mappings for Multicore Systems},
booktitle = {Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS)},
year = {2019},
editor = {D. Pnevmatikatos and M. Pelcat and M. Jung},
volume = {11733},
pages = {325--335},
month = jul,
organization = {IEEE},
publisher = {Springer, Cham},
doi = {10.1007/978-3-030-27562-4_23},
isbn = {978-3-030-27561-7},
location = {Pythagorion, Greece},
numpages = {11},
url = {https://link.springer.com/chapter/10.1007/978-3-030-27562-4_23}
}
Downloads
1907_Goens_SAMOS [PDF]
- Asif Ali Khan, Norman A. Rink, Fazal Hameed, Jeronimo Castrillon, "Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads" , Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES), ACM, pp. 5–18, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads
Reference
Asif Ali Khan, Norman A. Rink, Fazal Hameed, Jeronimo Castrillon, "Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads" , Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES), ACM, pp. 5–18, New York, NY, USA, Jun 2019. [doi]
Abstract
Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24% and 74% respectively compared to an iso-capacity SRAM.
Bibtex
@InProceedings{kahn_lctes19,
author = {Asif Ali Khan and Norman A. Rink and Fazal Hameed and Jeronimo Castrillon},
title = {Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads},
booktitle = {Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES)},
series = {LCTES 2019},
pages = {5--18},
numpages = {14},
isbn = {978-1-4503-6724-0},
doi = {10.1145/3316482.3326351},
url = {http://doi.acm.org/10.1145/3316482.3326351},
acmid = {3326351},
year = {2019},
month = jun,
location = {Phoenix, AZ, USA},
publisher = {ACM},
address = {New York, NY, USA},
abstract = {Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24\% and 74\% respectively compared to an iso-capacity SRAM.},
}
Downloads
1906_Khan_LCTES [PDF]
- Norman A. Rink, Jeronimo Castrillon, "TeIL: a type-safe imperative Tensor Intermediate Language" , Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY), ACM, pp. 57–68, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
TeIL: a type-safe imperative Tensor Intermediate Language
Reference
Norman A. Rink, Jeronimo Castrillon, "TeIL: a type-safe imperative Tensor Intermediate Language" , Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY), ACM, pp. 57–68, New York, NY, USA, Jun 2019. [doi]
Abstract
Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels.
The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a type-safety result in the Coq proof assistant.
Bibtex
@InProceedings{rink_array19,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {{TeIL}: a type-safe imperative {Tensor Intermediate Language}},
booktitle = {Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY)},
year = {2019},
series = {ARRAY 2019},
pages = {57--68},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
doi = {10.1145/3315454.3329959},
url = {http://doi.acm.org/10.1145/3315454.3329959},
acmid = {3329959},
isbn = {978-1-4503-6717-2},
location = {Phoenix, AZ, USA},
numpages = {12},
abstract = {Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels.
The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a type-safety result in the Coq proof assistant.},
}
Downloads
1906_Rink_Array [PDF]
- Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A Case Study on Machine Learning for Synthesizing Benchmarks" , Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
A Case Study on Machine Learning for Synthesizing Benchmarks
Reference
Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A Case Study on Machine Learning for Synthesizing Benchmarks" , Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi]
Abstract
Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.
Bibtex
@InProceedings{goens_mapl19,
author = {Andr\'{e}s Goens and Alexander Brauckmann and Sebastian Ertel and Chris Cummins and Hugh Leather and Jeronimo Castrillon},
title = {A Case Study on Machine Learning for Synthesizing Benchmarks},
booktitle = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL)},
year = {2019},
series = {MAPL 2019},
doi = {10.1145/3315508.3329976},
url = {http://doi.acm.org/10.1145/3315508.3329976},
acmid = {3329976},
isbn = {978-1-4503-6719-6},
pages = {38--46},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
keywords = {conf},
location = {Phoenix, AZ, USA},
numpages = {9},
abstract = {Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.},
}
Downloads
1906_Goens_MAPL [PDF]
- Jeronimo Castrillon, "SoC programming in the era of the Internet of Things, machine learning and emerging technologies" , In Groupement De Recherche SOC2: System On Chip, Systèmes embarqués et Objets Connecté (keynote), Jun 2019. [Bibtex & Downloads]
SoC programming in the era of the Internet of Things, machine learning and emerging technologies
Reference
Jeronimo Castrillon, "SoC programming in the era of the Internet of Things, machine learning and emerging technologies" , In Groupement De Recherche SOC2: System On Chip, Systèmes embarqués et Objets Connecté (keynote), Jun 2019.
Abstract
The design of a system on chip has traditionally been one of the most complex tasks in computing systems. Designers have to deal with stringent application constraints under a low-power budget while reducing non-recurring engineering costs. Modeling languages, costs models of hardware, system simulators, design exploration methodologies and alike have made it possible to cope with this high complexity. Today, three recent trends represent a non trivial complexity increase and thus a challenge for SoC designers and programmers, namely, 1) additional system dynamics in the context of the Internet of Things, 2) the ubiquity of machine learning workloads, and 3) the added complexity brought by specialization and emerging technologies. This talk discusses how models and higher level programming abstractions can be leveraged to cope with these trends. A dataflow programming methodology is extended to account for dynamic execution scenarios at runtime. A tensor abstraction, common in machine learning, is introduced that eases programming and design tasks. Finally, the talk shows how the tensor abstraction is useful to efficiently map tensor computations to SoCs with non-volatile racetrack scratch-pad memory.
Bibtex
@Misc{castrillon_gdrsoc2019,
author = {Castrillon, Jeronimo},
title = {SoC programming in the era of the Internet of Things, machine learning and emerging technologies},
howpublished = {Groupement De Recherche SOC2: System On Chip, Syst{\`e}mes embarqu{\'e}s et Objets Connect{\'e} (keynote)},
month = jun,
year = {2019},
abstract = {The design of a system on chip has traditionally been one of the most complex tasks in computing
systems. Designers have to deal with stringent application constraints under a low-power budget while
reducing non-recurring engineering costs. Modeling languages, cost models of hardware, system
simulators, design exploration methodologies and the like have made it possible to cope with this high
complexity. Today, three recent trends represent a nontrivial complexity increase and thus a challenge
for SoC designers and programmers, namely, 1) additional system dynamics in the context of the
Internet of Things, 2) the ubiquity of machine learning workloads, and 3) the added complexity brought
by specialization and emerging technologies. This talk discusses how models and higher-level
programming abstractions can be leveraged to cope with these trends. A dataflow programming
methodology is extended to account for dynamic execution scenarios at runtime. A tensor abstraction,
common in machine learning, is introduced that eases programming and design tasks. Finally, the
talk shows how the tensor abstraction is useful to efficiently map tensor computations to SoCs with
non-volatile racetrack scratch-pad memory.},
location = {Montpellier, France},
url = {http://www.gdr-soc.cnrs.fr/programme-colloque-2019/}
}
Downloads
190620_castrillon_gdrsoc2-lowres [PDF]
- Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''" , In CoRR, vol. abs/1906.12098, Jun 2019. [Bibtex & Downloads]
Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''
Reference
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''" , In CoRR, vol. abs/1906.12098, Jun 2019.
Bibtex
@Article{ertel_haskellsup19,
author = {Sebastian Ertel and Justus Adam and Norman A. Rink and Andr{\'{e}}s Goens and Jeronimo Castrillon},
title = {Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''},
journal = {CoRR},
year = {2019},
volume = {abs/1906.12098},
month = jun,
archiveprefix = {arXiv},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1906-12098},
eprint = {1906.12098},
url = {http://arxiv.org/abs/1906.12098}
}
Downloads
1906_Ertel_Haskellsupp [PDF]
- Jeronimo Castrillon, "Programming abstractions: When domain-specific goes mainstream" , In 28th Workshop of the Gesellschaft für Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS'19) (invited talk), Mar 2019. [Bibtex & Downloads]
Programming abstractions: When domain-specific goes mainstream
Reference
Jeronimo Castrillon, "Programming abstractions: When domain-specific goes mainstream" , In 28th Workshop of the Gesellschaft für Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS'19) (invited talk), Mar 2019.
Abstract
We have seen several inflection points in computing in this century: from single to multi-core,
from homogeneous to heterogeneous, and in the near future to fundamentally new computing
paradigms with emerging technologies. With domain-specific hardware becoming mainstream,
and programming reaching out to professions outside computer science, higher-level
programming abstractions, domain-specific languages (DSLs) and tools are badly needed. This
talk provides examples of such programming abstractions as a basis for discussion. It first
discusses how dataflow programming models from the embedded domain can be leveraged in
more general-purpose setups. It then presents DSLs for particle-based simulations and for tensor
expressions. The latter is one example of multiple tensor DSLs available today, spawned
by the recent machine learning boom. The talk closes with examples of emerging technologies
and a brief discussion about how they may impact our current assumptions.
Bibtex
@Misc{castrillon_pars2019,
author = {Castrillon, Jeronimo},
title = {Programming abstractions: When domain-specific goes mainstream},
howpublished = {28th Workshop of the Gesellschaft f{\"u}r Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS'19) (invited talk)},
month = mar,
year = {2019},
abstract = {We have seen several inflection points in computing in this century: from single to multi-core,
from homogeneous to heterogeneous, and in the near future to fundamentally new computing
paradigms with emerging technologies. With domain-specific hardware becoming mainstream,
and programming reaching out to professions outside computer science, higher-level
programming abstractions, domain-specific languages (DSLs) and tools are badly needed. This
talk provides examples of such programming abstractions as a basis for discussion. It first
discusses how dataflow programming models from the embedded domain can be leveraged in
more general-purpose setups. It then presents DSLs for particle-based simulations and for tensor
expressions. The latter is one example of multiple tensor DSLs available today, spawned
by the recent machine learning boom. The talk closes with examples of emerging technologies
and a brief discussion about how they may impact our current assumptions.},
location = {Berlin, Germany},
url = {https://fg-pars.gi.de/veranstaltung/pars-workshop-2019/}
}
Downloads
190321_castrill_PARS-sent [PDF]
- Gerhard Fettweis, Meik Dörpinghaus, Jeronimo Castrillon, Akash Kumar, Christel Baier, Karlheinz Bock, Frank Ellinger, Andreas Fery, Frank H. P. Fitzek, Hermann Härtig, Kambiz Jamshidi, Thomas Kissinger, Wolfgang Lehner, Michael Mertig, Wolfgang E. Nagel, Giang T. Nguyen, Dirk Plettemeier, Michael Schröter, Thorsten Strufe, "Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing" , In Proceedings of the IEEE, vol. 107, no. 1, pp. 204–231, Jan 2019. [doi] [Bibtex & Downloads]
Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing
Reference
Gerhard Fettweis, Meik Dörpinghaus, Jeronimo Castrillon, Akash Kumar, Christel Baier, Karlheinz Bock, Frank Ellinger, Andreas Fery, Frank H. P. Fitzek, Hermann Härtig, Kambiz Jamshidi, Thomas Kissinger, Wolfgang Lehner, Michael Mertig, Wolfgang E. Nagel, Giang T. Nguyen, Dirk Plettemeier, Michael Schröter, Thorsten Strufe, "Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing" , In Proceedings of the IEEE, vol. 107, no. 1, pp. 204–231, Jan 2019. [doi]
Bibtex
@Article{fettweis_ieeeproc19,
author = {Gerhard Fettweis and Meik D{\"o}rpinghaus and Jeronimo Castrillon and Akash Kumar and Christel Baier and Karlheinz Bock and Frank Ellinger and Andreas Fery and Frank H. P. Fitzek and Hermann H{\"a}rtig and Kambiz Jamshidi and Thomas Kissinger and Wolfgang Lehner and Michael Mertig and Wolfgang E. Nagel and Giang T. Nguyen and Dirk Plettemeier and Michael Schr{\"o}ter and Thorsten Strufe},
title = {Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing},
journal = {Proceedings of the IEEE},
year = {2019},
volume = {107},
number = {1},
pages = {204--231},
month = jan,
doi = {10.1109/JPROC.2018.2874895},
issn = {0018-9219},
url = {https://ieeexplore.ieee.org/document/8565890}
}
Downloads
1812_Fettweis_IEEEProc [PDF]
- Hasna Bouraoui, Jeronimo Castrillon, Chadlia Jerad, "Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications" , Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 4:1–4:6, New York, NY, USA, Jan 2019. [doi] [Bibtex & Downloads]
Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications
Reference
Hasna Bouraoui, Jeronimo Castrillon, Chadlia Jerad, "Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications" , Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 4:1–4:6, New York, NY, USA, Jan 2019. [doi]
Bibtex
@InProceedings{bouraoui_parma19,
author = {Hasna Bouraoui and Jeronimo Castrillon and Chadlia Jerad},
title = {Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications},
booktitle = {Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2019},
series = {PARMA-DITAM 2019},
pages = {4:1--4:6},
articleno = {4},
numpages = {6},
address = {New York, NY, USA},
month = jan,
publisher = {ACM},
isbn = {978-1-4503-6321-1},
url = {http://doi.acm.org/10.1145/3310411.3310417},
doi = {10.1145/3310411.3310417},
acmid = {3310417},
location = {Valencia, Spain}
}
Downloads
1901_Bouraoui_PARMA [PDF]
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "RTSim: A Cycle-accurate Simulator for Racetrack Memories" , In IEEE Computer Architecture Letters, IEEE, vol. 18, no. 1, pp. 43–46, Jan 2019. [doi] [Bibtex & Downloads]
RTSim: A Cycle-accurate Simulator for Racetrack Memories
Reference
Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "RTSim: A Cycle-accurate Simulator for Racetrack Memories" , In IEEE Computer Architecture Letters, IEEE, vol. 18, no. 1, pp. 43–46, Jan 2019. [doi]
Bibtex
@Article{khan_ieeecal19,
author = {Asif Ali Khan and Fazal Hameed and Robin Bl{\"a}sing and Stuart Parkin and Jeronimo Castrillon},
title = {{RTS}im: A Cycle-accurate Simulator for Racetrack Memories},
journal = {IEEE Computer Architecture Letters},
year = {2019},
volume = {18},
number = {1},
pages = {43--46},
month = jan,
doi = {10.1109/LCA.2019.2899306},
issn = {1556-6056},
publisher = {IEEE},
url = {https://ieeexplore.ieee.org/document/8642352}
}
Downloads
1902_khan_IEEECAL [PDF]
2018
- Adilla Susungi, Norman A. Rink, Albert Cohen, Jeronimo Castrillon, Claude Tadonki, "Meta-programming for Cross-Domain Tensor Optimizations" , Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'18), ACM, pp. 79–92, New York, NY, USA, Nov 2018. [doi] [Bibtex & Downloads]
Meta-programming for Cross-Domain Tensor Optimizations
Reference
Adilla Susungi, Norman A. Rink, Albert Cohen, Jeronimo Castrillon, Claude Tadonki, "Meta-programming for Cross-Domain Tensor Optimizations" , Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'18), ACM, pp. 79–92, New York, NY, USA, Nov 2018. [doi]
Bibtex
@InProceedings{rink_gpce18,
author = {Adilla Susungi and Norman A. Rink and Albert Cohen and Jeronimo Castrillon and Claude Tadonki},
title = {Meta-programming for Cross-Domain Tensor Optimizations},
booktitle = {Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'18)},
year = {2018},
series = {GPCE 2018},
pages = {79--92},
numpages = {14},
address = {New York, NY, USA},
month = nov,
publisher = {ACM},
keywords = {conf},
location = {Boston, MA, USA},
isbn = {978-1-4503-6045-6},
url = {http://doi.acm.org/10.1145/3278122.3278131},
doi = {10.1145/3278122.3278131},
acmid = {3278131},
}
Downloads
1811_Rink_GPCE [PDF]
- Jeronimo Castrillon, "Parallel programming methodologies for manycores" , In NeXtream Solution Seminar & Silexica Technology Workshop (invited talk), Oct 2018. [Bibtex & Downloads]
Parallel programming methodologies for manycores
Reference
Jeronimo Castrillon, "Parallel programming methodologies for manycores" , In NeXtream Solution Seminar & Silexica Technology Workshop (invited talk), Oct 2018.
Bibtex
@Misc{castrillon_neXtream2018,
author = {Castrillon, Jeronimo},
title = {Parallel programming methodologies for manycores},
howpublished = {NeXtream Solution Seminar \& Silexica Technology Workshop (invited talk)},
month = oct,
year = {2018},
keywords = {invitedtalk},
location = {Tokyo, Japan},
url = {https://nextream.bz/nss/2018/?page_id=29}
}
Downloads
181017_castrill_slx-tokyo_sent-2 [PDF]
- Rainer Leupers, Miguel A. Aguilar, Jeronimo Castrillon, Weihua Sheng, "Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems" , Chapter in Handbook of Signal Processing Systems (3rd Edition) (Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo) , Springer New York, pp. 1021–1062, Sep 2018. [doi] [Bibtex & Downloads]
Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems
Reference
Rainer Leupers, Miguel A. Aguilar, Jeronimo Castrillon, Weihua Sheng, "Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems" , Chapter in Handbook of Signal Processing Systems (3rd Edition) (Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo) , Springer New York, pp. 1021–1062, Sep 2018. [doi]
Abstract
The increasing demands of modern embedded systems, such as high-performance and energy-efficiency, have motivated the use of heterogeneous multi-core platforms enabled by Multiprocessor System-on-Chips (MPSoCs). To fully exploit the power of these platforms, new tools are needed to address the increasing software complexity to achieve a high productivity. An MPSoC compiler is a tool-chain to tackle the problems of application modeling, platform description, software parallelization, software distribution and code generation for an efficient usage of the target platform. This chapter discusses various aspects of compilers for heterogeneous embedded multi-core systems, using the well-established single-core C compiler technology as a baseline for comparison. After a brief introduction to the MPSoC compiler technology, the important ingredients of the compilation process are explained in detail. Finally, a number of case studies from academia and industry are presented to illustrate the concepts discussed in this chapter.
Bibtex
@InCollection{leupers18_spschapter,
author = {Leupers, Rainer and Aguilar, Miguel A. and Castrillon, Jeronimo and Sheng, Weihua},
title = {Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems},
booktitle = {Handbook of Signal Processing Systems (3rd Edition)},
publisher = {Springer New York},
year = {2018},
month = sep,
editor = {Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo},
pages = {1021--1062},
abstract = {The increasing demands of modern embedded systems, such as high-performance and energy-efficiency, have motivated the use of heterogeneous multi-core platforms enabled by Multiprocessor System-on-Chips (MPSoCs). To fully exploit the power of these platforms, new tools are needed to address the increasing software complexity to achieve a high productivity. An MPSoC compiler is a tool-chain to tackle the problems of application modeling, platform description, software parallelization, software distribution and code generation for an efficient usage of the target platform. This chapter discusses various aspects of compilers for heterogeneous embedded multi-core systems, using the well-established single-core C compiler technology as a baseline for comparison. After a brief introduction to the MPSoC compiler technology, the important ingredients of the compilation process are explained in detail. Finally, a number of case studies from academia and industry are presented to illustrate the concepts discussed in this chapter.},
doi = {10.1007/978-3-319-91734-4_28},
isbn = {978-3-319-91733-7},
url = {https://link.springer.com/chapter/10.1007/978-3-319-91734-4_28},
}
Downloads
1809_Leupers_SPSBookChapter [PDF]
- Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores" , Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi] [Bibtex & Downloads]
On the Representation of Mappings to Multicores
Reference
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores" , Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi]
Abstract
Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieving this. We present different representations and how these could be applied to different contexts and objectives, like complex design-space exploration meta-heuristics or efficient runtime systems.
Bibtex
@InProceedings{goen_mcsoc18,
author = {Andr\'{e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On the Representation of Mappings to Multicores},
booktitle = {Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18)},
year = {2018},
address = {Vietnam National University, Hanoi, Vietnam},
month = sep,
pages = {184--191},
doi = {10.1109/MCSoC2018.2018.00039},
url = {https://ieeexplore.ieee.org/document/8540232},
isbn = {978-1-5386-6689-0},
abstract = {Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieving this. We present different representations and how these could be applied to different contexts and objectives, like complex design-space exploration meta-heuristics or efficient runtime systems.},
}
Downloads
1809_Goens_MCSoC [PDF]
- Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems" , In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243–259, Jul 2018. [doi] [Bibtex & Downloads]
A Hardware/Software Stack for Heterogeneous Systems
Reference
Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems" , In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243–259, Jul 2018. [doi]
Abstract
Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.
Bibtex
@Article{castrillon_tmscs17,
author = {Jeronimo Castrillon and Matthias Lieber and Sascha Kl{\"u}ppelholz and Marcus V{\"o}lp and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andr\'{e}s Goens and Sebastian Haas and Dirk Habich and Hermann H{\"a}rtig and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Akash Kumar and Wolfgang Lehner and Linda Leuschner and Siqi Ling and Steffen M{\"a}rcker and Christian Menard and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt and Sascha Wunderlich},
title = {A Hardware/Software Stack for Heterogeneous Systems},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
year = {2018},
month = jul,
volume = {4},
number = {3},
pages = {243--259},
abstract = {Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.},
doi = {10.1109/TMSCS.2017.2771750},
issn = {2332-7766},
url = {http://ieeexplore.ieee.org/document/8103042/}
}
Downloads
1711_Castrillon_TMSCS [PDF]
- Fazal Hameed, Asif Ali Khan, Jeronimo Castrillon, "Performance and Energy Efficient Design of STT-RAM Last-Level-Cache" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 26, no. 6, pp. 1059–1072, Jun 2018. [doi] [Bibtex & Downloads]
Performance and Energy Efficient Design of STT-RAM Last-Level-Cache
Reference
Fazal Hameed, Asif Ali Khan, Jeronimo Castrillon, "Performance and Energy Efficient Design of STT-RAM Last-Level-Cache" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 26, no. 6, pp. 1059–1072, Jun 2018. [doi]
Abstract
Recent research has proposed having a die-stacked last-level cache (LLC) to overcome the memory wall. Lately, spin-transfer-torque random access memory (STT-RAM) caches have received attention, since they provide improved energy efficiency compared with DRAM caches. However, recently proposed STT-RAM cache architectures unnecessarily dissipate energy by fetching unneeded cache lines (CLs) into the row buffer (RB). In this paper, we propose a selective read policy for the STT-RAM which fetches those CLs into the RB that are likely to be reused. In addition, we propose a tags-update policy that reduces the number of STT-RAM writebacks. This reduces the number of reads/writes and thereby decreases the energy consumption. To reduce the latency penalty of our selective read policy, we propose the following performance optimizations: 1) an RB tags-bypass policy that reduces STT-RAM access latency; 2) an LLC data cache that stores the CLs that are likely to be used in the near future; 3) an address organization scheme that simultaneously reduces LLC access latency and miss rate; and 4) a tags-to-column mapping policy that improves access parallelism. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an eight-core system. We compare our approach with a recently proposed STT-RAM LLC with subarray parallelism support and show that our synergistic policies reduce the average LLC dynamic energy consumption by 75% and improve the system performance by 6.5%. Compared with the state-of-the-art DRAM LLC with subarray parallelism, our architecture reduces the LLC dynamic energy consumption by 82% and improves system performance by 6.8%.
Bibtex
@Article{hameed_tvlsi18,
author = {Fazal Hameed and Asif Ali Khan and Jeronimo Castrillon},
title = {Performance and Energy Efficient Design of STT-RAM Last-Level-Cache},
journal = {IEEE Transactions on Very Large Scale Integration Systems (TVLSI)},
year = {2018},
volume = {26},
number = {6},
pages = {1059--1072},
month = jun,
abstract = {Recent research has proposed having a die-stacked last-level cache (LLC) to overcome the memory wall. Lately, spin-transfer-torque random access memory (STT-RAM) caches have received attention, since they provide improved energy efficiency compared with DRAM caches. However, recently proposed STT-RAM cache architectures unnecessarily dissipate energy by fetching unneeded cache lines (CLs) into the row buffer (RB). In this paper, we propose a selective read policy for the STT-RAM which fetches those CLs into the RB that are likely to be reused. In addition, we propose a tags-update policy that reduces the number of STT-RAM writebacks. This reduces the number of reads/writes and thereby decreases the energy consumption. To reduce the latency penalty of our selective read policy, we propose the following performance optimizations: 1) an RB tags-bypass policy that reduces STT-RAM access latency; 2) an LLC data cache that stores the CLs that are likely to be used in the near future; 3) an address organization scheme that simultaneously reduces LLC access latency and miss rate; and 4) a tags-to-column mapping policy that improves access parallelism. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an eight-core system. We compare our approach with a recently proposed STT-RAM LLC with subarray parallelism support and show that our synergistic policies reduce the average LLC dynamic energy consumption by 75\% and improve the system performance by 6.5\%. Compared with the state-of-the-art DRAM LLC with subarray parallelism, our architecture reduces the LLC dynamic energy consumption by 82\% and improves system performance by 6.8\%.},
doi = {10.1109/TVLSI.2018.2804938},
issn = {1063-8210},
numpages = {14},
url = {http://ieeexplore.ieee.org/document/8307465/}
}
Downloads
1803_Hameed_TVLSI [PDF]
- Jeronimo Castrillon, "Heterogeneous Post-CMOS Technologies Meet Software" , In Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk), Jun 2018. [Bibtex & Downloads]
Heterogeneous Post-CMOS Technologies Meet Software
Reference
Jeronimo Castrillon, "Heterogeneous Post-CMOS Technologies Meet Software" , In Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk), Jun 2018.
Bibtex
@Misc{castrillon2018ISC,
author = {Castrillon, Jeronimo},
title = {Heterogeneous Post-CMOS Technologies Meet Software},
howpublished = {Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk)},
month = jun,
year = {2018},
keywords = {invitedtalk},
location = {Frankfurt, Germany},
url = {https://beyondcmos.ornl.gov/2018/agenda.html}
}
Downloads
180628_castrillon_isc-postmoore_send [PDF]
- Jeronimo Castrillon, "Parallel programming: Current and future systems" , In 50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM'18) (invited talk), May 2018. [Bibtex & Downloads]
Parallel programming: Current and future systems
Reference
Jeronimo Castrillon, "Parallel programming: Current and future systems" , In 50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM'18) (invited talk), May 2018.
Bibtex
@Misc{castrillon2018UdeA,
author = {Castrillon, Jeronimo},
title = {Parallel programming: Current and future systems},
howpublished = {50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM'18) (invited talk)},
month = may,
year = {2018},
location = {Universidad de Antioquia, Medell{\'i}n, Colombia}
}
Downloads
180516_castrillon_50years_EE_UdeA [PDF]
- Sven Karol, Tobias Nett, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Domain-Specific Language and Editor for Parallel Particle Methods" , In ACM Transactions on Mathematical Software (TOMS), ACM, vol. 44, no. 3, pp. 34:1–34:32, New York, NY, USA, Mar 2018. [doi] [Bibtex & Downloads]
A Domain-Specific Language and Editor for Parallel Particle Methods
Reference
Sven Karol, Tobias Nett, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Domain-Specific Language and Editor for Parallel Particle Methods" , In ACM Transactions on Mathematical Software (TOMS), ACM, vol. 44, no. 3, pp. 34:1–34:32, New York, NY, USA, Mar 2018. [doi]
Abstract
Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing DSLs is not easy, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL comfortable for programmers. Some of these features –e.g., syntax highlighting, type inference, error reporting– are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language, a Fortran-based DSL that uses conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer’s experience is improved using static analyses and projectional editing, i.e., code-structure editing, constrained by syntax, as opposed to free-text editing. We present an explicit domain model for particle abstractions and the first formal type system for particle methods.
Bibtex
@Article{karol_toms18,
author = {Karol, Sven and Nett, Tobias and Castrillon, Jeronimo and Sbalzarini, Ivo F.},
title = {A Domain-Specific Language and Editor for Parallel Particle Methods},
journal = {ACM Transactions on Mathematical Software (TOMS)},
issue_date = {March 2018},
volume = {44},
number = {3},
month = mar,
year = {2018},
issn = {0098-3500},
pages = {34:1--34:32},
articleno = {34},
numpages = {32},
url = {http://doi.acm.org/10.1145/3175659},
doi = {10.1145/3175659},
acmid = {3175659},
publisher = {ACM},
address = {New York, NY, USA},
abstract = {
Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing DSLs is not easy, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL comfortable for programmers. Some of these features --e.g., syntax highlighting, type inference, error reporting-- are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language, a Fortran-based DSL that uses conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer’s experience is improved using static analyses and projectional editing, i.e., code-structure editing, constrained by syntax, as opposed to free-text editing. We present an explicit domain model for particle abstractions and the first formal type system for particle methods.},
}
Downloads
1709_Karol_TOMS-arxiv [PDF]
Related Paths
Biological Systems Path, Orchestration Path
- Fazal Hameed, Jeronimo Castrillon, "STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement" , Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018), Mar 2018. [Bibtex & Downloads]
STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement
Reference
Fazal Hameed, Jeronimo Castrillon, "STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement" , Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018), Mar 2018.
Bibtex
@InProceedings{hameed_nvmw18,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement},
booktitle = {Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018)},
year = {2018},
month = mar,
location = {San Diego, CA, USA},
numpages = {2}
}
Downloads
1803_Hameed_NVMW [PDF]
- Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O" , Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
Compiling for Concise Code and Efficient I/O
Reference
Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O" , Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi]
Abstract
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.
Bibtex
@InProceedings{ertel_cc18,
author = {Sebastian Ertel and Andr\'{e}s Goens and Justus Adam and Jeronimo Castrillon},
title = {Compiling for Concise Code and Efficient I/O},
booktitle = {Proceedings of the 27th International Conference on Compiler Construction (CC 2018)},
series = {CC 2018},
year = {2018},
month = feb,
location = {Vienna, Austria},
publisher = {ACM},
numpages = {12},
pages = {104--115},
doi = {10.1145/3178372.3179505},
url = {https://dl.acm.org/citation.cfm?id=3179505},
acmid = {3179505},
address = {New York, NY, USA},
abstract = {Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.},
}
Downloads
cc-2018-slides [PDF]
1802_Ertel_CC [PDF]
- Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Supporting Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM), ACM, pp. 41–50, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
Supporting Fine-grained Dataflow Parallelism in Big Data Systems
Reference
Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Supporting Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM), ACM, pp. 41–50, New York, NY, USA, Feb 2018. [doi]
Abstract
Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.
Bibtex
@InProceedings{ertel_pmam18,
author = {Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {Supporting Fine-grained Dataflow Parallelism in Big Data Systems},
booktitle = {Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM)},
year = {2018},
series = {PMAM'18},
address = {New York, NY, USA},
month = feb,
publisher = {ACM},
doi = {10.1145/3178442.3178447},
isbn = {978-1-4503-5645-9},
location = {Vienna, Austria},
pages = {41--50},
numpages = {10},
acmid = {3178447},
url = {http://doi.acm.org/10.1145/3178442.3178447},
abstract = {Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.},
}
Downloads
pmam-2018-slides [PDF]
1802_Ertel_PMAM [PDF]
- Norman A. Rink, Immo Huismann, Adilla Susungi, Jeronimo Castrillon, Jörg Stiller, Jochen Fröhlich, Claude Tadonki, "CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics" , Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018), ACM, pp. 5:1–5:10, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics
Reference
Norman A. Rink, Immo Huismann, Adilla Susungi, Jeronimo Castrillon, Jörg Stiller, Jochen Fröhlich, Claude Tadonki, "CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics" , Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018), ACM, pp. 5:1–5:10, New York, NY, USA, Feb 2018. [doi]
Abstract
Numerical simulations continue to enable fast and enormous progress in science and engineering. Writing efficient numerical codes is a difficult challenge that encompasses a variety of tasks from designing the right algorithms to exploiting the full potential of a platform's architecture. Domain-specific languages (DSLs) can ease these tasks by offering the right abstractions for expressing numerical problems. With the aid of domain knowledge, efficient code can then be generated automatically from abstract expressions. In this work, we present the CFDlang DSL for expressing tensor operations that constitute the performance-critical code sections in a class of real numerical applications from fluid dynamics. We demonstrate that CFDlang can be used to generate code automatically that performs as well, if not better, than carefully hand-optimized code.
Bibtex
@InProceedings{rink_rwdsl18,
author = {Norman A. Rink and Immo Huismann and Adilla Susungi and Jeronimo Castrillon and J{\"o}rg Stiller and Jochen Fr{\"o}hlich and Claude Tadonki},
title = {CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics},
booktitle = {Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018)},
year = {2018},
series = {RWDSL2018},
pages = {5:1--5:10},
address = {New York, NY, USA},
month = feb,
publisher = {ACM},
abstract = {Numerical simulations continue to enable fast and enormous progress in science and engineering. Writing efficient numerical codes is a difficult challenge that encompasses a variety of tasks from designing the right algorithms to exploiting the full potential of a platform's architecture. Domain-specific languages (DSLs) can ease these tasks by offering the right abstractions for expressing numerical problems. With the aid of domain knowledge, efficient code can then be generated automatically from abstract expressions. In this work, we present the CFDlang DSL for expressing tensor operations that constitute the performance-critical code sections in a class of real numerical applications from fluid dynamics. We demonstrate that CFDlang can be used to generate code automatically that performs as well, if not better, than carefully hand-optimized code.},
acmid = {3183900},
articleno = {5},
doi = {10.1145/3183895.3183900},
isbn = {978-1-4503-6355-6},
location = {Vienna, Austria},
numpages = {10},
url = {http://doi.acm.org/10.1145/3183895.3183900}
}
Downloads
1802_Rink_RWDSL [PDF]
- Hermann Härtig, Nils Asmussen, Jeronimo Castrillon, Adam Lackorzynski, Michael Roitzsch, Carsten Weinhold, Akash Kumar, "Extremely Heterogeneous Systems – Not Just For Niches" , In Proceeding: Extreme Heterogeneity Workshop, Feb 2018. [Bibtex & Downloads]
Extremely Heterogeneous Systems – Not Just For Niches
Reference
Hermann Härtig, Nils Asmussen, Jeronimo Castrillon, Adam Lackorzynski, Michael Roitzsch, Carsten Weinhold, Akash Kumar, "Extremely Heterogeneous Systems – Not Just For Niches" , In Proceeding: Extreme Heterogeneity Workshop, Feb 2018.
Bibtex
@InProceedings{haertig_ehw18,
author = {Hermann H{\"a}rtig and Nils Asmussen and Jeronimo Castrillon and Adam Lackorzynski and Michael Roitzsch and Carsten Weinhold and Akash Kumar},
title = {Extremely Heterogeneous Systems -- Not Just For Niches},
booktitle = {Extreme Heterogeneity Workshop},
year = {2018},
month = feb,
note = {(Workshop took place over remote conferencing)},
location = {Gaithersburg, MD, USA}
}
Downloads
1802_Haertig_EHW [PDF]
- Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap" , Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi] [Bibtex & Downloads]
Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap
Reference
Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap" , Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi]
Abstract
Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation.
Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.
Bibtex
@InProceedings{khasanov_parma18,
author = {Robert Khasanov and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap},
booktitle = {Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
series = {PARMA-DITAM '18},
isbn = {978-1-4503-6444-7},
pages = {20--25},
year = {2018},
month = jan,
numpages = {6},
url = {http://doi.acm.org/10.1145/3183767.3183790},
doi = {10.1145/3183767.3183790},
acmid = {3183790},
publisher = {ACM},
address = {New York, NY, USA},
location = {Manchester, United Kingdom},
abstract = {Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation.
Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.},
}
Downloads
1801_Khasanov_PARMA-DITAM [PDF]
- Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers" , Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018. [Bibtex & Downloads]
Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers
Reference
Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers" , Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018.
Bibtex
@InProceedings{goens_multiprog18,
author = {Andr{\'e}s Goens and Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers},
booktitle = {Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2018},
url = {http://research.ac.upc.edu/multiprog/multiprog2018/papers/MULTIPROG-2018_Goens.pdf},
month = jan,
location = {Manchester, United Kingdom}
}
Downloads
1801_Goens_MULTIRPOG [PDF]
- Asif Ali Khan, Fazal Hameed, Jeronimo Castrillon, "NVMain Extension for Multi-Level Cache Systems" , Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 7:1–7:6, New York, NY, USA, Jan 2018. [doi] [Bibtex & Downloads]
NVMain Extension for Multi-Level Cache Systems
Reference
Asif Ali Khan, Fazal Hameed, Jeronimo Castrillon, "NVMain Extension for Multi-Level Cache Systems" , Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 7:1–7:6, New York, NY, USA, Jan 2018. [doi]
Bibtex
@InProceedings{khan_rapido18,
author = {Asif Ali Khan and Fazal Hameed and Jeronimo Castrillon},
title = {NVMain Extension for Multi-Level Cache Systems},
booktitle = {Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
series = {RAPIDO '18},
year = {2018},
month = jan,
pages = {7:1--7:6},
articleno = {7},
numpages = {6},
url = {http://doi.acm.org/10.1145/3180665.3180672},
doi = {10.1145/3180665.3180672},
acmid = {3180672},
publisher = {ACM},
address = {New York, NY, USA},
location = {Manchester, United Kingdom},
isbn = {978-1-4503-6417-1},
}
Downloads
1801_Khan_RAPIDO [PDF]
2017
- Fazal Hameed, Christian Menard, Jeronimo Castrillon, "Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache" , Proceedings of the International Symposium on Memory Systems (MemSys'17), ACM, pp. 141–151, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache
Reference
Fazal Hameed, Christian Menard, Jeronimo Castrillon, "Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache" , Proceedings of the International Symposium on Memory Systems (MemSys'17), ACM, pp. 141–151, New York, NY, USA, Oct 2017. [doi]
Bibtex
@InProceedings{hameed_memsys17,
author = {Fazal Hameed and Christian Menard and Jeronimo Castrillon},
title = {Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache},
booktitle = {Proceedings of the International Symposium on Memory Systems (MemSys'17)},
series = {MEMSYS '17},
year = {2017},
month = oct,
isbn = {978-1-4503-5335-9},
location = {Alexandria, Virginia},
pages = {141--151},
numpages = {11},
url = {http://doi.acm.org/10.1145/3132402.3132414},
doi = {10.1145/3132402.3132414},
acmid = {3132414},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Hameed_Memsys [PDF]
- Miguel Angel Aguilar, Abhishek Aggarwal, Awaid Shaheen, Rainer Leupers, Gerd Ascheid, Jeronimo Castrillon, Liam Fitzpatrick, "Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress" , Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), ACM, pp. 14:1–14:2, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress
Reference
Miguel Angel Aguilar, Abhishek Aggarwal, Awaid Shaheen, Rainer Leupers, Gerd Ascheid, Jeronimo Castrillon, Liam Fitzpatrick, "Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress" , Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), ACM, pp. 14:1–14:2, New York, NY, USA, Oct 2017. [doi]
Bibtex
@inproceedings{aguilar_cases17,
author = {Aguilar, Miguel Angel and Aggarwal, Abhishek and Shaheen, Awaid and Leupers, Rainer and Ascheid, Gerd and Castrillon, Jeronimo and Fitzpatrick, Liam},
title = {Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress},
booktitle = {Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES)},
series = {CASES '17},
year = {2017},
month = oct,
isbn = {978-1-4503-5184-3},
location = {Seoul, Republic of Korea},
pages = {14:1--14:2},
articleno = {14},
numpages = {2},
url = {http://doi.acm.org/10.1145/3125501.3125521},
doi = {10.1145/3125501.3125521},
acmid = {3125521},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Aguilar_CASES [PDF]
- Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Immo Huismann, Albert Cohen, Claude Tadonki, Jörg Stiller, Jochen Fröhlich, "Towards Compositional and Generative Tensor Optimizations" , Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'17), ACM, pp. 169–175, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Towards Compositional and Generative Tensor Optimizations
Reference
Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Immo Huismann, Albert Cohen, Claude Tadonki, Jörg Stiller, Jochen Fröhlich, "Towards Compositional and Generative Tensor Optimizations" , Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'17), ACM, pp. 169–175, New York, NY, USA, Oct 2017. [doi]
Bibtex
@InProceedings{rink_gpce17,
author = {Adilla Susungi and Norman A. Rink and Jeronimo Castrillon and Immo Huismann and Albert Cohen and Claude Tadonki and J{\"o}rg Stiller and Jochen Fr{\"o}hlich},
title = {Towards Compositional and Generative Tensor Optimizations},
booktitle = {Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'17)},
series = {GPCE 2017},
year = {2017},
pages = {169--175},
month = oct,
isbn = {978-1-4503-5524-7},
location = {Vancouver, BC, Canada},
numpages = {7},
url = {http://doi.acm.org/10.1145/3136040.3136050},
doi = {10.1145/3136040.3136050},
acmid = {3136050},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Rink_GPCE [PDF]
- Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017) (Lawrence Rauchwerger) , Springer, Cham, pp. 281–282, Oct 2017. [doi] [Bibtex & Downloads]
POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems
Reference
Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017) (Lawrence Rauchwerger) , Springer, Cham, pp. 281–282, Oct 2017. [doi]
Bibtex
@InProceedings{ertel_lcpc17,
author = {Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems},
booktitle = {Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017)},
year = {2017},
editor = {Lawrence Rauchwerger},
publisher = {Springer, Cham},
location = {Texas A{\&}M University, College Station, Texas},
month = oct,
isbn = {978-3-030-35224-0},
pages = {281--282},
doi = {10.1007/978-3-030-35225-7},
url = {https://link.springer.com/book/10.1007%2F978-3-030-35225-7},
}
Downloads
1710_Ertel_LCPC [PDF]
- Sven Karol, Tobias Nett, Pietro Incardona, Nesrine Khouzami, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Language and Development Environment for Parallel Particle Methods" , Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017 (P. Wriggers and M. Bischoff and E. Oñate and D.R.J. Owen and T. Zohdi) , Sep 2017. [Bibtex & Downloads]
A Language and Development Environment for Parallel Particle Methods
Reference
Sven Karol, Tobias Nett, Pietro Incardona, Nesrine Khouzami, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Language and Development Environment for Parallel Particle Methods" , Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017 (P. Wriggers and M. Bischoff and E. Oñate and D.R.J. Owen and T. Zohdi) , Sep 2017.
Bibtex
@InProceedings{karol_particles17,
author = {Sven Karol and Tobias Nett and Pietro Incardona and Nesrine Khouzami and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {A Language and Development Environment for Parallel Particle Methods},
booktitle = {Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017},
year = {2017},
editor = {P. Wriggers and M. Bischoff and E. O{\~n}ate and D.R.J. Owen and T. Zohdi},
url = {https://www.semanticscholar.org/paper/A-Language-and-Development-Environment-for-Paralle-Karol-Nett/2b79bd3836aeb8e2fb2a2b5d9949f9efb1bdfab7?tab=abstract},
month = sep,
}
Downloads
1709_Karol_particles [PDF]
Related Paths
Biological Systems Path, Orchestration Path
- Jeronimo Castrillon, Tei-Wei Kuo, Heike E. Riel, Matthias Lieber, "Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)" , In Dagstuhl Reports (Jerónimo Castrillón-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber) , Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, vol. 7, no. 2, pp. 1–22, Dagstuhl, Germany, Aug 2017. [doi] [Bibtex & Downloads]
Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)
Reference
Jeronimo Castrillon, Tei-Wei Kuo, Heike E. Riel, Matthias Lieber, "Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)" , In Dagstuhl Reports (Jerónimo Castrillón-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber) , Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, vol. 7, no. 2, pp. 1–22, Dagstuhl, Germany, Aug 2017. [doi]
Bibtex
@Article{castrillnmazo_et_al:DR:2017:7349,
author = {Jeronimo Castrillon and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber},
title = {Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)},
journal = {Dagstuhl Reports},
year = {2017},
volume = {7},
number = {2},
month = aug,
pages = {1--22},
address = {Dagstuhl, Germany},
annote = {Keywords: 3D integration, compilers, emerging post-CMOS circuit materials and technologies, hardware/software co-design, heterogeneous hardware, nanoelectronics},
doi = {10.4230/DagRep.7.2.1},
editor = {Jer{\'o}nimo Castrill{\'o}n-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber},
issn = {2192-5283},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
url = {http://drops.dagstuhl.de/opus/volltexte/2017/7349},
urn = {urn:nbn:de:0030-drops-73499}
}
Downloads
No Downloads available for this publication
- Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis" , In ACM Transactions on Architecture and Code Optimization (TACO), ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi] [Bibtex & Downloads]
Symmetry in Software Synthesis
Reference
Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis" , In ACM Transactions on Architecture and Code Optimization (TACO), ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi]
Abstract
With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.
Bibtex
@article{goens_taco17symmetry,
author = {Goens, Andr{\'e}s and Siccha, Sergio and Castrillon, Jeronimo},
title = {Symmetry in Software Synthesis},
journal = {ACM Transactions on Architecture and Code Optimization (TACO)},
issue_date = {July 2017},
volume = {14},
number = {2},
month = jul,
year = {2017},
issn = {1544-3566},
pages = {20:1--20:26},
articleno = {20},
numpages = {26},
url = {http://doi.acm.org/10.1145/3095747},
doi = {10.1145/3095747},
acmid = {3095747},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Scalability, automation, clusters, design-space exploration, group theory, heterogeneous, inverse-semigroups, mapping, metaheuristics, network-on-chip, symmetry},
eprint = "arXiv:1704.06623",
abstract = {With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.}
}
Downloads
1704_Goens_TACO-arxiv [PDF]
- Christian Menard, Matthias Jung, Jeronimo Castrillon, Norbert Wehn, "System Simulation with gem5 and SystemC: The Keystone for Full Interoperability", Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), pp. 62–69, Jul 2017.
Abstract
SystemC TLM based virtual prototypes have become the main tool in industry and research for concurrent hardware and software development, as well as hardware design space exploration. However, there exists a lack of accurate, free, changeable and realistic SystemC models of modern CPUs. Therefore, many researchers use the cycle accurate open source system simulator gem5, which has been developed in parallel to the SystemC standard. In this paper we present a coupling of gem5 with SystemC that offers full interoperability between both simulation frameworks, and therefore enables a huge set of possibilities for system level design space exploration. Furthermore, we show that the coupling itself only induces a relatively small overhead to the total execution time of the simulation.
Bibtex
@InProceedings{menard_samos17,
author = {Christian Menard and Matthias Jung and Jeronimo Castrillon and Norbert Wehn},
title = {System Simulation with gem5 and SystemC: The Keystone for Full Interoperability},
booktitle = {Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS)},
year = {2017},
month = jul,
location = {Pythagorion, Greece},
pages = {62--69},
organization = {IEEE},
doi = {10.1109/SAMOS.2017.8344612},
url = {https://ieeexplore.ieee.org/document/8344612/},
isbn = {978-1-5386-3437-0},
abstract = {SystemC TLM based virtual prototypes have become the main tool in industry and research for concurrent hardware and software development, as well as hardware design space exploration. However, there exists a lack of accurate, free, changeable and realistic SystemC models of modern CPUs. Therefore, many researchers use the cycle accurate open source system simulator gem5, which has been developed in parallel to the SystemC standard. In this paper we present a coupling of gem5 with SystemC that offers full interoperability between both simulation frameworks, and therefore enables a huge set of possibilities for system level design space exploration. Furthermore, we show that the coupling itself only induces a relatively small overhead to the total execution time of the simulation.},
}
Downloads
1707_Menard_SAMOS [PDF]
- Andrés Goens, Robert Khasanov, Marcus Hähnel, Till Smejkal, Hermann Härtig, Jeronimo Castrillon, "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), ACM, pp. 11–20, New York, NY, USA, Jun 2017.
Bibtex
@InProceedings{goens_scopes17,
author = {Andr\'{e}s Goens and Robert Khasanov and Marcus H{\"a}hnel and Till Smejkal and Hermann H{\"a}rtig and Jeronimo Castrillon},
title = {TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17)},
year = {2017},
month = jun,
series = {SCOPES '17},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
pages = {11--20},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078663},
doi = {10.1145/3078659.3078663},
acmid = {3078663},
publisher = {ACM},
address = {New York, NY, USA}
}
Downloads
1706_Goens_SCOPES [PDF]
- Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17), ACM, pp. 21–30, New York, NY, USA, Jun 2017.
Bibtex
@InProceedings{hempel_scopes17,
author = {Gerald Hempel and Andr\'{e}s Goens and Josefine Asmus and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17)},
year = {2017},
series = {SCOPES '17},
pages = {21--30},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
acmid = {3078667},
doi = {10.1145/3078659.3078667},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078667}
}
Downloads
1706_Hempel_SCOPES [PDF]
Related Paths
Biological Systems Path, Orchestration Path
- Johanna Sepúlveda, Vania Marangozova-Martin, Jeronimo Castrillon, "Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface", Elsevier, Jun 2017.
Bibtex
@Article{sepulveda_alchemy17_preface,
author = {Sep{\'u}lveda, Johanna and Marangozova-Martin, Vania and Castrillon, Jeronimo},
title = {Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface},
year = {2017},
month = jun,
doi = {10.1016/j.procs.2017.05.276},
url = {http://www.sciencedirect.com/science/article/pii/S1877050917309286},
publisher = {Elsevier}
}
Downloads
No Downloads available for this publication
- Norman A. Rink, Jeronimo Castrillon, "Extending a Compiler Backend for Complete Memory Error Detection", In Lecture Notes in Informatics: Automotive - Safety & Security 2017 (eds. Peter Dencker, Herbert Klenk, Hubert Kelle, Erhard Plödereder), pp. 61–74, May 2017. (Best paper award)
Abstract
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to faults. Applications can be protected against errors resulting from faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to intermediate program representations are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, the compiler backend may introduce additional memory accesses. This report presents an extended compiler backend that protects these accesses against faults in the memory system. It is demonstrated that this enables the detection of all single bit flips in memory. On a subset of SPEC CINT2006 the runtime overhead caused by the extended backend amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86-64.
Bibtex
@InProceedings{rink_automotive17,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Extending a Compiler Backend for Complete Memory Error Detection},
booktitle = {Lecture Notes in Informatics: Automotive - Safety \& Security 2017},
editor = {Peter Dencker and Herbert Klenk and Hubert Kelle and Erhard Pl{\"o}dereder},
year = {2017},
pages = {61--74},
month = may,
abstract = {Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to faults. Applications can be protected against errors resulting from faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to intermediate program representations are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, the compiler backend may introduce additional memory accesses. This report presents an extended compiler backend that protects these accesses against faults in the memory system. It is demonstrated that this enables the detection of all single bit flips in memory. On a subset of SPEC CINT2006 the runtime overhead caused by the extended backend amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86-64.},
isbn = {978-3-88579-663-3},
issn = {1617-5468},
url = {https://dl.gi.de/bitstream/handle/20.500.12116/147/paper04.pdf?sequence=1&isAllowed=y},
}
Downloads
1705_rink_automotive [PDF]
Related Paths
Orchestration Path, Resilience Path
- Markus Haehnel, Frehiwot Melak Arega, Waltenegus Dargie, Robert Khasanov, Jeronimo Castrillon, "Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures", Proceedings of the 7th International Workshop on Big Data in Cloud Performance (DCPerf'17), IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 432–437, May 2017.
Bibtex
@InProceedings{khasanov_dcperf17,
author = {Markus Haehnel and Frehiwot Melak Arega and Waltenegus Dargie and Robert Khasanov and Jeronimo Castrillon},
title = {Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures},
booktitle = {Proceedings of the 7th International Workshop on Big Data in Cloud Performance (DCPerf'17), IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)},
year = {2017},
month = may,
pages = {432--437},
doi = {10.1109/INFCOMW.2017.8116415},
url = {http://ieeexplore.ieee.org/document/8116415/},
location = {Atlanta, USA}
}
Downloads
1705_Khasanov_DCPerf [PDF]
- Norman A. Rink, Jeronimo Castrillon, "Trading Fault Tolerance for Performance in AN Encoding", Proceedings of the ACM International Conference on Computing Frontiers (CF'17), ACM, pp. 183–190, New York, NY, USA, May 2017.
Bibtex
@InProceedings{rink_cf17,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Trading Fault Tolerance for Performance in {AN} Encoding},
booktitle = {Proceedings of the ACM International Conference on Computing Frontiers (CF'17)},
year = {2017},
isbn = {978-1-4503-4487-6},
location = {Siena, Italy},
pages = {183--190},
numpages = {8},
url = {http://doi.acm.org/10.1145/3075564.3075565},
doi = {10.1145/3075564.3075565},
acmid = {3075565},
publisher = {ACM},
address = {New York, NY, USA},
month = may,
}
Downloads
1705_Rink_cf [PDF]
Related Paths
Orchestration Path, Resilience Path
- Rainer Leupers, Miguel Angel Aguilar, Juan Fernando Eusse, Jeronimo Castrillon, Weihua Sheng, "MAPS: A Software Development Environment for Embedded Multicore Applications", In Handbook of Hardware/Software Codesign (eds. Soonhoi Ha, Jürgen Teich), Springer Netherlands, pp. 1–33, Dordrecht, Apr 2017.
Abstract
The use of heterogeneous Multi-Processor System-on-Chip (MPSoC) is a widely accepted solution to address the increasing demands on high performance and energy efficiency for modern embedded devices. To enable the full potential of these platforms, new tools are needed to tackle the programming complexity of MPSoCs, while allowing for high productivity. This chapter discusses the MPSoC Application Programming Studio (MAPS), a framework that provides facilities for expressing parallelism and tool flows for parallelization, mapping/scheduling, and code generation for heterogeneous MPSoCs. Two case studies of the use of MAPS in commercial environments are presented. This chapter closes by discussing early experiences of transferring the MAPS technology into Silexica GmbH, a start-up company that provides multi-core programming tools.
Bibtex
@InBook{leupers_hhcd17,
title = {MAPS: A Software Development Environment for Embedded Multicore Applications},
author = {Rainer Leupers and Miguel Angel Aguilar and Juan Fernando Eusse and Jeronimo Castrillon and Weihua Sheng},
editor = {Soonhoi Ha and J{\"u}rgen Teich},
publisher = {Springer Netherlands},
year = {2017},
address = {Dordrecht},
month = apr,
booktitle = {Handbook of Hardware/Software Codesign},
doi = {10.1007/978-94-017-7358-4_2-1},
isbn = {978-94-017-7358-4},
url = {http://dx.doi.org/10.1007/978-94-017-7358-4_2-1},
pages = {1--33},
abstract = {The use of heterogeneous Multi-Processor System-on-Chip (MPSoC) is a widely accepted solution to address the increasing demands on high performance and energy efficiency for modern embedded devices. To enable the full potential of these platforms, new tools are needed to tackle the programming complexity of MPSoCs, while allowing for high productivity. This chapter discusses the MPSoC Application Programming Studio (MAPS), a framework that provides facilities for expressing parallelism and tool flows for parallelization, mapping/scheduling, and code generation for heterogeneous MPSoCs. Two case studies of the use of MAPS in commercial environments are presented. This chapter closes by discussing early experiences of transferring the MAPS technology into Silexica GmbH, a start-up company that provides multi-core programming tools.},
}
Downloads
No Downloads available for this publication
- Lars Schütze, Jeronimo Castrillon, "Analyzing State-of-the-Art Role-based Programming Languages", Proceedings of the First International Conference on the Art, Science and Engineering of Programming (Programming'17), ACM, pp. 9:1–9:6, New York, NY, USA, Apr 2017.
Bibtex
@InProceedings{schuetze_lassy17,
author = {Lars Sch{\"u}tze and Jeronimo Castrillon},
title = {Analyzing State-of-the-Art Role-based Programming Languages},
booktitle = {Proceedings of the First International Conference on the Art, Science and Engineering of Programming (Programming'17)},
series = {Programming '17},
year = {2017},
month = apr,
isbn = {978-1-4503-4836-2},
location = {Brussels, Belgium},
pages = {9:1--9:6},
articleno = {9},
numpages = {6},
url = {http://doi.acm.org/10.1145/3079368.3079386},
doi = {10.1145/3079368.3079386},
acmid = {3079386},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1704_Schuetze_lassy [PDF]
- Fazal Hameed, Jeronimo Castrillon, "Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization", Proceedings of the 2017 Design, Automation and Test in Europe conference (DATE), EDA Consortium, pp. 362–367, Mar 2017.
Abstract
State-of-the-art DRAM cache employs a small Tag-Cache and its performance is dependent upon two important parameters namely bank-level-parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level-parallelism than the traditional large row buffer organization along with energy benefits. However, small row buffers do not fully exploit the temporal locality of tag accesses, leading to reduced Tag-Cache hit rates. As a result, the DRAM cache needs to be re-designed for small row buffer organization to achieve additional performance benefits. In this paper, we propose a novel tag-store mechanism that improves the Tag-Cache hit rate by 70% compared to existing DRAM tag-store mechanisms employing small row buffer organization. In addition, we enhance the DRAM cache controller with novel policies that take into account the locality characteristics of cache accesses. We evaluate our novel tag-store mechanism and controller policies in an 8-core system running the SPEC2006 benchmark and compare their performance and energy consumption against recent proposals. Our architecture improves the average performance by 21.2% and 11.4% respectively compared to large and small row buffer organizations via simultaneously improving both parameters. Compared to DRAM cache with large row buffer organization, we report an energy improvement of 62%.
Bibtex
@InProceedings{hameed_date17,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization},
booktitle = {Proceedings of the 2017 Design, Automation and Test in Europe conference (DATE)},
year = {2017},
series = {DATE '17},
pages = {362--367},
month = mar,
publisher = {EDA Consortium},
abstract = {State-of-the-art DRAM cache employs a small Tag-Cache and its performance is dependent upon two important parameters namely bank-level-parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level-parallelism than the traditional large row buffer organization along with energy benefits. However, small row buffers do not fully exploit the temporal locality of tag accesses, leading to reduced Tag-Cache hit rates. As a result, the DRAM cache needs to be re-designed for small row buffer organization to achieve additional performance benefits. In this paper, we propose a novel tag-store mechanism that improves the Tag-Cache hit rate by 70\% compared to existing DRAM tag-store mechanisms employing small row buffer organization. In addition, we enhance the DRAM cache controller with novel policies that take into account the locality characteristics of cache accesses. We evaluate our novel tag-store mechanism and controller policies in an 8-core system running the SPEC2006 benchmark and compare their performance and energy consumption against recent proposals. Our architecture improves the average performance by 21.2\% and 11.4\% respectively compared to large and small row buffer organizations via simultaneously improving both parameters. Compared to DRAM cache with large row buffer organization, we report an energy improvement of 62\%.},
isbn = {978-3-9815370-8-6},
doi={10.23919/DATE.2017.7927017},
url = {http://ieeexplore.ieee.org/document/7927017/},
location = {Lausanne, Switzerland}
}
Downloads
1703_Hameed_DATE [PDF]
- Norman A. Rink, Jeronimo Castrillon, "flexMEDiC: flexible Memory Error Detection by Combined data encoding and duplication", Proceedings of the 2nd International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with DATE 2017, pp. 15–22, Mar 2017.
Abstract
Errors in memory are known to be a major cause of system failures. Moreover, it has recently been found that single-error correcting, double-error detecting (SECDED) codes, which are widely used in ECC memory modules, are incapable of handling large fractions of errors that occur in practice. This calls for more powerful error detection measures. However, the higher the number of bit flips that can still be detected as an error, the larger the memory overhead. Cost considerations and the varying needs for reliability of different applications may not always warrant laying down extra hardware to accommodate overheads. Software-implemented error detection offers a flexible alternative. In this work we propose the software-implemented flexMEDiC scheme for detecting errors in the memory system, including main memory, on-chip caches, and load-store queues. It is shown that single and double bit flips are detected by flexMEDiC, and evidence is given that suggests that up to five bit flips within a single data word can still be detected as errors. The average runtime overhead incurred by flexMEDiC is 1.55x.
Bibtex
@InProceedings{rees:2017,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {{flexMEDiC}: flexible {M}emory {E}rror {D}etection by Combined data encoding and duplication},
booktitle = {Proceedings of the 2nd International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with DATE 2017},
year = {2017},
month = mar,
pages = {15--22},
abstract = {Errors in memory are known to be a major cause of system failures. Moreover, it has recently been found that single-error correcting, double-error detecting (SECDED) codes, which are widely used in ECC memory modules, are incapable of handling large fractions of errors that occur in practice. This calls for more powerful error detection measures. However, the higher the number of bit flips that can still be detected as an error, the larger the memory overhead. Cost considerations and the varying needs for reliability of different applications may not always warrant laying down extra hardware to accommodate overheads. Software-implemented error detection offers a flexible alternative. In this work we propose the software-implemented flexMEDiC scheme for detecting errors in the memory system, including main memory, on-chip caches, and load-store queues. It is shown that single and double bit flips are detected by flexMEDiC, and evidence is given that suggests that up to five bit flips within a single data word can still be detected as errors. The average runtime overhead incurred by flexMEDiC is 1.55x.},
}
Downloads
1703_Rink_REES [PDF]
- Jeronimo Castrillon, "Programming for adaptive and energy-efficient computing", In International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) (keynote), Mar 2017.
Bibtex
@Misc{castrillon2017hp3c,
author = {Castrillon, Jeronimo},
title = {Programming for adaptive and energy-efficient computing},
howpublished = {International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) (keynote)},
month = mar,
year = {2017},
location = {Kuala Lumpur, Malaysia}
}
Downloads
170323_castrill_hp3c [PDF]
- Andrés Goens, Jeronimo Castrillon, "Optimizing for Data-Parallelism in Kahn Process Networks", In ACM SRC at International Symposium on Code Generation and Optimization (CGO), Feb 2017.
Bibtex
@inproceedings{goens17cgo,
author = {Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Optimizing for Data-Parallelism in Kahn Process Networks},
year = {2017},
month = feb,
booktitle = {ACM SRC at International Symposium on Code Generation and Optimization (CGO)},
location = {Austin, TX, USA},
}
Downloads
1701_Goens_SRCCGO [PDF]
- Jeronimo Castrillon, "On Mapping to Multi/Manycores", In 10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017.
Bibtex
@Misc{castrillon2017multiprog,
author = {Castrillon, Jeronimo},
title = {On Mapping to Multi/Manycores},
howpublished = {10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk)},
month = jan,
year = {2017},
location = {Stockholm, Sweden}
}
Downloads
No Downloads available for this publication
- Jeronimo Castrillon, "Flexible and Scalable Dataflow Programming for Manycores", In Tutorial for heterogeneous multicore design automation: current and future, held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017.
Bibtex
@Misc{castrillon2017hipeactut,
author = {Castrillon, Jeronimo},
title = {Flexible and Scalable Dataflow Programming for Manycores},
howpublished = {Tutorial for heterogeneous multicore design automation: current and future, held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk)},
month = jan,
year = {2017},
location = {Stockholm, Sweden}
}
Downloads
No Downloads available for this publication
2016
- Norman A. Rink, Jeronimo Castrillon, "Comprehensive Backend Support for Local Memory Fault Tolerance", Technical report, Technische Universität Dresden, pp. 11, Dec 2016.
Bibtex
@TechReport{rink_techrep16,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Comprehensive Backend Support for Local Memory Fault Tolerance},
institution = {Technische Universit{\"a}t Dresden},
year = {2016},
month = dec,
issn = {1430-211X},
pages = {11},
url = {https://cfaed.tu-dresden.de/files/user/nrink/tech-report-ro.pdf}
}
Downloads
tech-report-ro [PDF]
- Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andres Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, "The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware", Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, USA, Nov 2016.
Abstract
Future systems based on post-CMOS technologies will be wildly heterogeneous, with properties largely unknown today. This paper presents our design of a new hardware/software stack to address the challenge of preparing software development for such systems. It combines well-understood technologies from different areas, e.g., network-on-chips, capability operating systems, flexible programming models and model checking. We describe our approach and provide details on key technologies.
Bibtex
@InProceedings{voelp16_pmes,
author = {Marcus V{\"o}lp and Sascha Kl{\"u}ppelholz and Jeronimo Castrillon and Hermann H{\"a}rtig and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andres Goens and Sebastian Haas and Dirk Habich and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Wolfgang Lehner and Linda Leuschner and Matthias Lieber and Siqi Ling and Steffen M{\"a}rcker and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt},
title = {The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware},
booktitle = {Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)},
year = {2016},
address = {Salt Lake City, USA},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Voelp_PMES.pdf},
abstract = {Future systems based on post-CMOS technologies
will be wildly heterogeneous, with properties largely unknown today.
This paper presents our design of a new hardware/software stack to address the
challenge of preparing software development for such systems.
It combines well-understood technologies from different areas, e.g., network-on-chips,
capability operating systems, flexible programming models and model checking.
We describe our approach and provide details on key technologies.},
}
Downloads
1611_Voelp_PMES [PDF]
- Christian Menard, Andrés Goens, Jeronimo Castrillon, "High-Level NoC Model for MPSoC Compilers", Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16), pp. 1–6, Copenhagen, Denmark, Nov 2016. [doi]
Bibtex
@InProceedings{menard_norcas16,
author = {Christian Menard and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {High-Level NoC Model for MPSoC Compilers},
booktitle = {Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16)},
year = {2016},
pages={1--6},
doi = {10.1109/NORCHIP.2016.7792876},
series = {NORCAS},
address = {Copenhagen, Denmark},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Menard_NORCAS.pdf}
}
Downloads
1611_Menard_NORCAS [PDF]
- Andres Goens, Robert Khasanov, Jeronimo Castrillon, Simon Polstra, Andy Pimentel, "Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study", Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 281–288, Ecole Centrale de Lyon, Lyon, France, Sep 2016. [doi]
Abstract
Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level is an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.
Bibtex
@InProceedings{goen_mcsoc16,
author= {Andres Goens and Robert Khasanov and Jeronimo Castrillon and Simon Polstra and Andy Pimentel},
title= {Why Comparing System-level {MPSoC} Mapping Approaches is Difficult: a Case Study},
booktitle= {Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16)},
year= {2016},
address= {Ecole Centrale de Lyon, Lyon, France},
month= sep,
pages = {281--288},
doi = {10.1109/MCSoC.2016.48},
abstract = {Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level is an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.},
days= {21},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1609_Goens_MCSoC.pdf}
}
Downloads
1609_Goens_MCSoC [PDF]
- Benjamin Schiller, Clemens Deusser, Jeronimo Castrillon, Thorsten Strufe, "Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis", In Journal of Applied Network Science, vol. 1, no. 9, pp. 1–22, Sep 2016. [doi]
Bibtex
@Article{schiller16_jans,
author = {Benjamin Schiller and Clemens Deusser and Jeronimo Castrillon and Thorsten Strufe},
title = {Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis},
journal = {Journal of Applied Network Science},
year = {2016},
volume = {1},
number = {9},
pages = {1--22},
month = sep,
doi = {10.1007/s41109-016-0011-2},
url= {http://dynamic-networks.org/publications/papers/papers/gds-dynamic.pdf}
}
Downloads
1607_Schiller_JANS [PDF]
Related Paths
HAEC, Orchestration Path, Resilience Path
- Jeronimo Castrillon, "Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems", In IEEE 5G Dresden Summit (invited talk), Sep 2016.
Bibtex
@Misc{castrillon20165gsummit,
author = {Castrillon, Jeronimo},
title = {Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems},
howpublished = {IEEE 5G Dresden Summit (invited talk)},
month = sep,
year = {2016},
location = {Dresden, Germany},
url= {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/160929_castrillon_5G-summit.pdf}
}
Downloads
160929_castrillon_5G-summit [PDF]
- Andrés Goens, Jeronimo Castrillon, Maximilian Odendahl, Rainer Leupers, "An Optimal Allocation of Memory Buffers for Complex Multicore Platforms", In Journal of Systems Architecture, Elsevier, vol. 66-67, pp. 69–83, May 2016. [doi]
Abstract
In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints.
In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear programming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.
Bibtex
@Article{goens_jsa16,
Title={An Optimal Allocation of Memory Buffers for Complex Multicore Platforms},
Author={Goens, Andr\'{e}s and Castrillon, Jeronimo and Odendahl, Maximilian and Leupers, Rainer},
Journal={Journal of Systems Architecture},
volume={66-67},
pages={69--83},
doi={10.1016/j.sysarc.2016.05.002},
publisher={Elsevier},
Year={2016},
month=may,
abstract={In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints.
In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear programming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.}
}
Downloads
No Downloads available for this publication
- Jeronimo Castrillon, "Programming Heterogeneous Embedded Systems for IoT", In Workshop get-togethers toward a sustainable collaboration in IoT (invited talk), Apr 2016. ([link])
Bibtex
@Misc{castrillon2016tunis,
author={Castrillon, Jeronimo},
title={Programming Heterogeneous Embedded Systems for IoT},
howpublished={Workshop get-togethers toward a sustainable collaboration in IoT (invited talk)},
month=apr,
year={2016},
location={Tunis, Tunisia},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/160418_castrillon_dataflow4IoT.pdf}
}
Downloads
160418_castrillon_dataflow4IoT [PDF]
- Sven Karol, Norman A. Rink, Bálint Gyapjas, Jeronimo Castrillon, "Fault Tolerance with Aspects: a Feasibility Study", Proceedings of the 15th International Conference on Modularity, ACM, pp. 66–69, New York, NY, USA, Mar 2016. [doi]
Bibtex
@inproceedings{karol2016faulttolerance,
author={Karol, Sven and Rink, Norman A. and Gyapjas, B\'{a}lint and Castrillon, Jeronimo},
title={Fault Tolerance with Aspects: a Feasibility Study},
booktitle={Proceedings of the 15th International Conference on Modularity},
series={MODULARITY 2016},
year={2016},
pages={66--69},
address={New York, NY, USA},
month=mar,
publisher={ACM},
doi={10.1145/2889443.2889453},
isbn={978-1-4503-3995-7},
location={M{\'a}laga, Spain},
}
Downloads
1603_Karol_Modularity_preprint [PDF]
2015
- Andrés Goens, Jeronimo Castrillon, "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs", In Proceedings: System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523 (Götz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aurélio and Al Faruque, Mohammad Abdullah and Rettberg, Achim), Springer International Publishing, pp. 116–127, Foz do Iguaçu, Brazil, Nov 2015. [doi]
Abstract
Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure helps us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.
Bibtex
@InProceedings{goens_iess15,
author = {Goens, Andr\'{e}s and Castrillon, Jeronimo},
title = {Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs},
booktitle = {System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523},
year = {2015},
editor = {G{\"o}tz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aur{\'e}lio and Al Faruque, Mohammad Abdullah and Rettberg, Achim},
pages = {116--127},
address = {Foz do Igua{\c{c}}u, Brazil},
month = nov,
publisher = {Springer International Publishing},
doi = {10.1007/978-3-319-90023-0_10},
url = {https://link.springer.com/chapter/10.1007%2F978-3-319-90023-0_10},
isbn={978-3-319-90023-0},
abstract = {Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure helps us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.},
}
Downloads
1511_Goens_IESS [PDF]
- Benjamin Schiller, Jeronimo Castrillon, Thorsten Strufe, "Efficient data structures for dynamic graph analysis", Proceedings of the 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (Lisa O'Conner), IEEE Computer Society, pp. 497–504, Bangkok, Thailand, Nov 2015. [doi]
Bibtex
@InProceedings{schiller_sitis15,
Title={Efficient data structures for dynamic graph analysis},
Author={Schiller, Benjamin and Castrillon, Jeronimo and Strufe, Thorsten},
Booktitle={Proceedings of the 11th International Conference on Signal-Image Technology \& Internet-Based Systems (SITIS)},
Year={2015},
Address={Bangkok, Thailand},
Editor={Lisa O'Conner},
Month=nov,
Publisher={IEEE Computer Society},
Series={SITIS 2015},
pages={497--504},
doi={10.1109/SITIS.2015.94}
}
Downloads
1511_Schiller_SITIS [PDF]
Related Paths
Orchestration Path, Resilience Path, HAEC
- Norman A. Rink, Jeronimo Castrillon, "Improving Code Generation for Software-based Error Detection", Proceedings of the 1st International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with ESWEEK 2015, pp. 16–30, Amsterdam, The Netherlands, Oct 2015. ([link])
Bibtex
@InProceedings{rink_ress15,
Title={Improving Code Generation for Software-based Error Detection},
Author={Rink, Norman A. and Castrillon, Jeronimo},
Booktitle={Proceedings of the 1st International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with ESWEEK 2015},
Year={2015},
Series={REES 2015},
Address={Amsterdam, The Netherlands},
Month=oct,
Pages={16--30},
}
Downloads
1510_Rink_REES [PDF]
Related Paths
Orchestration Path, Resilience Path
- Jeronimo Castrillon, "Analysis and software synthesis of KPN applications", In Design of Robotics and Embedded systems, Analysis, and Modeling Seminar (DREAMS) (invited talk), Oct 2015. ([link])
Abstract
Programming models based on dataflow or process networks are a good match for streaming applications, common in the signal processing, multimedia and automotive domains. In such models, parallelism is expressed explicitly, which makes them well-suited for programming parallel machines. Since today's applications are no longer static, expressive programming models are needed, such as those based on Kahn Process Networks (KPNs). In these models, tasks cannot be handled as black boxes, but have to be analyzed, profiled and traced to characterize their behavior. This is especially important in the case of heterogeneous platforms with many processors of multiple different types. This presentation describes a tool flow to handle KPN applications and gives insights into mapping algorithms for heterogeneous platforms.
Bibtex
@Misc{castrillon15_dreams,
Title={Analysis and software synthesis of KPN applications},
Author={Jeronimo Castrillon},
HowPublished={Design of Robotics and Embedded systems, Analysis, and Modeling Seminar (DREAMS) (invited talk)},
Month=oct,
Year={2015},
Day={22},
Location={Berkeley, CA, USA},
Abstract={Programming models based on dataflow or process
networks are a good match for streaming
applications, common in the signal processing,
multimedia and automotive domains. In such models,
parallelism is expressed explicitly which makes
them well-suited for programming parallel
machines. Since today's applications are no
longer static, expressive programming models are
needed, such as those based on Kahn Process
Networks (KPNs). In these models, tasks cannot be
handled as black boxes, but have to be analyzed,
profiled and traced to characterize their
behavior. This is especially important in the case
of heterogeneous platforms with many processors of
multiple different types. This presentation
describes a tool flow to handle KPN applications
and gives insights into mapping algorithms for
heterogeneous platforms.},
url={https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/151022_castrillon_dreams.pdf},
}
Downloads
151022_castrillon_dreams [PDF]
Related Paths
Orchestration Path, Resilience Path
- Jeronimo Castrillon, "Dataflow programming for heterogeneous computing systems", In Tutorial Algorithmic Specification, Tools and Algorithms for Programming Heterogeneous Platforms. Co-located with the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT'15), Oct 2015. ([link])
Abstract
This tutorial talk starts by introducing new types of heterogeneous systems and their challenges for hardware/software programming stacks. These systems are currently being investigated in the context of the German cluster of excellence cfaed – “Center for Advancing Electronics Dresden”. We will then look at dataflow modeling concepts, with emphasis on the dynamic models that are needed to express today's changing workloads. Finally, the talk will introduce methods and algorithms for mapping sets of applications modeled in this way to heterogeneous systems.
Bibtex
@Misc{castrillon15_pacttut,
Title={Dataflow programming for heterogeneous computing systems},
Author={Jeronimo Castrillon},
HowPublished={Tutorial Algorithmic Specification, Tools and Algorithms for Programming Heterogeneous Platforms. Co-located with the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT'15)},
Month=oct,
Year={2015},
Abstract={This tutorial talk starts by introducing new types of heterogeneous systems and their challenges for hardware/software programming stacks. These systems are currently being investigated in the context of the German cluster of excellence cfaed – ``Center for Advancing Electronics Dresden''. We will then look at dataflow modeling concepts, with emphasis on the dynamic models that are needed to express today's changing workloads. Finally, the talk will introduce methods and algorithms for mapping sets of applications modeled in this way to heterogeneous systems.},
Day={18},
}
Downloads
151018_castrillon_dataflow_pacttut [PDF]
- Markus Vogt, Gerald Hempel, Jeronimo Castrillon, Christian Hochberger, "GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs", Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP), Sep 2015. ([link])
Abstract
In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.
Bibtex
@InProceedings{vogt15,
Title={GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs},
Author={Vogt, Markus and Hempel, Gerald and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP)},
Year={2015},
Month=sep,
Series={FSP 2015},
archivePrefix={arXiv},
arxivId={1509.00025},
eprint={1509.00025},
abstract={In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.},
}
Downloads
1509_Vogt_FSP [PDF]
- Jeronimo Castrillon, "Orchestration: Turning material breakthroughs into application performance" , In Dresden Microelectronics Academy, (invited talk), Sep 2015. [Bibtex & Downloads]
Orchestration: Turning material breakthroughs into application performance
Reference
Jeronimo Castrillon, "Orchestration: Turning material breakthroughs into application performance" , In Dresden Microelectronics Academy, (invited talk), Sep 2015.
Bibtex
@Misc{castrillon2015dma,
Title={Orchestration: Turning material breakthroughs into application performance},
Author={Castrillon, Jeronimo},
HowPublished={Dresden Microelectronics Academy, (invited talk)},
Month=sep,
Year={2015},
Location={Dresden, Germany},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150918_castrillon_dma.pdf}
}
Downloads
150918_castrillon_dma [PDF]
- Norman A. Rink, Dmitrii Kuvaiskii, Jeronimo Castrillon, Christof Fetzer, "Compiling for Resilience: the Performance Gap" , Chapter in Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015) (Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer) , IOS Press, vol. 27, pp. 721–730, Edinburgh, Scotland, Sep 2015. [doi] [Bibtex & Downloads]
Compiling for Resilience: the Performance Gap
Reference
Norman A. Rink, Dmitrii Kuvaiskii, Jeronimo Castrillon, Christof Fetzer, "Compiling for Resilience: the Performance Gap" , Chapter in Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015) (Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer) , IOS Press, vol. 27, pp. 721–730, Edinburgh, Scotland, Sep 2015. [doi]
Abstract
In order to perform reliable computations on unreliable hardware, software-based protection mechanisms have been proposed. In this paper we present a compiler infrastructure for software-based code hardening based on encoding. We analyze the trade-off between performance and fault coverage. We look at different code generation strategies that improve the performance of hardened programs by up to 2x while incurring little fault coverage degradation.
Bibtex
@InCollection{rink_erpp2015,
author={Rink, Norman A. and Kuvaiskii, Dmitrii and Castrillon, Jeronimo and Fetzer, Christof},
title={Compiling for Resilience: the Performance Gap},
booktitle={Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015)},
publisher={IOS Press},
year={2015},
editor={Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer},
volume={27},
series={ParCo 2015},
pages={721--730},
address={Edinburgh, Scotland},
month=sep,
abstract={In order to perform reliable computations on unreliable hardware, software-based protection mechanisms have been proposed. In this paper we present a compiler infrastructure for software-based code hardening based on encoding. We analyze the trade-off between performance and fault coverage. We look at different code generation strategies that improve the performance of hardened programs by up to 2x while incurring little fault coverage degradation.},
doi={10.3233/978-1-61499-621-7-721},
}
Downloads
No Downloads available for this publication
Related Paths
Orchestration Path, Resilience Path
- Gerald Hempel, Markus Vogt, Jeronimo Castrillon, Christian Hochberger, "Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs" , Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session (Grosspietsch, Erwin and Klöckner, Konrad) , SEA-Publications: SEA-SR-44, Funchal, Madeira (Portugal), August 2015. [Bibtex & Downloads]
Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs
Reference
Gerald Hempel, Markus Vogt, Jeronimo Castrillon, Christian Hochberger, "Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs" , Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session (Grosspietsch, Erwin and Klöckner, Konrad) , SEA-Publications: SEA-SR-44, Funchal, Madeira (Portugal), August 2015.
Bibtex
@InProceedings{hempeldsd15,
Title={Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs},
Author={Hempel, Gerald and Vogt, Markus and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session},
Year={2015},
Address={Funchal, Madeira (Portugal)},
Editor={Grosspietsch, Erwin and Kl{\"o}ckner, Konrad},
Month=aug,
Publisher={SEA-Publications: SEA-SR-44},
Series={DSD/SEAA 2015},
ISBN={978-3-902457-44-8}
}
Downloads
1508_Hempel_DSD [PDF]
- Sven Karol, Pietro Incardona, Yaser Afshar, Ivo Sbalzarini, Jeronimo Castrillon, "Towards a Next-Generation Parallel Particle-Mesh Language" , Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI), pp. 15–18, Jul 2015. ([link]) [Bibtex & Downloads]
Towards a Next-Generation Parallel Particle-Mesh Language
Reference
Sven Karol, Pietro Incardona, Yaser Afshar, Ivo Sbalzarini, Jeronimo Castrillon, "Towards a Next-Generation Parallel Particle-Mesh Language" , Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI), pp. 15–18, Jul 2015. ([link])
Bibtex
@InProceedings{karol15,
Title={Towards a Next-Generation Parallel Particle-Mesh Language},
Author={Karol, Sven and Incardona, Pietro and Afshar, Yaser and Sbalzarini, Ivo and Castrillon, Jeronimo},
Booktitle={Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI)},
series={DSLDI'15},
Year={2015},
Month=jul,
pages={15--18},
}
Downloads
1507_Karol_PPML [PDF]
- Diana Göhringer, Michael Hübner, Jeronimo Castrillon, Cristina Silvano, "ViPES 2015-Preface" , Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 347–347, Jul 2015. [doi] [Bibtex & Downloads]
ViPES 2015-Preface
Reference
Diana Göhringer, Michael Hübner, Jeronimo Castrillon, Cristina Silvano, "ViPES 2015-Preface" , Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 347–347, Jul 2015. [doi]
Bibtex
@InProceedings{gohringer2015vipes,
author={G{\"o}hringer, Diana and H{\"u}bner, Michael and Castrillon, Jeronimo and Silvano, Cristina},
title={ViPES 2015-Preface},
booktitle={Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)},
year={2015},
pages={347--347},
organization={IEEE},
month=jul,
doi={10.1109/SAMOS.2015.7363696},
}
Downloads
No Downloads available for this publication
- Jeronimo Castrillon, "Portable Libraries and Programming Environments" , In HiPEAC Computing Systems Week, (invited talk), May 2015. [Bibtex & Downloads]
Portable Libraries and Programming Environments
Reference
Jeronimo Castrillon, "Portable Libraries and Programming Environments" , In HiPEAC Computing Systems Week, (invited talk), May 2015.
Bibtex
@Misc{castrillon2015csw,
Title={Portable Libraries and Programming Environments},
Author={Castrillon, Jeronimo},
HowPublished={HiPEAC Computing Systems Week, (invited talk)},
Year={2015},
Month=may,
Location={Oslo, Norway},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150505_castrillon_csw.pdf}
}
Downloads
150505_castrillon_csw [PDF]
- Jeronimo Castrillon, Lothar Thiele, Lars Schorr, Weihua Sheng, Ben Juurlink, Mauricio Alvarez-Mesa, Angela Pohl, Ralph Jessenberger, Victor Reyes, Rainer Leupers, "Multi/Many-core Programming: Where Are We Standing?" , Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), EDA Consortium, pp. 1708–1717, San Jose, CA, USA, Mar 2015. ([link]) [Bibtex & Downloads]
Multi/Many-core Programming: Where Are We Standing?
Reference
Jeronimo Castrillon, Lothar Thiele, Lars Schorr, Weihua Sheng, Ben Juurlink, Mauricio Alvarez-Mesa, Angela Pohl, Ralph Jessenberger, Victor Reyes, Rainer Leupers, "Multi/Many-core Programming: Where Are We Standing?" , Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), EDA Consortium, pp. 1708–1717, San Jose, CA, USA, Mar 2015. ([link])
Bibtex
@inproceedings{Castrillon:2015,
author={Castrillon, Jeronimo and Thiele, Lothar and Schorr, Lars and Sheng, Weihua and Juurlink, Ben and Alvarez-Mesa, Mauricio and Pohl, Angela and Jessenberger, Ralph and Reyes, Victor and Leupers, Rainer},
title={Multi/Many-core Programming: Where Are We Standing?},
booktitle={Proceedings of the 2015 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
series={DATE '15},
year={2015},
Month=mar,
location={Grenoble, France},
pages={1708--1717},
numpages={10},
acmid={2757208},
publisher={EDA Consortium},
address={San Jose, CA, USA},
Bdsk-url-1={http://dl.acm.org/citation.cfm?id=2757012.2757208},
}
Downloads
No Downloads available for this publication
- Jeronimo Castrillon, "Tools and dataflow-based programming models for heterogeneous MPSoCs" , In Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM'15) in conjunction with the HiPEAC Conference (invited talk), Jan 2015. [Bibtex & Downloads]
Tools and dataflow-based programming models for heterogeneous MPSoCs
Reference
Jeronimo Castrillon, "Tools and dataflow-based programming models for heterogeneous MPSoCs" , In Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM'15) in conjunction with the HiPEAC Conference (invited talk), Jan 2015.
Bibtex
@Misc{castrillon2015pegpum,
Title={Tools and dataflow-based programming models for heterogeneous MPSoCs},
Author={Castrillon, Jeronimo},
HowPublished={Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM'15) in conjunction with the HiPEAC Conference (invited talk)},
Year={2015},
Month=jan,
Location={Amsterdam, The Netherlands},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150121_castrillon_pegpum.pdf}
}
Downloads
150121_castrillon_pegpum [PDF]
- Jeronimo Castrillon, "Simulation and Estimation for MPSoC Programming Tools" , In Proceeding: Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'15), in conjunction with the HiPEAC Conference (keynote), Jan 2015. [Bibtex & Downloads]
Simulation and Estimation for MPSoC Programming Tools
Reference
Jeronimo Castrillon, "Simulation and Estimation for MPSoC Programming Tools" , In Proceeding: Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'15), in conjunction with the HiPEAC Conference (keynote), Jan 2015.
Bibtex
@InProceedings{castrillon2015rapido,
Title={Simulation and Estimation for MPSoC Programming Tools},
Author={Castrillon, Jeronimo},
Booktitle={Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'15), in conjunction with the HiPEAC Conference (keynote)},
Year={2015},
Month=jan,
Location={Amsterdam, The Netherlands},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150121_castrillon_rapido.pdf}
}
Downloads
150121_castrillon_rapido [PDF]
2014
- Jeronimo Castrillon, "Compiler Flow for Processors and Systems" , In Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk), Nov 2014. [Bibtex & Downloads]
Compiler Flow for Processors and Systems
Reference
Jeronimo Castrillon, "Compiler Flow for Processors and Systems" , In Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk), Nov 2014.
Bibtex
@Misc{castrillon2015tunis,
Title={Compiler Flow for Processors and Systems},
Author={Castrillon, Jeronimo},
HowPublished={Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk)},
Year={2014},
Month=nov,
Location={Tunis, Tunisia},
url={https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/141127_castrillon_compilers.pdf}
}
Downloads
141127_castrillon_compilers [PDF]
- Jeronimo Castrillon, Rainer Leupers, "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap" , Springer, pp. 258, 2014. ([link]) [Bibtex & Downloads]
Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap
Reference
Jeronimo Castrillon, Rainer Leupers, "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap" , Springer, pp. 258, 2014. ([link])
Bibtex
@Book{castrillon14_springer,
Title={Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap},
Author={Castrillon, Jeronimo and Leupers, Rainer},
Publisher={Springer},
Year={2014},
ISBN={978-3-319-00675-8},
Pages={258}
}
Downloads
No Downloads available for this publication
- Diandian Zhang, Jeronimo Castrillon, Stefan Schürmans, Gerd Ascheid, Rainer Leupers, Bart Vanthournout, "System-Level Analysis of MPSoCs with a Hardware Scheduler" , Hershey: IGI Global, pp. 335–367, 2014. [doi] [Bibtex & Downloads]
System-Level Analysis of MPSoCs with a Hardware Scheduler
Reference
Diandian Zhang, Jeronimo Castrillon, Stefan Schürmans, Gerd Ascheid, Rainer Leupers, Bart Vanthournout, "System-Level Analysis of MPSoCs with a Hardware Scheduler" , Hershey: IGI Global, pp. 335–367, 2014. [doi]
Bibtex
@InBook{zhang2014_inbook,
Title={System-Level Analysis of MPSoCs with a Hardware Scheduler},
Author={Diandian Zhang and Jeronimo Castrillon and Stefan Schürmans and Gerd Ascheid and Rainer Leupers and Bart Vanthournout},
Chapter={Advancing Embedded Systems and Real-Time Communications with Emerging Technologies},
Editor={Seppo Virtanen},
Pages={335--367},
Publisher={Hershey: IGI Global},
doi={10.4018/978-1-4666-6034-2.ch014},
Year={2014},
}
Downloads
No Downloads available for this publication