Prof. Dr.-Ing. Jeronimo Castrillon |
||
![]() |
Phone Fax Visitor's Address |
jeronimo.castrillon@tu-dresden.de +49 (0)351 463 42716 +49 (0)351 463 39995 Chair for Compiler Construction |
Jerónimo Castrillón received the Electronics Engineering degree with honors from the Pontificia Bolivariana University in Colombia in 2004, the master degree from the ALaRI Institute in Switzerland in 2006 and the Ph.D. degree (Dr.-Ing.) on Electric Engineering and Information Technology with honors from the RWTH Aachen University in Germany in 2013. From early 2009 to April 2013 Dr. Castrillón was the chief engineer of the chair for Software for Systems on Silicon at the RWTH Aachen University, where he was enrolled as research staff since late 2006. From April 2013 to April 2014 Dr. Castrillón was senior scientific staff in the same institution.
In June 2014, Dr. Castrillón joined the department of computer science of the TU Dresden as professor for compiler construction in the context of the German excellence cluster “Center for Advancing Electronics Dresden” (cfaed). His research interests lie on methodologies, languages, tools and algorithms for programming complex computing systems.
Prof. Castrillón has several international publications and has served as program chair (e.g., Vipes’15) and technical program committee in international conferences and workshops (e.g., LCTES, CASES, DAC, DATE, CODES-ISSS, Computing Frontiers, FPL, ICCS and MCSoC) as well as a reviewer for ACM and IEEE journals among others. Prof. Castrillón is the recipient of numerous awards, including the Swiss Excellence Government Scholarship in 2005, the Intel Doctoral Award in 2012. In 2014 he co-founded Silexica GmbH, a company that provides programming tools for embedded multicore architectures. Since 2017, Prof. Castrillon is a member of the executive committee of the ACM “Future of Computing Academy”.
2020
- Christian Menard, Andrés Goens, Marten Lohstroh, Jeronimo Castrillon, "Achieving Determinism in Adaptive AUTOSAR" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020. [Bibtex & Downloads]
Achieving Determinism in Adaptive AUTOSAR
Reference
Christian Menard, Andrés Goens, Marten Lohstroh, Jeronimo Castrillon, "Achieving Determinism in Adaptive AUTOSAR" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020.
Bibtex
@InProceedings{menard_date20,
author = {Christian Menard and Andr{\'e}s Goens and Marten Lohstroh and Jeronimo Castrillon},
title = {Achieving Determinism in Adaptive AUTOSAR},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
month = mar,
publisher = {EDA Consortium},
location = {Grenoble, France},
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Robert Khasanov, Jeronimo Castrillon, "Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020. [Bibtex & Downloads]
Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping
Reference
Robert Khasanov, Jeronimo Castrillon, "Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020.
Bibtex
@InProceedings{khasanov_date20,
author = {Robert Khasanov and Jeronimo Castrillon},
title = {Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
month = mar,
publisher = {EDA Consortium},
location = {Grenoble, France},
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Asif Ali Khan, Andrés Goens, Fazal Hameed, Jeronimo Castrillon, "Generalized Data Placement Strategies for Racetrack Memories" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020. [Bibtex & Downloads]
Generalized Data Placement Strategies for Racetrack Memories
Reference
Asif Ali Khan, Andrés Goens, Fazal Hameed, Jeronimo Castrillon, "Generalized Data Placement Strategies for Racetrack Memories" (to appear), Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE), EDA Consortium, Mar 2020.
Bibtex
@InProceedings{khan_date20,
author = {Asif Ali Khan and Andr{\'e}s Goens and Fazal Hameed and Jeronimo Castrillon},
title = {Generalized Data Placement Strategies for Racetrack Memories},
booktitle = {Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)},
year = {2020},
series = {DATE '20},
publisher = {EDA Consortium},
location = {Grenoble, France},
month = mar,
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
2019
- Fazal Hameed, Jeronimo Castrillon, "A Novel Hybrid DRAM/STT-RAM Last-Level-Cache Architecture for Performance, Energy and Endurance Enhancement" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 27, no. 10, pp. 2375-2386, Oct 2019. [doi] [Bibtex & Downloads]
A Novel Hybrid DRAM/STT-RAM Last-Level-Cache Architecture for Performance, Energy and Endurance Enhancement
Reference
Fazal Hameed, Jeronimo Castrillon, "A Novel Hybrid DRAM/STT-RAM Last-Level-Cache Architecture for Performance, Energy and Endurance Enhancement" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 27, no. 10, pp. 2375-2386, Oct 2019. [doi]
Abstract
High capacity L4 architectures as Last-Level-Cache (LLC) have been recently introduced between L3-SRAM and off-chip memory. These LLC architectures have either employed DRAM or Spin-Transfer-Torque (STT-RAM) memory technologies. It is a known fact that DRAM LLCs feature a higher energy consumption while STT-RAM LLCs feature a lower write endurance compared to their counterparts. This paper proposes an efficient hybrid DRAM/STT-RAM LLC architecture that exploits the best characteristics offered by the individual memory technologies while mitigating their drawbacks. More precisely, we introduce a novel mechanism for the storage and management of the hybrid LLC tags, and a proactive L3-SRAM writeback policy that combines multiple dirty blocks that are mapped to the same LLC row. Our hybrid architecture reduces LLC interference by having less writeback accesses and row fetches. The endurance is improved by reducing the number of STT-RAM block writes. We show that our LLC architecture reduces the total number of STT-RAM block writes by 78% and improves the average performance by 13% compared to a recently proposed STT- RAM LLC. Compared to the state-of-the-art DRAM LLC, we report an average energy and performance improvement of 24% and 17.1% respectively.
Bibtex
@Article{hameed_tvlsi19,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {A Novel Hybrid {DRAM}/{STT-RAM} {L}ast-{L}evel-{C}ache Architecture for Performance, Energy and Endurance Enhancement},
journal = {IEEE Transactions on Very Large Scale Integration Systems (TVLSI)},
year = {2019},
month = oct,
abstract = {High capacity L4 architectures as Last-Level-Cache (LLC) have been recently introduced between L3-SRAM and off-chip memory. These LLC architectures have either employed DRAM or Spin-Transfer-Torque (STT-RAM) memory technologies. It is a known fact that DRAM LLCs feature a higher energy consumption while STT-RAM LLCs feature a lower write endurance compared to their counterparts. This paper proposes an efficient hybrid DRAM/STT-RAM LLC architecture that exploits the best characteristics offered by the individual memory technologies while mitigating their drawbacks. More precisely, we introduce a novel mechanism for the storage and management of the hybrid LLC tags, and a proactive L3-SRAM writeback policy that combines multiple dirty blocks that are mapped to the same LLC row. Our hybrid architecture reduces LLC interference by having less writeback accesses and row fetches. The endurance is improved by reducing the number of STT-RAM block writes. We show that our LLC architecture reduces the total number of STT-RAM block writes by 78\% and improves the average performance by 13\% compared to a recently proposed STT- RAM LLC. Compared to the state-of-the-art DRAM LLC, we report an average energy and performance improvement of 24\% and 17.1\% respectively.},
volume = {27},
number = {10},
pages = {2375-2386},
numpages = {12pp},
doi={10.1109/TVLSI.2019.2918385},
url = {https://ieeexplore.ieee.org/document/8734763},
}
Downloads
1905_Hameed_TVLSI [PDF]
Related Paths
Permalink
- Lars Schütze, Jeronimo Castrillon, "Efficient Late Binding of Dynamic Function Compositions" , Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, ACM, pp. 141–151, New York, NY, USA, Oct 2019. [doi] [Bibtex & Downloads]
Efficient Late Binding of Dynamic Function Compositions
Reference
Lars Schütze, Jeronimo Castrillon, "Efficient Late Binding of Dynamic Function Compositions" , Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, ACM, pp. 141–151, New York, NY, USA, Oct 2019. [doi]
Bibtex
@InProceedings{lars_sle19,
author = {Lars Sch{\"u}tze and Jeronimo Castrillon},
title = {Efficient Late Binding of Dynamic Function Compositions},
booktitle = {Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering},
year = {2019},
series = {SLE 2019},
address = {New York, NY, USA},
month = oct,
publisher = {ACM},
keywords = {conf},
location = {Athens, Greece},
isbn = {978-1-4503-6981-7},
pages = {141--151},
numpages = {11},
url = {http://doi.acm.org/10.1145/3357766.3359543},
doi = {10.1145/3357766.3359543},
acmid = {3359543},
}
Downloads
1910_Schuetze_SLE [PDF]
Permalink
- Tobias Reiher, Alexander Senier, Jeronimo Castrillon, Thorsten Strufe, "RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers" , In Proceeding: International Conference on Formal Aspects of Component Software, pp. 21pp, Oct 2019. [Bibtex & Downloads]
RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers
Reference
Tobias Reiher, Alexander Senier, Jeronimo Castrillon, Thorsten Strufe, "RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers" , In Proceeding: International Conference on Formal Aspects of Component Software, pp. 21pp, Oct 2019.
Bibtex
@InProceedings{reiher_facs19,
author = {Tobias Reiher and Alexander Senier and Jeronimo Castrillon and Thorsten Strufe},
title = {RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers},
booktitle = {International Conference on Formal Aspects of Component Software},
year = {2019},
month = oct,
pages = {21pp},
url = {https://arxiv.org/abs/1910.02146},
organization = {Springer},
location = {Amsterdam, The Netherlands},
}
Downloads
1910_Reiher_FACS [PDF]
Permalink
- Marten Lohstroh, Íñigo Íncer Romero, Andrés Goens, Patricia Derler, Jeronimo Castrillon, Edward A. Lee, Alberto Sangiovanni-Vincentelli, "Reactors: A Deterministic Model for Composable Reactive Systems" (to appear), Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019), pp. 26pp, Oct 2019. [Bibtex & Downloads]
Reactors: A Deterministic Model for Composable Reactive Systems
Reference
Marten Lohstroh, Íñigo Íncer Romero, Andrés Goens, Patricia Derler, Jeronimo Castrillon, Edward A. Lee, Alberto Sangiovanni-Vincentelli, "Reactors: A Deterministic Model for Composable Reactive Systems" (to appear), Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019), pp. 26pp, Oct 2019.
Bibtex
@InProceedings{Lohstroh-cyphy19,
author = {Marten Lohstroh and {\'I}{\~n}igo {\'I}ncer Romero and Andr\'{e}s Goens and Patricia Derler and Jeronimo Castrillon and Edward A. Lee and Alberto Sangiovanni-Vincentelli},
title = {Reactors: A Deterministic Model for Composable Reactive Systems},
booktitle = {Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019)},
year = {2019},
month = oct,
pages = {26pp},
location = {New York City, NY, USA}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Jeronimo Castrillon, "Embedded manycore programming: From auto-parallelization to domain specific languages" , In IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote), Oct 2019. [Bibtex & Downloads]
Embedded manycore programming: From auto-parallelization to domain specific languages
Reference
Jeronimo Castrillon, "Embedded manycore programming: From auto-parallelization to domain specific languages" , In IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote), Oct 2019.
Abstract
Programming manycores remains a daunting task, especially in the presence of the heterogeneity and application constraints typical in the embedded domain. This talk reviews efforts to cope with this complexity from the last 10+ years of research. It starts with the challenges faced by auto-parallelizing compilers, discussing how far they have made it since the start of the multi-core era. The talk also reviews explicit parallel programming and associated programming methodologies, with focus on recent advances that aim at increasing the adaptivity and robustness of dataflow applications. The talk then advocates for even higher-level programming abstractions in the form of domain specific languages, particularly important to deal with the increased complexity brought by emerging computing paradigms.
Bibtex
@Misc{castrillon_mcsoc2019,
author = {Castrillon, Jeronimo},
title = {Embedded manycore programming: From auto-parallelization to domain specific languages},
howpublished = {IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) (keynote)},
month = oct,
year = {2019},
abstract = {Programming manycores remains a daunting task, especially in the presence of the heterogeneity and application constraints typical in the embedded domain. This talk reviews efforts to cope with this complexity from the last 10+ years of research. It starts with the challenges faced by auto-parallelizing compilers, discussing how far they have made it since the start of the multi-core era. The talk also reviews explicit parallel programming and associated programming methodologies, with focus on recent advances that aim at increasing the adaptivity and robustness of dataflow applications. The talk then advocates for even higher-level programming abstractions in the form of domain specific languages, particularly important to deal with the increased complexity brought by emerging computing paradigms.},
url = {http://mcsoc-forum.org/m2019/wp-content/uploads/2019/10/191002_castrillon_mcsoc-opt.pdf},
keywords = {invitedtalk},
location = {Singapore},
}
Downloads
191002_castrillon_mcsoc-opt [PDF]
Permalink
- Jeronimo Castrillon, "Dataflow and higher level abstractions for parallel programming" , In CPS Summer School 2019: Designing Cyber-Physical Systems - From concepts to implementation (keynote), Sep 2019. [Bibtex & Downloads]
Dataflow and higher level abstractions for parallel programming
Reference
Jeronimo Castrillon, "Dataflow and higher level abstractions for parallel programming" , In CPS Summer School 2019: Designing Cyber-Physical Systems - From concepts to implementation (keynote), Sep 2019.
Abstract
Computing systems continue to increase in complexity, today including multiple cores, complex memory hierarchies and domain-specific accelerators, and soon with components built with emerging hardware technologies. This complexity calls for advances in a variety of domains, like programming and modeling languages, models of hardware, system simulators, design exploration methodologies and hardware architectures. From the standpoint of programming languages and compilers, this lecture discusses the challenges in mainstream sequential programming to motivate higher-level abstractions. It then provides an introduction to dataflow programming methodologies as a promising solution for embedded applications. We will review the fundamentals of dataflow models of computation, basic programming methodologies and look at current research to account for the adaptivity that new applications require, especially in the context of cyber physical systems. The lecture closes with an outlook on higher level programming abstractions and challenges posed by emerging computing architectures.
Bibtex
@Misc{castrillon_cpss19,
author = {Castrillon, Jeronimo},
title = {Dataflow and higher level abstractions for parallel programming},
howpublished = CPS Summer School 2019: {Designing Cyber-Physical Systems - From concepts to implementation (keynote)},
month = sep,
year = {2019},
abstract = {Computing systems continue to increase in complexity, today including multiple cores, complex memory hierarchies and domain-specific accelerators, and soon with components built with emerging hardware technologies. This complexity calls for advances in a variety of domains, like programming and modeling languages, models of hardware, system simulators, design exploration methodologies and hardware architectures. From the standpoint of programming languages and compilers, this lecture discusses the challenges in mainstream sequential programming to motivate higher-level abstractions. It then provides an introduction to dataflow programming methodologies as a promising solution for embedded applications. We will review the fundamentals of dataflow models of computation, basic programming methodologies and look at current research to account for the adaptivity that new applications require, especially in the context of cyber physical systems. The lecture closes with an outlook on higher level programming abstractions and challenges posed by emerging computing architectures.},
keywords = {invitedtalk},
location = {Alghero, Sardinia, Italy},
project = {cfaed, haec},
url = {http://www.cpsschool.eu/dataflow-and-higher-level-abstractions-for-parallel-programming/}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi] [Bibtex & Downloads]
STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism
Reference
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell, ACM, pp. 146–161, New York, NY, USA, Aug 2019. [doi]
Abstract
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution. Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations. In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.
Bibtex
@InProceedings{ertel_haskell19,
author = {Ertel, Sebastian and Adam, Justus and Rink, Norman A. and Goens, Andr{\'e}s and Castrillon, Jeronimo},
title = {{STCLang}: State Thread Composition as a Foundation for Monadic Dataflow Parallelism},
booktitle = {Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell},
year = {2019},
series = {Haskell 2019},
pages = {146--161},
address = {New York, NY, USA},
month = aug,
publisher = {ACM},
abstract = {Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition of stateful computations and still exposes opportunities for parallelization. We also introduce smap, a higher-order function that can exploit parallelism in stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic concepts from functional reactive programming can be built on top of our programming model with little effort. We compare these implementations to a state-of-the-art approach using monad-par and LVars to expose parallelism explicitly and reach the same level of performance, showing that our programming model successfully extracts parallelism that is present in an algorithm. Further evaluation shows that smap is expressive enough to implement parallel reductions and our programming model resolves short-comings of the stream-based programming model for current state-of-the-art big data processing systems.},
acmid = {3342600},
doi = {10.1145/3331545.3342600},
isbn = {978-1-4503-6813-1},
keywords = {conf},
location = {Berlin, Germany},
numpages = {16},
url = {http://doi.acm.org/10.1145/3331545.3342600}
}
Downloads
1908_Ertel_Haskell [PDF]
Related Paths
Permalink
- Joonas Multanen, Asif Ali Khan, Pekka Jääskeläinen, Fazal Hameed, Jeronimo Castrillon, "SHRIMP: Efficient Instruction Delivery with Domain Wall Memory" , Proceedings of the International Symposium on Low Power Electronics and Design, ACM, pp. 6pp, New York, NY, USA, Jul 2019. [doi] [Bibtex & Downloads]
SHRIMP: Efficient Instruction Delivery with Domain Wall Memory
Reference
Joonas Multanen, Asif Ali Khan, Pekka Jääskeläinen, Fazal Hameed, Jeronimo Castrillon, "SHRIMP: Efficient Instruction Delivery with Domain Wall Memory" , Proceedings of the International Symposium on Low Power Electronics and Design, ACM, pp. 6pp, New York, NY, USA, Jul 2019. [doi]
Bibtex
@InProceedings{multanen_islped19,
author = {Joonas Multanen and Asif Ali Khan and Pekka J{\"a}{\"a}skel{\"a}inen and Fazal Hameed and Jeronimo Castrillon},
title = {{SHRIMP}: Efficient Instruction Delivery with Domain Wall Memory},
booktitle = {Proceedings of the International Symposium on Low Power Electronics and Design},
year = {2019},
month = jul,
series = {ISLPED '19},
location = {Lausanne, Switzerland},
pages = {6pp},
numpages = {6},
publisher = {ACM},
address = {New York, NY, USA},
doi={10.1109/ISLPED.2019.8824954},
url = {https://ieeexplore.ieee.org/document/8824954},
}
Downloads
1907_Multanen_ISLPED [PDF]
Related Paths
Permalink
- Andrés Goens, Christian Menard, Jeronimo Castrillon, "On compact mappings for multicore systems" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung) , Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi] [Bibtex & Downloads]
On compact mappings for multicore systems
Reference
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On compact mappings for multicore systems" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS) (D. Pnevmatikatos and M. Pelcat and M. Jung) , Springer, Cham, vol. 11733, pp. 325–335, Jul 2019. [doi]
Bibtex
@InProceedings{goens_samos19,
author = {Andr{\'e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On compact mappings for multicore systems},
booktitle = {Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS)},
year = {2019},
editor = {D. Pnevmatikatos and M. Pelcat and M. Jung},
volume = {11733},
pages = {325--335},
month = jul,
organization = {IEEE},
publisher = {Springer, Cham},
doi = {10.1007/978-3-030-27562-4_23},
isbn = {978-3-030-27561-7},
location = {Pythagorion, Greece},
numpages = {11},
url = {https://link.springer.com/chapter/10.1007/978-3-030-27562-4_23}
}
Downloads
1907_Goens_SAMOS [PDF]
Related Paths
Permalink
- Asif Ali Khan, Norman A. Rink, Fazal Hameed, Jeronimo Castrillon, "Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads" , Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES), ACM, pp. 5–18, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads
Reference
Asif Ali Khan, Norman A. Rink, Fazal Hameed, Jeronimo Castrillon, "Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads" , Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES), ACM, pp. 5–18, New York, NY, USA, Jun 2019. [doi]
Abstract
Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24% and 74% respectively compared to an iso-capacity SRAM.
Bibtex
@InProceedings{kahn_lctes19,
author = {Asif Ali Khan and Norman A. Rink and Fazal Hameed and Jeronimo Castrillon},
title = {Optimizing Tensor Contractions for Embedded Devices with Racetrack Memory Scratch-Pads},
booktitle = {Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory of Embedded Systems (LCTES)},
series = {LCTES 2019},
pages = {5--18},
numpages = {12},
numpages = {14},
isbn = {978-1-4503-6724-0/19/06},
doi = {10.1145/3316482.3326351},
url = {http://doi.acm.org/10.1145/3316482.3326351},
acmid = {3326351},
year = {2019},
month = jun,
location = {Phoenix, AZ, USA},
publisher = {ACM},
address = {New York, NY, USA},
abstract = {Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24% and 74% respectively compared to an iso-capacity SRAM.},
acmid = {3326351},
}
Downloads
1906_Khan_LCTES [PDF]
Related Paths
Permalink
- Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A case study on machine learning for synthesizing benchmarks" , Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
A case study on machine learning for synthesizing benchmarks
Reference
Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, Jeronimo Castrillon, "A case study on machine learning for synthesizing benchmarks" , Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), ACM, pp. 38–46, New York, NY, USA, Jun 2019. [doi]
Abstract
Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.
Bibtex
@InProceedings{goens_mapl19,
author = {Andr\'{e}s Goens and Alexander Brauckmann and Sebastian Ertel and Chris Cummins and Hugh Leather and Jeronimo Castrillon},
title = {A case study on machine learning for synthesizing benchmarks},
booktitle = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL)},
year = {2019},
series = {MAPL 2019},
doi = {10.1145/3315508.3329976},
url = {http://doi.acm.org/10.1145/3315508.3329976},
acmid = {3329976},
isbn = {978-1-4503-6719-6/19/06},
pages = {38--46},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
keywords = {conf},
location = {Phoenix, AZ, USA},
numpages = {9},
abstract = {Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.},
}
Downloads
1906_Goens_MAPL [PDF]
Permalink
- Norman A. Rink, Jeronimo Castrillon, "TeIL: a type-safe imperative Tensor Intermediate Language" , Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY), ACM, pp. 57–68, New York, NY, USA, Jun 2019. [doi] [Bibtex & Downloads]
TeIL: a type-safe imperative Tensor Intermediate Language
Reference
Norman A. Rink, Jeronimo Castrillon, "TeIL: a type-safe imperative Tensor Intermediate Language" , Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY), ACM, pp. 57–68, New York, NY, USA, Jun 2019. [doi]
Abstract
Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels. The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL s formal semantics we develop a type-safety result in the Coq proof assistant.
Bibtex
@InProceedings{rink_array19,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {{TeIL}: a type-safe imperative {Tensor Intermediate Language}},
booktitle = {Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY)},
year = {2019},
series = {ARRAY 2019},
pages = {57--68},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
doi = {10.1145/3315454.3329959},
url = {http://doi.acm.org/10.1145/3315454.3329959},
acmid = {3329959},
isbn = {978-1-4503-6717-2/19/06},
location = {Phoenix, AZ, USA},
numpages = {12},
abstract = {Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels.
The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a type-safety result in the Coq proof assistant.},
}
Downloads
1906_Rink_Array [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "SoC programming in the era of the Internet of Things, machine learning and emerging technologies" , In Groupement De Recherche SOC2: System On Chip, Systèmes embarqués et Objets Connecté (keynote), Jun 2019. [Bibtex & Downloads]
SoC programming in the era of the Internet of Things, machine learning and emerging technologies
Reference
Jeronimo Castrillon, "SoC programming in the era of the Internet of Things, machine learning and emerging technologies" , In Groupement De Recherche SOC2: System On Chip, Systèmes embarqués et Objets Connecté (keynote), Jun 2019.
Abstract
The design of a system on chip has traditionally been one of the most complex tasks in computing systems. Designers have to deal with stringent application constraints under a low-power budget while reducing non-recurring engineering costs. Modeling languages, costs models of hardware, system simulators, design exploration methodologies and alike have made it possible to cope with this high complexity. Today, three recent trends represent a non trivial complexity increase and thus a challenge for SoC designers and programmers, namely, 1) additional system dynamics in the context of the Internet of Things, 2) the ubiquity of machine learning workloads, and 3) the added complexity brought by specialization and emerging technologies. This talk discusses how models and higher level programming abstractions can be leveraged to cope with these trends. A dataflow programming methodology is extended to account for dynamic execution scenarios at runtime. A tensor abstraction, common in machine learning, is introduced that eases programming and design tasks. Finally, the talk shows how the tensor abstraction is useful to efficiently map tensor computations to SoCs with non-volatile racetrack scratch-pad memory.
Bibtex
@Misc{castrillon_gdrsoc2019,
author = {Castrillon, Jeronimo},
title = {SoC programming in the era of the Internet of Things, machine learning and emerging technologies},
howpublished = {Groupement De Recherche SOC2: System On Chip, Syst{\`e}mes embarqu{\'e}s et Objets Connect{\'e} (keynote)},
month = jun,
year = {2019},
abstract = {The design of a system on chip has traditionally been one of the most complex tasks in computing
systems. Designers have to deal with stringent application constraints under a low-power budget while
reducing non-recurring engineering costs. Modeling languages, costs models of hardware, system
simulators, design exploration methodologies and alike have made it possible to cope with this high
complexity. Today, three recent trends represent a non trivial complexity increase and thus a challenge
for SoC designers and programmers, namely, 1) additional system dynamics in the context of the
Internet of Things, 2) the ubiquity of machine learning workloads, and 3) the added complexity brought
by specialization and emerging technologies. This talk discusses how models and higher level
programming abstractions can be leveraged to cope with these trends. A dataflow programming
methodology is extended to account for dynamic execution scenarios at runtime. A tensor abstraction,
common in machine learning, is introduced that eases programming and design tasks. Finally, the
talk shows how the tensor abstraction is useful to efficiently map tensor computations to SoCs with
non-volatile racetrack scratch-pad memory.},
location = {Montpellier, France},
url = {http://www.gdr-soc.cnrs.fr/programme-colloque-2019/}
}
Downloads
190620_castrillon_gdrsoc2-lowres [PDF]
Permalink
- Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , In CoRR, vol. abs/1906.12098, Jun 2019. [Bibtex & Downloads]
Category-Theoretic Foundations of STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism
Reference
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon, "Category-Theoretic Foundations of STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism" , In CoRR, vol. abs/1906.12098, Jun 2019.
Bibtex
@Article{ertel_haskellsup19,
author = {Sebastian Ertel and Justus Adam and Norman A. Rink and Andr{\'{e}}s Goens and Jeronimo Castrillon},
title = {Category-Theoretic Foundations of ``STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism''},
journal = {CoRR},
year = {2019},
volume = {abs/1906.12098},
month = jun,
archiveprefix = {arXiv},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1906-12098},
eprint = {1906.12098},
url = {http://arxiv.org/abs/1906.12098}
}
Downloads
1906_Ertel_Haskellsupp [PDF]
Related Paths
Permalink
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0" , In ArXiv arXiv:1903.03597, Mar 2019. [Bibtex & Downloads]
ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0
Reference
Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0" , In ArXiv arXiv:1903.03597, Mar 2019.
Abstract
Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This paper presents data placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.
Bibtex
@Article{kahn_arxiv-taco19,
author = {Asif Ali Khan and Fazal Hameed and Robin Bl{\"a}sing and Stuart Parkin and Jeronimo Castrillon},
title = {ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0},
journal = {ArXiv arXiv:1903.03597},
year = {2019},
month = mar,
abstract = {Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This paper presents data placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.},
url = {https://arxiv.org/abs/1903.03597},
}
Downloads
1903_khan_ArXiv-TACO [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Programming abstractions: When domain-specific goes mainstream" , In 28th Workshop of the Gesellschaft für Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS 19) (invited talk), Mar 2019. [Bibtex & Downloads]
Programming abstractions: When domain-specific goes mainstream
Reference
Jeronimo Castrillon, "Programming abstractions: When domain-specific goes mainstream" , In 28th Workshop of the Gesellschaft für Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS 19) (invited talk), Mar 2019.
Abstract
We have seen several inflection points in computing in this century: from single to multi-core, from homogeneous to heterogeneous, and in the near future to fundamentally new computing paradigms with emerging technologies. With domain-specific hardware becoming mainstream, and programming reaching out to professions outside computer science, higher-level programming abstractions, domain-specific languages (DSLs) and tools are badly needed. This talk provides examples of such programming abstractions as a basis for discussion. It first discusses how dataflow programming models from the embedded domain can be leveraged in more general purpose setups. It then presents DSLs for particle-based simulations and for tensor expressions. The latter is one example of multiple tensor DSLs available today, spawned by the recent machine learning boom. The talk closes with examples of emerging technologies and a brief discussion about how they may impact our current assumptions.
Bibtex
@Misc{castrillon_pars2019,
author = {Castrillon, Jeronimo},
title = {Programming abstractions: When domain-specific goes mainstream},
howpublished = {28th Workshop of the Gesellschaft f{\"u}r Informatik, interest group Parallele Algorithmen, Rechenstrukturen und Systemsoftware (PARS'19) (invited talk)},
month = mar,
year = {2019},
abstract = {We have seen several inflection points in computing in this century: from single to multi-core,
from homogeneous to heterogeneous, and in the near future to fundamentally new computing
paradigms with emerging technologies. With domain-specific hardware becoming mainstream,
and programming reaching out to professions outside computer science, higher-level
programming abstractions, domain-specific languages (DSLs) and tools are badly needed. This
talk provides examples of such programming abstractions as a basis for discussion. It first
discusses how dataflow programming models from the embedded domain can be leveraged in
more general purpose setups. It then presents DSLs for particle-based simulations and for tensor
expressions. The latter is one example of multiple tensor DSLs available today, spawned
by the recent machine learning boom. The talk closes with examples of emerging technologies
and a brief discussion about how they may impact our current assumptions.},
location = {Berlin, Germany},
url = {https://fg-pars.gi.de/veranstaltung/pars-workshop-2019/}
}
Downloads
190321_castrill_PARS-sent [PDF]
Permalink
- Gerhard Fettweis, Meik Dörpinghaus, Jeronimo Castrillon, Akash Kumar, Christel Baier, Karlheinz Bock, Frank Ellinger, Andreas Fery, Frank H. P. Fitzek, Hermann Härtig, Kambiz Jamshidi, Thomas Kissinger, Wolfgang Lehner, Michael Mertig, Wolfgang E. Nagel, Giang T. Nguyen, Dirk Plettemeier, Michael Schröter, Thorsten Strufe, "Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing" , In Proceedings of the IEEE, vol. 107, no. 1, pp. 204–231, Jan 2019. [doi] [Bibtex & Downloads]
Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing
Reference
Gerhard Fettweis, Meik Dörpinghaus, Jeronimo Castrillon, Akash Kumar, Christel Baier, Karlheinz Bock, Frank Ellinger, Andreas Fery, Frank H. P. Fitzek, Hermann Härtig, Kambiz Jamshidi, Thomas Kissinger, Wolfgang Lehner, Michael Mertig, Wolfgang E. Nagel, Giang T. Nguyen, Dirk Plettemeier, Michael Schröter, Thorsten Strufe, "Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing" , In Proceedings of the IEEE, vol. 107, no. 1, pp. 204–231, Jan 2019. [doi]
Bibtex
@Article{fettweis_ieeeproc19,
author = {Gerhard Fettweis and Meik D{\"o}rpinghaus and Jeronimo Castrillon and Akash Kumar and Christel Baier and Karlheinz Bock and Frank Ellinger and Andreas Fery and Frank H. P. Fitzek and Hermann H{\"a}rtig and Kambiz Jamshidi and Thomas Kissinger and Wolfgang Lehner and Michael Mertig and Wolfgang E. Nagel and Giang T. Nguyen and Dirk Plettemeier and Michael Schr{\"o}ter and Thorsten Strufe},
title = {Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing},
journal = {Proceedings of the IEEE},
year = {2019},
volume = {107},
number = {1},
pages = {204--231},
month = jan,
doi = {10.1109/JPROC.2018.2874895},
issn = {0018-9219},
url = {https://ieeexplore.ieee.org/document/8565890}
}
Downloads
1812_Fettweis_IEEEProc [PDF]
Related Paths
Permalink
- Hasna Bouraoui, Jeronimo Castrillon, Chadlia Jerad, "Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications" , Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 4:1–4:6, New York, NY, USA, Jan 2019. [doi] [Bibtex & Downloads]
Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications
Reference
Hasna Bouraoui, Jeronimo Castrillon, Chadlia Jerad, "Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications" , Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 4:1–4:6, New York, NY, USA, Jan 2019. [doi]
Bibtex
@InProceedings{bouraoui_parma19,
author = {Hasna Bouraoui and Jeronimo Castrillon and Chadlia Jerad},
title = {Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications},
booktitle = {Proceedings of the 10th Workshop and 8th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'19), co-located with 14th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2019},
series = {PARMA-DITAM 2019},
pages = {4:1--4:6},
articleno = {4},
numpages = {6},
address = {New York, NY, USA},
month = jan,
publisher = {ACM},
isbn = {978-1-4503-6321-1},
url = {http://doi.acm.org/10.1145/3310411.3310417},
doi = {10.1145/3310411.3310417},
acmid = {3310417},
location = {Valencia, Spain},
numpages = {6}
}
Downloads
1901_Bouraoui_PARMA [PDF]
Related Paths
Permalink
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "RTSim: A Cycle-accurate Simulator for Racetrack Memories" , In IEEE Computer Architecture Letters, IEEE, vol. 18, no. 1, pp. 43–46, Jan 2019. [doi] [Bibtex & Downloads]
RTSim: A Cycle-accurate Simulator for Racetrack Memories
Reference
Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart Parkin, Jeronimo Castrillon, "RTSim: A Cycle-accurate Simulator for Racetrack Memories" , In IEEE Computer Architecture Letters, IEEE, vol. 18, no. 1, pp. 43–46, Jan 2019. [doi]
Bibtex
@Article{khan_ieeecal19,
author = {Asif Ali Khan and Fazal Hameed and Robin Bl{\"a}sing and Stuart Parkin and Jeronimo Castrillon},
title = {{RTS}im: A Cycle-accurate Simulator for Racetrack Memories},
journal = {IEEE Computer Architecture Letters},
year = {2019},
volume = {18},
number = {1},
pages = {43--46},
month = jan,
doi = {10.1109/LCA.2019.2899306},
issn = {1556-6056},
publisher = {IEEE},
url = {https://ieeexplore.ieee.org/document/8642352}
}
Downloads
1902_khan_IEEECAL [PDF]
Related Paths
Permalink
2018
- Adilla Susungi, Norman A. Rink, Albert Cohen, Jeronimo Castrillon, Claude Tadonki, "Meta-programming for cross-domain tensor optimizations" , Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 18), ACM, pp. 79–92, New York, NY, USA, Nov 2018. [doi] [Bibtex & Downloads]
Meta-programming for cross-domain tensor optimizations
Reference
Adilla Susungi, Norman A. Rink, Albert Cohen, Jeronimo Castrillon, Claude Tadonki, "Meta-programming for cross-domain tensor optimizations" , Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 18), ACM, pp. 79–92, New York, NY, USA, Nov 2018. [doi]
Bibtex
@InProceedings{rink_gpce18,
author = {Adilla Susungi and Norman A. Rink and Albert Cohen and Jeronimo Castrillon and Claude Tadonki},
title = {Meta-programming for cross-domain tensor optimizations},
booktitle = {Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'18)},
year = {2018},
series = {GPCE 2018},
pages = {79--92},
numpages = {14},
address = {New York, NY, USA},
month = nov,
publisher = {ACM},
keywords = {conf},
location = {Boston, MA, USA},
isbn = {978-1-4503-6045-6},
url = {http://doi.acm.org/10.1145/3278122.3278131},
doi = {10.1145/3278122.3278131},
acmid = {3278131},
}
Downloads
1811_Rink_GPCE [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Parallel programming methodologies for manycores" , In NeXtream Solution Seminar & Silexica Technology Workshop (invited talk), Oct 2018. [Bibtex & Downloads]
Parallel programming methodologies for manycores
Reference
Jeronimo Castrillon, "Parallel programming methodologies for manycores" , In NeXtream Solution Seminar & Silexica Technology Workshop (invited talk), Oct 2018.
Bibtex
@Misc{castrillon_neXtream2018,
author = {Castrillon, Jeronimo},
title = {Parallel programming methodologies for manycores},
howpublished = {NeXtream Solution Seminar \& Silexica Technology Workshop (invited talk)},
month = oct,
year = {2018},
keywords = {invitedtalk},
location = {Tokyo, Japan},
url = {https://nextream.bz/nss/2018/?page_id=29}
}
Downloads
181017_castrill_slx-tokyo_sent-2 [PDF]
Permalink
- Rainer Leupers, Miguel A. Aguilar, Jeronimo Castrillon, Weihua Sheng, "Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems" , Chapter in Handbook of Signal Processing Systems (3rd Edition) (Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo) , Springer New York, pp. 1021–1062, Sep 2018. [doi] [Bibtex & Downloads]
Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems
Reference
Rainer Leupers, Miguel A. Aguilar, Jeronimo Castrillon, Weihua Sheng, "Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems" , Chapter in Handbook of Signal Processing Systems (3rd Edition) (Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo) , Springer New York, pp. 1021–1062, Sep 2018. [doi]
Abstract
The increasing demands of modern embedded systems, such as high-performance and energy-efficiency, have motivated the use of heterogeneous multi-core platforms enabled by Multiprocessor System-on-Chips (MPSoCs). To fully exploit the power of these platforms, new tools are needed to address the increasing software complexity to achieve a high productivity. An MPSoC compiler is a tool-chain to tackle the problems of application modeling, platform description, software parallelization, software distribution and code generation for an efficient usage of the target platform. This chapter discusses various aspects of compilers for heterogeneous embedded multi-core systems, using the well-established single-core C compiler technology as a baseline for comparison. After a brief introduction to the MPSoC compiler technology, the important ingredients of the compilation process are explained in detail. Finally, a number of case studies from academia and industry are presented to illustrate the concepts discussed in this chapter.
Bibtex
@InCollection{leupers18_spschapter,
author = {Leupers, Rainer and Aguilar, Miguel A. and Castrillon, Jeronimo and Sheng, Weihua},
title = {Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems},
booktitle = {Handbook of Signal Processing Systems (3rd Edition)},
publisher = {Springer New York},
year = {2018},
month = sep,
editor = {Bhattacharyya, Shuvra S. and Deprettere, Ed F. and Leupers, Rainer and Takala, Jarmo},
pages = {1021--1062},
abstract = {The increasing demands of modern embedded systems, such as high-performance and energy-efficiency, have motivated the use of heterogeneous multi-core platforms enabled by Multiprocessor System-on-Chips (MPSoCs). To fully exploit the power of these platforms, new tools are needed to address the increasing software complexity to achieve a high productivity. An MPSoC compiler is a tool-chain to tackle the problems of application modeling, platform description, software parallelization, software distribution and code generation for an efficient usage of the target platform. This chapter discusses various aspects of compilers for heterogeneous embedded multi-core systems, using the well-established single-core C compiler technology as a baseline for comparison. After a brief introduction to the MPSoC compiler technology, the important ingredients of the compilation process are explained in detail. Finally, a number of case studies from academia and industry are presented to illustrate the concepts discussed in this chapter.},
doi = {10.1007/978-3-319-91734-4_28},
isbn = {978-3-319-91733-7},
url = {https://link.springer.com/chapter/10.1007/978-3-319-91734-4_28},
}
Downloads
1809_Leupers_SPSBookChapter [PDF]
Permalink
- Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores" , Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi] [Bibtex & Downloads]
On the Representation of Mappings to Multicores
Reference
Andrés Goens, Christian Menard, Jeronimo Castrillon, "On the Representation of Mappings to Multicores" , Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18), pp. 184–191, Vietnam National University, Hanoi, Vietnam, Sep 2018. [doi]
Abstract
Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieve this. We present different representations and how these could be applied to different contexts and objectives, like complex design- space exploration meta-heuristics or efficient runtime systems.
Bibtex
@InProceedings{goen_mcsoc18,
author = {Andr\'{e}s Goens and Christian Menard and Jeronimo Castrillon},
title = {On the Representation of Mappings to Multicores},
booktitle = {Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-18)},
year = {2018},
address = {Vietnam National University, Hanoi, Vietnam},
month = sep,
pages = {184--191},
doi = {10.1109/MCSoC2018.2018.00039},
url = {https://ieeexplore.ieee.org/document/8540232},
isbn = {978-1-5386-6689-0/18/},
abstract = {Application requirements for embedded systems are growing rapidly, as is the complexity of systems designed to execute them. A common abstraction used to tame this growing complexity is that of a mapping, which assigns parts of an application to different hardware resources. Modern flows need to explore an intractably large design space of mappings, and be able to quickly find near-optimal mappings for different objectives, sometimes at runtime. With systems featuring thousands of cores in the near horizon, we need methods to make this exploration step truly scalable. In this paper we argue that the mathematical representation of a mapping is central to achieve this. We present different representations and how these could be applied to different contexts and objectives, like complex design- space exploration meta-heuristics or efficient runtime systems.},
}
Downloads
1809_Goens_MCSoC [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems" , In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243-259, Jul 2018. [doi] [Bibtex & Downloads]
A Hardware/Software Stack for Heterogeneous Systems
Reference
Jeronimo Castrillon, Matthias Lieber, Sascha Klüppelholz, Marcus Völp, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andrés Goens, Sebastian Haas, Dirk Habich, Hermann Härtig, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Akash Kumar, Wolfgang Lehner, Linda Leuschner, Siqi Ling, Steffen Märcker, Christian Menard, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, Sascha Wunderlich, "A Hardware/Software Stack for Heterogeneous Systems" , In IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 243-259, Jul 2018. [doi]
Abstract
Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.
Bibtex
@Article{castrillon_tmscs17,
author = {Jeronimo Castrillon and Matthias Lieber and Sascha Kl{\"u}ppelholz and Marcus V{\"o}lp and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andr\'{e}s Goens and Sebastian Haas and Dirk Habich and Hermann H{\"a}rtig and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Akash Kumar and Wolfgang Lehner and Linda Leuschner and Siqi Ling and Steffen M{\"a}rcker and Christian Menard and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt and Sascha Wunderlich},
title = {A Hardware/Software Stack for Heterogeneous Systems},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
year = {2018},
month = jul,
volume={4},
number={3},
pages={243-259},
abstract = {Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making it ever so difficult to program them. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.},
doi = {10.1109/TMSCS.2017.2771750},
issn = {2332-7766},
url = {http://ieeexplore.ieee.org/document/8103042/}
}
Downloads
1711_Castrillon_TMSCS [PDF]
Related Paths
Permalink
- Fazal Hameed, Asif Ali Khan, Jeronimo Castrillon, "Performance and Energy Efficient Design of STT-RAM Last-Level-Cache" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 26, no. 6, pp. 1059–1072, Jun 2018. [doi] [Bibtex & Downloads]
Performance and Energy Efficient Design of STT-RAM Last-Level-Cache
Reference
Fazal Hameed, Asif Ali Khan, Jeronimo Castrillon, "Performance and Energy Efficient Design of STT-RAM Last-Level-Cache" , In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 26, no. 6, pp. 1059–1072, Jun 2018. [doi]
Abstract
Recent research has proposed having a die-stacked last-level cache (LLC) to overcome the memory wall. Lately, spin-transfer-torque random access memory (STT-RAM) caches have received attention, since they provide improved energy efficiency compared with DRAM caches. However, recently proposed STT-RAM cache architectures unnecessarily dissipate energy by fetching unneeded cache lines (CLs) into the row buffer (RB). In this paper, we propose a selective read policy for the STT-RAM which fetches those CLs into the RB that are likely to be reused. In addition, we propose a tags-update policy that reduces the number of STT-RAM writebacks. This reduces the number of reads/writes and thereby decreases the energy consumption. To reduce the latency penalty of our selective read policy, we propose the following performance optimizations: 1) an RB tags-bypass policy that reduces STT-RAM access latency; 2) an LLC data cache that stores the CLs that are likely to be used in the near future; 3) an address organization scheme that simultaneously reduces LLC access latency and miss rate; and 4) a tags-to-column mapping policy that improves access parallelism. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an eight-core system. We compare our approach with a recently proposed STT-RAM LLC with subarray parallelism support and show that our synergistic policies reduce the average LLC dynamic energy consumption by 75% and improve the system performance by 6.5%. Compared with the state-of-the-art DRAM LLC with subarray parallelism, our architecture reduces the LLC dynamic energy consumption by 82% and improves system performance by 6.8%.
Bibtex
@Article{hameed_tvlsi18,
author = {Fazal Hameed and Asif Ali Khan and Jeronimo Castrillon},
title = {Performance and Energy Efficient Design of STT-RAM Last-Level-Cache},
journal = {IEEE Transactions on Very Large Scale Integration Systems (TVLSI)},
year = {2018},
volume = {26},
number = {6},
pages = {1059--1072},
month = jun,
abstract = {Recent research has proposed having a die-stacked last-level cache (LLC) to overcome the memory wall. Lately, spin-transfer-torque random access memory (STT-RAM) caches have received attention, since they provide improved energy efficiency compared with DRAM caches. However, recently proposed STT-RAM cache architectures unnecessarily dissipate energy by fetching unneeded cache lines (CLs) into the row buffer (RB). In this paper, we propose a selective read policy for the STT-RAM which fetches those CLs into the RB that are likely to be reused. In addition, we propose a tags-update policy that reduces the number of STT-RAM writebacks. This reduces the number of reads/writes and thereby decreases the energy consumption. To reduce the latency penalty of our selective read policy, we propose the following performance optimizations: 1) an RB tags-bypass policy that reduces STT-RAM access latency; 2) an LLC data cache that stores the CLs that are likely to be used in the near future; 3) an address organization scheme that simultaneously reduces LLC access latency and miss rate; and 4) a tags-to-column mapping policy that improves access parallelism. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an eight-core system. We compare our approach with a recently proposed STT-RAM LLC with subarray parallelism support and show that our synergistic policies reduce the average LLC dynamic energy consumption by 75\% and improve the system performance by 6.5\%. Compared with the state-of-the-art DRAM LLC with subarray parallelism, our architecture reduces the LLC dynamic energy consumption by 82\% and improves system performance by 6.8\%.},
doi = {10.1109/TVLSI.2018.2804938},
file = {:/Users/jeronimocastrillon/Documents/Academic/mypapers/1803_Hameed_TVLSI.pdf:PDF},
issn = {1063-8210},
numpages = {14},
url = {http://ieeexplore.ieee.org/document/8307465/}
}
Downloads
1803_Hameed_TVLSI [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Heterogeneous Post-CMOS Technologies Meet Software" , In Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk), Jun 2018. [Bibtex & Downloads]
Heterogeneous Post-CMOS Technologies Meet Software
Reference
Jeronimo Castrillon, "Heterogeneous Post-CMOS Technologies Meet Software" , In Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk), Jun 2018.
Bibtex
@Misc{castrillon2018ISC,
author = {Castrillon, Jeronimo},
title = {Heterogeneous Post-CMOS Technologies Meet Software},
howpublished = {Post Moore Interconnects Workshop, ISC High Performance 2018 (invited talk)},
month = jun,
year = {2018},
keywords = {invitedtalk},
location = {Frankfurt, Germany},
url = {https://beyondcmos.ornl.gov/2018/agenda.html}
}
Downloads
180628_castrillon_isc-postmoore_send [PDF]
Permalink
- Jeronimo Castrillon, "Parallel programming: Current and future systems" , In 50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM 18) (invited talk), May 2018. [Bibtex & Downloads]
Parallel programming: Current and future systems
Reference
Jeronimo Castrillon, "Parallel programming: Current and future systems" , In 50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM 18) (invited talk), May 2018.
Bibtex
@Misc{castrillon2018UdeA,
author = {Castrillon, Jeronimo},
title = {Parallel programming: Current and future systems},
howpublished = {50-year Celebration: Department of Electronics, Universidad de Antioquia, in the context of the IEEE Colombian Conference on Communications and Computing (COLCOM'18) (invited talk)},
month = may,
year = {2018},
location = {Universidad de Antioquia, Medell{\'i}n, Colombia}
}
Downloads
180516_castrillon_50years_EE_UdeA [PDF]
Permalink
- Sven Karol, Tobias Nett, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Domain-Specific Language and Editor for Parallel Particle Methods" , In ACM Transactions on Mathematical Software (TOMS), ACM, vol. 44, no. 3, pp. 32, New York, NY, USA, Mar 2018. [doi] [Bibtex & Downloads]
A Domain-Specific Language and Editor for Parallel Particle Methods
Reference
Sven Karol, Tobias Nett, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Domain-Specific Language and Editor for Parallel Particle Methods" , In ACM Transactions on Mathematical Software (TOMS), ACM, vol. 44, no. 3, pp. 32, New York, NY, USA, Mar 2018. [doi]
Abstract
Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing DSLs is not easy, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL comfortable for programmers. Some of these features —e.g., syntax highlighting, type inference, error reporting— are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language, a Fortran-based DSL that uses conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer’s experience is improved using static analyses and projectional editing, i.e., code-structure editing, constrained by syntax, as opposed to free-text editing. We present an explicit domain model for particle abstractions and the first formal type system for partircle methods.
Bibtex
@Article{karol_toms18,
author = {Karol, Sven and Nett, Tobias and Castrillon, Jeronimo and Sbalzarini, Ivo F.},
title = {A Domain-Specific Language and Editor for Parallel Particle Methods},
journal = {ACM Transactions on Mathematical Software (TOMS)},
issue_date = {March 2018},
volume = {44},
number = {3},
month = mar,
year = {2018},
issn = {0098-3500},
pages = {34:1--34:32},
articleno = {34},
numpages = {32},
url = {http://doi.acm.org/10.1145/3175659},
doi = {10.1145/3175659},
acmid = {3175659},
publisher = {ACM},
address = {New York, NY, USA},
pages = {32},
abstract = {
Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing DSLs is not easy, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL comfortable for programmers. Some of these features --e.g., syntax highlighting, type inference, error reporting-- are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language, a Fortran-based DSL that uses conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer’s experience is improved using static analyses and projectional editing, i.e., code-structure editing, constrained by syntax, as opposed to free-text editing. We present an explicit domain model for particle abstractions and the first formal type system for partircle methods.},
}
Downloads
1709_Karol_TOMS-arxiv [PDF]
Related Paths
Biological Systems Path, Orchestration Path
Permalink
- Fazal Hameed, Jeronimo Castrillon, "STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement" , Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018), Mar 2018. [Bibtex & Downloads]
STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement
Reference
Fazal Hameed, Jeronimo Castrillon, "STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement" , Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018), Mar 2018.
Bibtex
@InProceedings{hameed_nvmw18,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {STT-RAM Aware Last-Level-Cache Policies for Simultaneous Energy and Performance Improvement},
booktitle = {Proceedings of the 9th Annual Non-Volatile Memories Workshop (NVMW 2018)},
year = {2018},
month = mar,
location = {San Diego, CA, USA},
numpages = {2}
}
Downloads
1803_Hameed_NVMW [PDF]
Related Paths
Permalink
- Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O" , Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
Compiling for Concise Code and Efficient I/O
Reference
Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon, "Compiling for Concise Code and Efficient I/O" , Proceedings of the 27th International Conference on Compiler Construction (CC 2018), ACM, pp. 104–115, New York, NY, USA, Feb 2018. [doi]
Abstract
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.
Bibtex
@InProceedings{ertel_cc18,
author = {Sebastian Ertel and Andr\'{e}s Goens and Justus Adam and Jeronimo Castrillon},
title = {Compiling for Concise Code and Efficient I/O},
booktitle = {Proceedings of the 27th International Conference on Compiler Construction (CC 2018)},
series = {CC 2018},
year = {2018},
month = feb,
location = {Vienna, Austria},
publisher = {ACM},
numpages = {12},
pages = {104--115},
doi = {10.1145/3178372.3179505},
url = {https://dl.acm.org/citation.cfm?id=3179505},
acmid = {3179505},
address = {New York, NY, USA},
abstract = {Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.},
}
Downloads
cc-2018-slides [PDF]
1802_Ertel_CC [PDF]
Related Paths
Permalink
- Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Supporting Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM), ACM, pp. 41–50, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
Supporting Fine-grained Dataflow Parallelism in Big Data Systems
Reference
Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Supporting Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM), ACM, pp. 41–50, New York, NY, USA, Feb 2018. [doi]
Abstract
Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.
Bibtex
@InProceedings{ertel_pmam18,
author = {Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {Supporting Fine-grained Dataflow Parallelism in Big Data Systems},
booktitle = {Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM)},
year = {2018},
series = {PMAM'18},
address = {New York, NY, USA},
month = feb,
publisher = {ACM},
doi = {10.1145/3178442.3178447},
isbn = {978-1-4503-5645-9},
location = {Vienna, Austria},
pages = {41--50},
numpages = {10},
acmid = {3178447},
url = {http://doi.acm.org/10.1145/3178442.3178447},
abstract = {Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.},
}
Downloads
pmam-2018-slides [PDF]
1802_Ertel_PMAM [PDF]
Related Paths
Permalink
- Norman A. Rink, Immo Huismann, Adilla Susungi, Jeronimo Castrillon, Jörg Stiller, Jochen Fröhlich, Claude Tadonki, "CFDlang: High-level code generation for high-order methods in fluid dynamics" , Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018), ACM, pp. 5:1–5:10, New York, NY, USA, Feb 2018. [doi] [Bibtex & Downloads]
CFDlang: High-level code generation for high-order methods in fluid dynamics
Reference
Norman A. Rink, Immo Huismann, Adilla Susungi, Jeronimo Castrillon, Jörg Stiller, Jochen Fröhlich, Claude Tadonki, "CFDlang: High-level code generation for high-order methods in fluid dynamics" , Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018), ACM, pp. 5:1–5:10, New York, NY, USA, Feb 2018. [doi]
Abstract
Numerical simulations continue to enable fast and enormous progress in science and engineering. Writing efficient numerical codes is a difficult challenge that encompasses a variety of tasks from designing the right algorithms to exploiting the full potential of a platform s architecture. Domain-specific languages (DSLs) can ease these tasks by offering the right abstractions for expressing numerical problems. With the aid of domain knowledge, efficient code can then be generated automatically from abstract expressions. In this work, we present the CFDlang DSL for expressing tensor operations that constitute the performance-critical code sections in a class of real numerical applications from fluid dynamics. We demonstrate that CFDlang can be used to generate code automatically that performs as well, if not better, than carefully hand-optimized code.
Bibtex
@InProceedings{rink_rwdsl18,
author = {Norman A. Rink and Immo Huismann and Adilla Susungi and Jeronimo Castrillon and J{\"o}rg Stiller and Jochen Fr{\"o}hlich and Claude Tadonki},
title = {CFDlang: High-level code generation for high-order methods in fluid dynamics},
booktitle = {Proceedings of the 3rd International Workshop on Real World Domain Specific Languages (RWDSL 2018)},
year = {2018},
series = {RWDSL2018},
pages = {5:1--5:10},
address = {New York, NY, USA},
month = feb,
publisher = {ACM},
abstract = {Numerical simulations continue to enable fast and enormous progress in science and engineering. Writing efficient numerical codes is a difficult challenge that encompasses a variety of tasks from designing the right algorithms to exploiting the full potential of a platform's architecture. Domain-specific languages (DSLs) can ease these tasks by offering the right abstractions for expressing numerical problems. With the aid of domain knowledge, efficient code can then be generated automatically from abstract expressions. In this work, we present the CFDlang DSL for expressing tensor operations that constitute the performance-critical code sections in a class of real numerical applications from fluid dynamics. We demonstrate that CFDlang can be used to generate code automatically that performs as well, if not better, than carefully hand-optimized code.},
acmid = {3183900},
articleno = {5},
doi = {10.1145/3183895.3183900},
isbn = {978-1-4503-6355-6},
location = {Vienna, Austria},
numpages = {10},
url = {http://doi.acm.org/10.1145/3183895.3183900}
}
Downloads
1802_Rink_RWDSL [PDF]
Related Paths
Permalink
- Hermann Härtig, Nils Asmussen, Jeronimo Castrillon, Adam Lackorzynski, Michael Roitzsch, Carsten Weinhold, Akash Kumar, "Extremely Heterogeneous Systems – Not Just For Niches" , In Proceeding: Extreme Heterogeneity Workshop, Feb 2018. [Bibtex & Downloads]
Extremely Heterogeneous Systems – Not Just For Niches
Reference
Hermann Härtig, Nils Asmussen, Jeronimo Castrillon, Adam Lackorzynski, Michael Roitzsch, Carsten Weinhold, Akash Kumar, "Extremely Heterogeneous Systems – Not Just For Niches" , In Proceeding: Extreme Heterogeneity Workshop, Feb 2018.
Bibtex
@InProceedings{haertig_ehw18,
author = {Hermann H{\"a}rtig and Nils Asmussen and Jeronimo Castrillon and Adam Lackorzynski and Michael Roitzsch and Carsten Weinhold and Akash Kumar},
title = {Extremely Heterogeneous Systems -- Not Just For Niches},
booktitle = {Extreme Heterogeneity Workshop},
year = {2018},
month = feb,
note = {(Workshop took place over remote conferencing)},
location = {Gaithersburg, MD, USA}
}
Downloads
1802_Haertig_EHW [PDF]
Related Paths
Permalink
- Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap" , Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi] [Bibtex & Downloads]
Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap
Reference
Robert Khasanov, Andrés Goens, Jeronimo Castrillon, "Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap" , Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 20–25, New York, NY, USA, Jan 2018. [doi]
Abstract
Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation. Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.
Bibtex
@InProceedings{khasanov_parma18,
author = {Robert Khasanov and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Implicit Data-Parallelism in Kahn Process Networks: Bridging the MacQueen Gap},
booktitle = {Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'18), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
series = {PARMA-DITAM '18},
isbn = {978-1-4503-6444-7},
pages = {20--25},
year = {2018},
month = jan,
numpages = {6},
url = {http://doi.acm.org/10.1145/3183767.3183790},
doi = {10.1145/3183767.3183790},
acmid = {3183790},
publisher = {ACM},
address = {New York, NY, USA},
location = {Manchester, United Kingdom},
abstract = {Modern embedded systems are rapidly increasing their complexity, both in terms of numbers of cores, as well as heterogeneity. To generate efficient code for these systems, it is common to leverage formal models of computation.
Among these, the dataflow model of Kahn Process Networks (KPN) is widespread because it is expressive but guarantees a deterministic execution. However, the KPN model is ill-suited to expose data-level parallelism, since this has to be made explicit in the process network. This is aggravated by the fact that its most common execution model, Kahn-MacQueen, poses restrictive conditions on the scheduling of data-parallel processes, leading to an inefficient execution. In this paper we present a novel extension to the KPN model and a relaxed execution strategy that addresses this problem, while keeping the deterministic KPN semantics. It improves run-time adaptivity in malleable way and provides implicit parallelism. We evaluate our approach on two architectures, improving the performance of a benchmark by up to 25.6% on an Intel chip with hyper-threading, and by up to 78.0% on a heterogeneous embedded ARM big.LITTLE architecture.},
}
Downloads
1801_Khasanov_PARMA-DITAM [PDF]
Related Paths
Permalink
- Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers" , Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG 2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018. [Bibtex & Downloads]
Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers
Reference
Andrés Goens, Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers" , Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG 2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Jan 2018.
Bibtex
@InProceedings{goens_multiprog18,
author = {Andr{\'e}s Goens and Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers},
booktitle = {Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG'2018), co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
year = {2018},
url = {http://research.ac.upc.edu/multiprog/multiprog2018/papers/MULTIPROG-2018_Goens.pdf},
month = jan,
location = {Manchester, United Kingdom}
}
Downloads
1801_Goens_MULTIRPOG [PDF]
Related Paths
Permalink
- Asif Ali Khan, Fazal Hameed, Jeronimo Castrillon, "NVMain Extension for Multi-Level Cache Systems" , Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 7:1–7:6, New York, NY, USA, Jan 2018. [doi] [Bibtex & Downloads]
NVMain Extension for Multi-Level Cache Systems
Reference
Asif Ali Khan, Fazal Hameed, Jeronimo Castrillon, "NVMain Extension for Multi-Level Cache Systems" , Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), ACM, pp. 7:1–7:6, New York, NY, USA, Jan 2018. [doi]
Bibtex
@InProceedings{khan_rapido18,
author = {Asif Ali Khan and Fazal Hameed and Jeronimo Castrillon},
title = {NVMain Extension for Multi-Level Cache Systems},
booktitle = {Proceedings of the 10th RAPIDO Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, co-located with 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)},
series = {RAPIDO '18},
year = {2018},
month = jan,
pages = {7:1--7:6},
articleno = {7},
numpages = {6},
url = {http://doi.acm.org/10.1145/3180665.3180672},
doi = {10.1145/3180665.3180672},
acmid = {3180672},
publisher = {ACM},
address = {New York, NY, USA},
location = {Manchester, United Kingdom},
isbn = {978-1-4503-6417-1},
}
Downloads
1801_Khan_RAPIDO [PDF]
Related Paths
Permalink
2017
- Fazal Hameed, Christian Menard, Jeronimo Castrillon, "Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache" , Proceedings of the International Symposium on Memory Systems (MemSys 17), ACM, pp. 141–151, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache
Reference
Fazal Hameed, Christian Menard, Jeronimo Castrillon, "Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache" , Proceedings of the International Symposium on Memory Systems (MemSys 17), ACM, pp. 141–151, New York, NY, USA, Oct 2017. [doi]
Bibtex
@InProceedings{hameed_memsys17,
author = {Fazal Hameed and Christian Menard and Jeronimo Castrillon},
title = {Efficient STT-RAM Last-Level-Cache Architecture to replace DRAM Cache},
booktitle = {Proceedings of the International Symposium on Memory Systems (MemSys'17)},
series = {MEMSYS '17},
year = {2017},
month = oct,
isbn = {978-1-4503-5335-9},
location = {Alexandria, Virginia},
pages = {141--151},
numpages = {11},
url = {http://doi.acm.org/10.1145/3132402.3132414},
doi = {10.1145/3132402.3132414},
acmid = {3132414},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Hameed_Memsys [PDF]
Related Paths
Permalink
- Miguel Angel Aguilar, Abhishek Aggarwal, Awaid Shaheen, Rainer Leupers, Gerd Ascheid, Jeronimo Castrillon, Liam Fitzpatrick, "Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress" , Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), ACM, pp. 14:1–14:2, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress
Reference
Miguel Angel Aguilar, Abhishek Aggarwal, Awaid Shaheen, Rainer Leupers, Gerd Ascheid, Jeronimo Castrillon, Liam Fitzpatrick, "Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress" , Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), ACM, pp. 14:1–14:2, New York, NY, USA, Oct 2017. [doi]
Bibtex
@inproceedings{aguilar_cases17,
author = {Aguilar, Miguel Angel and Aggarwal, Abhishek and Shaheen, Awaid and Leupers, Rainer and Ascheid, Gerd and Castrillon, Jeronimo and Fitzpatrick, Liam},
title = {Multi-grained Performance Estimation for MPSoC Compilers: Work-in-progress},
booktitle = {Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES)},
series = {CASES '17},
year = {2017},
month = oct,
isbn = {978-1-4503-5184-3},
location = {Seoul, Republic of Korea},
pages = {14:1--14:2},
articleno = {14},
numpages = {2},
url = {http://doi.acm.org/10.1145/3125501.3125521},
doi = {10.1145/3125501.3125521},
acmid = {3125521},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Aguilar_CASES [PDF]
Permalink
- Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Immo Huismann, Albert Cohen, Claude Tadonki, Jörg Stiller, Jochen Fröhlich, "Towards Compositional and Generative Tensor Optimizations" , Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 17), ACM, pp. 169–175, New York, NY, USA, Oct 2017. [doi] [Bibtex & Downloads]
Towards Compositional and Generative Tensor Optimizations
Reference
Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Immo Huismann, Albert Cohen, Claude Tadonki, Jörg Stiller, Jochen Fröhlich, "Towards Compositional and Generative Tensor Optimizations" , Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 17), ACM, pp. 169–175, New York, NY, USA, Oct 2017. [doi]
Bibtex
@InProceedings{rink_gpce17,
author = {Adilla Susungi and Norman A. Rink and Jeronimo Castrillon and Immo Huismann and Albert Cohen and Claude Tadonki and J{\"o}rg Stiller and Jochen Fr{\"o}hlich},
title = {Towards Compositional and Generative Tensor Optimizations},
booktitle = {Proceedings of 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'17)},
series = {GPCE 2017},
year = {2017},
pages = {169--175},
month = oct,
isbn = {978-1-4503-5524-7},
location = {Vancouver, BC, Canada},
pages = {169--175},
numpages = {7},
url = {http://doi.acm.org/10.1145/3136040.3136050},
doi = {10.1145/3136040.3136050},
acmid = {3136050},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1710_Rink_GPCE [PDF]
Related Paths
Permalink
- Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017) (Lawrence Rauchwerger) , Springer, Cham, pp. 281–282, Oct 2017. [doi] [Bibtex & Downloads]
POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems
Reference
Sebastian Ertel, Justus Adam, Jeronimo Castrillon, "POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems" , Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017) (Lawrence Rauchwerger) , Springer, Cham, pp. 281–282, Oct 2017. [doi]
Bibtex
@InProceedings{ertel_lcpc17,
author = {Sebastian Ertel and Justus Adam and Jeronimo Castrillon},
title = {POSTER: Towards Fine-grained Dataflow Parallelism in Big Data Systems},
booktitle = {Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017)},
year = {2017},
editor = {Lawrence Rauchwerger},
publisher = {Springer, Cham},
location = {Texas A{\&}M University, College Station, Texas},
month = oct,
isbn = {978-3-030-35224-0},
pages = {281--282},
doi = {10.1007/978-3-030-35225-7},
url = {https://link.springer.com/book/10.1007%2F978-3-030-35225-7},
}
Downloads
1710_Ertel_LCPC [PDF]
Related Paths
Permalink
- Sven Karol, Tobias Nett, Pietro Incardona, Nesrine Khouzami, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Language and Development Environment for Parallel Particle Methods" , Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017 (P. Wriggers and M. Bischoff and E. Oñate and D.R.J. Owen and T. Zohdi) , Sep 2017. [Bibtex & Downloads]
A Language and Development Environment for Parallel Particle Methods
Reference
Sven Karol, Tobias Nett, Pietro Incardona, Nesrine Khouzami, Jeronimo Castrillon, Ivo F. Sbalzarini, "A Language and Development Environment for Parallel Particle Methods" , Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017 (P. Wriggers and M. Bischoff and E. Oñate and D.R.J. Owen and T. Zohdi) , Sep 2017.
Bibtex
@InProceedings{karol_particles17,
author = {Sven Karol and Tobias Nett and Pietro Incardona and Nesrine Khouzami and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {A Language and Development Environment for Parallel Particle Methods},
booktitle = {Proceedings of the 5th International Conference on Particle-based Methods. Fundamentals and Applications PARTICLES 2017},
year = {2017},
editor = {P. Wriggers and M. Bischoff and E. O{\~n}ate and D.R.J. Owen and T. Zohdi},
url = {https://www.semanticscholar.org/paper/A-Language-and-Development-Environment-for-Paralle-Karol-Nett/2b79bd3836aeb8e2fb2a2b5d9949f9efb1bdfab7?tab=abstract},
month = sep,
}
Downloads
1709_Karol_particles [PDF]
Related Paths
Biological Systems Path, Orchestration Path
Permalink
- Jeronimo Castrillon, Tei-Wei Kuo, Heike E. Riel, Matthias Lieber, "Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)" , In Dagstuhl Reports (Jerónimo Castrillón-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber) , Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, vol. 7, no. 2, pp. 1–22, Dagstuhl, Germany, Aug 2017. [doi] [Bibtex & Downloads]
Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)
Reference
Jeronimo Castrillon, Tei-Wei Kuo, Heike E. Riel, Matthias Lieber, "Wildly Heterogeneous Post-CMOS Technologies Meet Software (Dagstuhl Seminar 17061)" , In Dagstuhl Reports (Jerónimo Castrillón-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber) , Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, vol. 7, no. 2, pp. 1–22, Dagstuhl, Germany, Aug 2017. [doi]
Bibtex
@Article{castrillnmazo_et_al:DR:2017:7349,
author = {Jeronimo Castrillon and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber},
title = ,
journal = {Dagstuhl Reports},
year = {2017},
volume = {7},
number = {2},
month = aug,
pages = {1--22},
address = {Dagstuhl, Germany},
annote = {Keywords: 3D integration, compilers, emerging post-CMOS circuit materials and technologies, hardware/software co-design, heterogeneous hardware, nanoelectronics},
doi = {10.4230/DagRep.7.2.1},
editor = {Jer{\'o}nimo Castrill{\'o}n-Mazo and Tei-Wei Kuo and Heike E. Riel and Matthias Lieber},
issn = {2192-5283},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
url = {http://drops.dagstuhl.de/opus/volltexte/2017/7349},
urn = {urn:nbn:de:0030-drops-73499}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis" , In ACM Transactions on Architecture and Code Optimization (TACO), ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi] [Bibtex & Downloads]
Symmetry in Software Synthesis
Reference
Andrés Goens, Sergio Siccha, Jeronimo Castrillon, "Symmetry in Software Synthesis" , In ACM Transactions on Architecture and Code Optimization (TACO), ACM, vol. 14, no. 2, pp. 20:1–20:26, New York, NY, USA, Jul 2017. [doi]
Abstract
With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.
Bibtex
@article{goens_taco17symmetry,
author = {Goens, Andr{\'e}s and Siccha, Sergio and Castrillon, Jeronimo},
title = {Symmetry in Software Synthesis},
journal = {ACM Transactions on Architecture and Code Optimization (TACO),},
issue_date = {July 2017},
volume = {14},
number = {2},
month = jul,
year = {2017},
issn = {1544-3566},
pages = {20:1--20:26},
articleno = {20},
numpages = {26},
url = {http://doi.acm.org/10.1145/3095747},
doi = {10.1145/3095747},
acmid = {3095747},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Scalability, automation, clusters, design-space exploration, group theory, heterogeneous, inverse-semigroups, mapping, metaheuristics, network-on-chip, symmetry},
eprint = "arXiv:1704.06623",
abstract = {With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do it in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, like Network-on-Chip interconnect or heterogeneous cores. In this paper, we present a formal framework that can determine the inherent symmetry of architectures and applications algorithmically and leverage these for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for the platforms with a handful of cores of today and moderate-size benchmarks, our approach consistently yields reductions of the overall execution time of algorithms, accelerating them by a factor up to 10 in our experiments, or improving the quality of the results.}
}
Downloads
1704_Goens_TACO-arxiv [PDF]
Related Paths
Permalink
- Christian Menard, Matthias Jung, Jeronimo Castrillon, Norbert Wehn, "System Simulation with gem5 and SystemC: The Keystone for Full Interoperability" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), pp. 62–69, Jul 2017. [doi] [Bibtex & Downloads]
System Simulation with gem5 and SystemC: The Keystone for Full Interoperability
Reference
Christian Menard, Matthias Jung, Jeronimo Castrillon, Norbert Wehn, "System Simulation with gem5 and SystemC: The Keystone for Full Interoperability" , Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), pp. 62–69, Jul 2017. [doi]
Abstract
SystemC TLM based virtual prototypes have become the main tool in industry and research for concurrent hardware and software development, as well as hardware design space exploration. However, there exists a lack of accurate, free, changeable and realistic SystemC models of modern CPUs. Therefore, many researchers use the cycle accurate open source system simulator gem5, which has been developed in parallel to the SystemC standard. In this paper we present a coupling of gem5 with SystemC that offers full interoperability between both simulation frameworks, and therefore enables a huge set of possibilities for system level design space exploration. Furthermore, we show that the coupling itself only induces a relatively small overhead to the total execution time of the simulation.
Bibtex
@InProceedings{menard_samos17,
author = {Christian Menard and Matthias Jung and Jeronimo Castrillon and Norbert Wehn},
title = {System Simulation with gem5 and SystemC: The Keystone for Full Interoperability},
booktitle = {Proceedings of the IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS)},
year = {2017},
month = jul,
location = {Pythagorion, Greece},
pages = {62--69},
organization = {IEEE},
doi = {10.1109/SAMOS.2017.8344612},
url = {https://ieeexplore.ieee.org/document/8344612/},
isbn = {978-1-5386-3437-0},
abstract = {SystemC TLM based virtual prototypes have become the main tool in industry and research for concurrent hardware and software development, as well as hardware design space exploration. However, there exists a lack of accurate, free, changeable and realistic SystemC models of modern CPUs. Therefore, many researchers use the cycle accurate open source system simulator gem5, which has been developed in parallel to the SystemC standard. In this paper we present a coupling of gem5 with SystemC that offers full interoperability between both simulation frameworks, and therefore enables a huge set of possibilities for system level design space exploration. Furthermore, we show that the coupling itself only induces a relatively small overhead to the total execution time of the simulation.},
}
Downloads
1707_Menard_SAMOS [PDF]
Related Paths
Permalink
- Andrés Goens, Robert Khasanov, Marcus Hähnel, Till Smejkal, Hermann Härtig, Jeronimo Castrillon, "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings" , Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES 17), ACM, pp. 11–20, New York, NY, USA, Jun 2017. [doi] [Bibtex & Downloads]
TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings
Reference
Andrés Goens, Robert Khasanov, Marcus Hähnel, Till Smejkal, Hermann Härtig, Jeronimo Castrillon, "TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings" , Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES 17), ACM, pp. 11–20, New York, NY, USA, Jun 2017. [doi]
Bibtex
@InProceedings{goens_scopes17,
author = {Andr\'{e}s Goens and Robert Khasanov and Marcus H{\"a}hnel and Till Smejkal and Hermann H{\"a}rtig and Jeronimo Castrillon},
title = {TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17)},
year = {2017},
month = jun,
series = {SCOPES '17},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
pages = {11--20},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078663},
doi = {10.1145/3078659.3078663},
acmid = {3078663},
publisher = {ACM},
address = {New York, NY, USA}
}
Downloads
1706_Goens_SCOPES [PDF]
Related Paths
Permalink
- Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering" , Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES 17), ACM, pp. 21–30, New York, NY, USA, Jun 2017. [doi] [Bibtex & Downloads]
Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering
Reference
Gerald Hempel, Andrés Goens, Josefine Asmus, Jeronimo Castrillon, Ivo F. Sbalzarini, "Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering" , Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES 17), ACM, pp. 21–30, New York, NY, USA, Jun 2017. [doi]
Bibtex
@InProceedings{hempel_scopes17,
author = {Gerald Hempel and Andr\'{e}s Goens and Josefine Asmus and Jeronimo Castrillon and Ivo F. Sbalzarini},
title = {Robust Mapping of Process Networks to Many-Core Systems Using Bio-Inspired Design Centering},
booktitle = {Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES '17)},
year = {2017},
series = {SCOPES '17},
pages = {21--30},
address = {New York, NY, USA},
month = jun,
publisher = {ACM},
acmid = {3078667},
doi = {10.1145/3078659.3078667},
isbn = {978-1-4503-5039-6},
location = {Sankt Goar, Germany},
numpages = {10},
url = {http://doi.acm.org/10.1145/3078659.3078667}
}
Downloads
1706_Hempel_SCOPES [PDF]
Related Paths
Biological Systems Path, Orchestration Path
Permalink
- Johanna Sepúlveda, Vania Marangozova-Martin, Jeronimo Castrillon, "Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface" , Elsevier, Jun 2017. [doi] [Bibtex & Downloads]
Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface
Reference
Johanna Sepúlveda, Vania Marangozova-Martin, Jeronimo Castrillon, "Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface" , Elsevier, Jun 2017. [doi]
Bibtex
@Article{sepulveda_alchemy17_preface,
author = {Sep{\'u}lveda, Johanna and Marangozova-Martin, Vania and Castrillon, Jeronimo},
title = {Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface},
year = {2017},
month = jun,
doi = {10.1016/j.procs.2017.05.276},
file = {:/Users/jeronimocastrillon/Documents/Academic/mypapers/1706_sepulveda_alchemy.pdf:PDF},
url = {http://www.sciencedirect.com/science/article/pii/S1877050917309286},
publisher = {Elsevier}
}
Downloads
No Downloads available for this publication
Permalink
- Norman A. Rink, Jeronimo Castrillon, "Extending a Compiler Backend for Complete Memory Error Detection" , In Proceeding: Lecture Notes in Informatics: Automotive - Safety & Security 2017 (Peter Dencker and Herbert Klenk and Hubert Kelle and Erhard Plödereder) , pp. 61–74, May 2017. (Best paper award) [Bibtex & Downloads]
Extending a Compiler Backend for Complete Memory Error Detection
Reference
Norman A. Rink, Jeronimo Castrillon, "Extending a Compiler Backend for Complete Memory Error Detection" , In Proceeding: Lecture Notes in Informatics: Automotive - Safety & Security 2017 (Peter Dencker and Herbert Klenk and Hubert Kelle and Erhard Plödereder) , pp. 61–74, May 2017. (Best paper award)
Abstract
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to faults. Applications can be protected against errors resulting from faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to intermediate program representations are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, the compiler backend may introduce additional memory accesses. This report presents an extended compiler backend that protects these accesses against faults in the memory system. It is demonstrated that this enables the detection of all single bit flips in memory. On a subset of SPEC CINT2006 the runtime overhead caused by the extended backend amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86 64.
Bibtex
@InProceedings{rink_automotive17,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Extending a Compiler Backend for Complete Memory Error Detection},
booktitle = {Lecture Notes in Informatics: Automotive - Safety \& Security 2017},
editor = {Peter Dencker and Herbert Klenk and Hubert Kelle and Erhard Pl{\"o}dereder},
year = {2017},
pages = {61--74},
month = may,
abstract = {Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to faults. Applications can be protected against errors resulting from faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to intermediate program representations are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, the compiler backend may introduce additional memory accesses. This report presents an extended compiler backend that protects these accesses against faults in the memory system. It is demonstrated that this enables the detection of all single bit flips in memory. On a subset of SPEC CINT2006 the runtime overhead caused by the extended backend amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86 64.},
file = {:/Users/jeronimocastrillon/Documents/Academic/mypapers/1705_rink_automotive.pdf:PDF},
isbn = {978-3-88579-663-3},
issn = {1617-5468},
url = {https://dl.gi.de/bitstream/handle/20.500.12116/147/paper04.pdf?sequence=1&isAllowed=y},
}
Downloads
1705_rink_automotive [PDF]
Related Paths
Orchestration Path, Resilience Path
Permalink
- Markus Haehnel, Frehiwot Melak Arega, Waltenegus Dargie, Robert Khasanov, Jeronimo Castrillon, "Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures" , Proceedings of the 7th International Workshop on Big Data in Cloud Performance (DCPerf 17), IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), vol. , no. , pp. 432-437, May 2017. [doi] [Bibtex & Downloads]
Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures
Reference
Markus Haehnel, Frehiwot Melak Arega, Waltenegus Dargie, Robert Khasanov, Jeronimo Castrillon, "Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures" , Proceedings of the 7th International Workshop on Big Data in Cloud Performance (DCPerf 17), IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), vol. , no. , pp. 432-437, May 2017. [doi]
Bibtex
@InProceedings{khasanov_dcperf17,
author = {Markus Haehnel and Frehiwot Melak Arega and Waltenegus Dargie and Robert Khasanov and Jeronimo Castrillon},
title = {Application Interference Analysis: Towards Energy-efficient Workload Management on Heterogeneous Micro-Server Architectures},
booktitle = {Proceedings of the 7th International Workshop on Big Data in Cloud Performance (DCPerf'17), IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)},
year = {2017},
month = may,
volume={},
number={},
pages={432-437},
doi={10.1109/INFCOMW.2017.8116415},
ISSN={},
url = {http://ieeexplore.ieee.org/document/8116415/},
location = {Atlanta, USA}
}
Downloads
1705_Khasanov_DCPerf [PDF]
Related Paths
Permalink
- Norman A. Rink, Jeronimo Castrillon, "Trading Fault Tolerance for Performance in AN Encoding" , Proceedings of the ACM International Conference on Computing Frontiers (CF 17), ACM, pp. 183–190, New York, NY, USA, May 2017. [doi] [Bibtex & Downloads]
Trading Fault Tolerance for Performance in AN Encoding
Reference
Norman A. Rink, Jeronimo Castrillon, "Trading Fault Tolerance for Performance in AN Encoding" , Proceedings of the ACM International Conference on Computing Frontiers (CF 17), ACM, pp. 183–190, New York, NY, USA, May 2017. [doi]
Bibtex
@InProceedings{rink_cf17,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Trading Fault Tolerance for Performance in {AN} Encoding},
booktitle = {Proceedings of the ACM International Conference on Computing Frontiers (CF'17)},
year = {2017},
isbn = {978-1-4503-4487-6},
location = {Siena, Italy},
pages = {183--190},
numpages = {8},
url = {http://doi.acm.org/10.1145/3075564.3075565},
doi = {10.1145/3075564.3075565},
acmid = {3075565},
publisher = {ACM},
address = {New York, NY, USA},
month = may,
}
Downloads
1705_Rink_cf [PDF]
Related Paths
Orchestration Path, Resilience Path
Permalink
- Rainer Leupers, Miguel Angel Aguilar, Juan Fernando Eusse, Jeronimo Castrillon, Weihua Sheng, "MAPS: A Software Development Environment for Embedded Multicore Applications" , Springer Netherlands, pp. 1–33, Dordrecht, Apr 2017. [doi] [Bibtex & Downloads]
MAPS: A Software Development Environment for Embedded Multicore Applications
Reference
Rainer Leupers, Miguel Angel Aguilar, Juan Fernando Eusse, Jeronimo Castrillon, Weihua Sheng, "MAPS: A Software Development Environment for Embedded Multicore Applications" , Springer Netherlands, pp. 1–33, Dordrecht, Apr 2017. [doi]
Abstract
The use of heterogeneous Multi-Processor System-on-Chip (MPSoC) is a widely accepted solution to address the increasing demands on high performance and energy efficiency for modern embedded devices. To enable the full potential of these platforms, new tools are needed to tackle the programming complexity of MPSoCs, while allowing for high productivity. This chapter discusses the MPSoC Application Programming Studio (MAPS), a framework that provides facilities for expressing parallelism and tool flows for parallelization, mapping/scheduling, and code generation for heterogeneous MPSoCs. Two case studies of the use of MAPS in commercial environments are presented. This chapter closes by discussing early experiences of transferring the MAPS technology into Silexica GmbH, a start-up company that provides multi-core programming tools.
Bibtex
@InBook{leupers_hhcd17,
title = {MAPS: A Software Development Environment for Embedded Multicore Applications},
author = {Rainer Leupers and Miguel Angel Aguilar and Juan Fernando Eusse and Jeronimo Castrillon and Weihua Sheng},
editor = {Soonhoi Ha and J{\"u}rgen Teich},
publisher = {Springer Netherlands},
year = {2017},
address = {Dordrecht},
month = apr,
booktitle = {Handbook of Hardware/Software Codesign},
doi = {10.1007/978-94-017-7358-4_2-1},
isbn = {978-94-017-7358-4},
url = {http://dx.doi.org/10.1007/978-94-017-7358-4_2-1},
pages = {1--33},
abstract = {The use of heterogeneous Multi-Processor System-on-Chip (MPSoC) is a widely accepted solution to address the increasing demands on high performance and energy efficiency for modern embedded devices. To enable the full potential of these platforms, new tools are needed to tackle the programming complexity of MPSoCs, while allowing for high productivity. This chapter discusses the MPSoC Application Programming Studio (MAPS), a framework that provides facilities for expressing parallelism and tool flows for parallelization, mapping/scheduling, and code generation for heterogeneous MPSoCs. Two case studies of the use of MAPS in commercial environments are presented. This chapter closes by discussing early experiences of transferring the MAPS technology into Silexica GmbH, a start-up company that provides multi-core programming tools.},
}
Downloads
No Downloads available for this publication
Permalink
- Lars Schütze, Jeronimo Castrillon, "Analyzing State-of-the-Art Role-based Programming Languages" , Proceedings of the First International Conference on the Art, Science and Engineering of Programming (Programming 17), ACM, pp. 9:1–9:6, New York, NY, USA, Apr 2017. [doi] [Bibtex & Downloads]
Analyzing State-of-the-Art Role-based Programming Languages
Reference
Lars Schütze, Jeronimo Castrillon, "Analyzing State-of-the-Art Role-based Programming Languages" , Proceedings of the First International Conference on the Art, Science and Engineering of Programming (Programming 17), ACM, pp. 9:1–9:6, New York, NY, USA, Apr 2017. [doi]
Bibtex
@InProceedings{schuetze_lassy17,
author = {Lars Sch{\"u}tze and Jeronimo Castrillon},
title = {Analyzing State-of-the-Art Role-based Programming Languages},
booktitle = {Proceedings of the First International Conference on the Art, Science and Engineering of Programming (Programming'17)},
series = {Programming '17},
year = {2017},
month = apr,
isbn = {978-1-4503-4836-2},
location = {Brussels, Belgium},
pages = {9:1--9:6},
articleno = {9},
numpages = {6},
url = {http://doi.acm.org/10.1145/3079368.3079386},
doi = {10.1145/3079368.3079386},
acmid = {3079386},
publisher = {ACM},
address = {New York, NY, USA},
}
Downloads
1704_Schuetze_lassy [PDF]
Permalink
- Fazal Hameed, Jeronimo Castrillon, "Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization" , Proceedings of the 2017 Design, Automation and Test in Europe conference (DATE), EDA Consortium, pp. 362–367, Mar 2017. [doi] [Bibtex & Downloads]
Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization
Reference
Fazal Hameed, Jeronimo Castrillon, "Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization" , Proceedings of the 2017 Design, Automation and Test in Europe conference (DATE), EDA Consortium, pp. 362–367, Mar 2017. [doi]
Abstract
State-of-the-art DRAM cache employs a small Tag-Cache and its performance is dependent upon two important parameters namely bank-level-parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level-parallelism than the traditional large row buffer organization along with energy benefits. However, small row buffers do not fully exploit the temporal locality of tag accesses, leading to reduced Tag- Cache hit rates. As a result, the DRAM cache needs to be re-designed for small row buffer organization to achieve additional performance benefits. In this paper, we propose a novel tag-store mechanism that improves the Tag-Cache hit rate by 70% compared to existing DRAM tag-store mechanisms employing small row buffer organization. In addition, we enhance the DRAM cache controller with novel policies that take into account the locality characteristics of cache accesses. We evaluate our novel tag-store mechanism and controller policies in an 8-core system running the SPEC2006 benchmark and compare their performance and energy consumption against recent proposals. Our architecture improves the average performance by 21.2% and 11.4% respectively compared to large and small row buffer organizations via simultaneously improving both parameters. Compared to DRAM cache with large row buffer organization, we report an energy improvement of 62%.
Bibtex
@InProceedings{hameed_date17,
author = {Fazal Hameed and Jeronimo Castrillon},
title = {Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization},
booktitle = {Proceedings of the 2017 Design, Automation and Test in Europe conference (DATE)},
year = {2017},
series = {DATE '17},
pages = {362--367},
month = mar,
publisher = {EDA Consortium},
abstract = {State-of-the-art DRAM cache employs a small Tag-Cache and its performance is dependent upon two important parameters namely bank-level-parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level-parallelism than the traditional large row buffer organization along with energy benefits. However, small row buffers do not fully exploit the temporal locality of tag accesses, leading to reduced Tag- Cache hit rates. As a result, the DRAM cache needs to be re-designed for small row buffer organization to achieve additional performance benefits. In this paper, we propose a novel tag-store mechanism that improves the Tag-Cache hit rate by 70\% compared to existing DRAM tag-store mechanisms employing small row buffer organization. In addition, we enhance the DRAM cache controller with novel policies that take into account the locality characteristics of cache accesses. We evaluate our novel tag-store mechanism and controller policies in an 8-core system running the SPEC2006 benchmark and compare their performance and energy consumption against recent proposals. Our architecture improves the average performance by 21.2\% and 11.4\% respectively compared to large and small row buffer organizations via simultaneously improving both parameters. Compared to DRAM cache with large row buffer organization, we report an energy improvement of 62\%.},
isbn = {978-3-9815370-8-6},
doi={10.23919/DATE.2017.7927017},
url = {http://ieeexplore.ieee.org/document/7927017/},
location = {Lausanne, Switzerland}
}
Downloads
1703_Hameed_DATE [PDF]
Related Paths
Permalink
- Norman A. Rink, Jeronimo Castrillon, "flexMEDiC: flexible Memory Error Detection by Combined data encoding and duplication" , Proceedings of the 2nd International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with DATE 2017, pp. 15–22, Mar 2017. [Bibtex & Downloads]
flexMEDiC: flexible Memory Error Detection by Combined data encoding and duplication
Reference
Norman A. Rink, Jeronimo Castrillon, "flexMEDiC: flexible Memory Error Detection by Combined data encoding and duplication" , Proceedings of the 2nd International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with DATE 2017, pp. 15–22, Mar 2017.
Abstract
Errors in memory are known to be a major cause of system failures. Moreover, it has recently been found that single-error correcting, double-error detecting (SECDED) codes, which are widely used in ECC memory modules, are incapable of handling large fractions of errors that occur in practice. This calls for more powerful error detection measures. However, the higher the number of bit flips that can still be detected as an error, the larger the memory overhead. Cost considerations and the varying needs for reliability of different applications may not always warrant laying down extra hardware to accommodate overheads. Software-implemented error detection offers a flexible alternative. In this work we propose the software-implemented flexMEDiC scheme for detecting errors in the memory system, including main memory, on-chip caches, and load-store queues. It is shown that single and double bit flips are detected by flexMEDiC, and evidence is given that suggests that up to five bit flips within a single data word can still be detected as errors. The average runtime overhead incurred by flexMEDiC is 1.55x.
Bibtex
@InProceedings{rees:2017,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {{flexMEDiC}: flexible {M}emory {E}rror {D}etection by Combined data encoding and duplication},
booktitle = {Proceedings of the 2nd International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with DATE 2017},
year = {2017},
month = mar,
pages = {15--22},
abstract = {Errors in memory are known to be a major cause of system failures. Moreover, it has recently been found that single-error correcting, double-error detecting (SECDED) codes, which are widely used in ECC memory modules, are incapable of handling large fractions of errors that occur in practice. This calls for more powerful error detection measures. However, the higher the number of bit flips that can still be detected as an error, the larger the memory overhead. Cost considerations and the varying needs for reliability of different applications may not always warrant laying down extra hardware to accommodate overheads. Software-implemented error detection offers a flexible alternative. In this work we propose the software-implemented flexMEDiC scheme for detecting errors in the memory system, including main memory, on-chip caches, and load-store queues. It is shown that single and double bit flips are detected by flexMEDiC, and evidence is given that suggests that up to five bit flips within a single data word can still be detected as errors. The average runtime overhead incurred by flexMEDiC is 1.55x.},
}
Downloads
1703_Rink_REES [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Programming for adaptive and energy-efficient computing" , In International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) (keynote), Mar 2017. [Bibtex & Downloads]
Programming for adaptive and energy-efficient computing
Reference
Jeronimo Castrillon, "Programming for adaptive and energy-efficient computing" , In International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) (keynote), Mar 2017.
Bibtex
@Misc{castrillon2017hp3c,
author = {Castrillon, Jeronimo},
title = {Programming for adaptive and energy-efficient computing},
howpublished = {International Conference on High Performance Compilation, Computing and Communications (HP3C-2017) (keynote)},
month = mar,
year = {2017},
location = {Kuala Lumpur, Malaysia}
}
Downloads
170323_castrill_hp3c [PDF]
Permalink
- Andrés Goens, Jeronimo Castrillon, "Optimizing for Data-Parallelism in Kahn Process Networks" , In Proceeding: ACM SRC at International Symposium on Code Generationand Optimization (CGO), Feb 2017. [Bibtex & Downloads]
Optimizing for Data-Parallelism in Kahn Process Networks
Reference
Andrés Goens, Jeronimo Castrillon, "Optimizing for Data-Parallelism in Kahn Process Networks" , In Proceeding: ACM SRC at International Symposium on Code Generationand Optimization (CGO), Feb 2017.
Bibtex
@inproceedings{goens17cgo,
author = {Andr\'{e}s Goens and Jeronimo Castrillon},
title = {Optimizing for Data-Parallelism in Kahn Process Networks},
year = {2017},
month = feb,
booktitle= {ACM SRC at International Symposium on
Code Generationand Optimization (CGO)},
location = {Austin, TX, USA},
}
Downloads
1701_Goens_SRCCGO [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "On Mapping to Multi/Manycores" , In 10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017. [Bibtex & Downloads]
On Mapping to Multi/Manycores
Reference
Jeronimo Castrillon, "On Mapping to Multi/Manycores" , In 10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017.
Bibtex
@Misc{castrillon2017multiprog,
author = {Castrillon, Jeronimo},
title = {On Mapping to Multi/Manycores},
howpublished = {10th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk)},
month = jan,
year = {2017},
location = {Stockholm, Sweden}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Jeronimo Castrillon, "Flexible and Scalable Dataflow Programming for Manycores" , In Tutorial for heterogeneous multicore design automation: current and future, held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017. [Bibtex & Downloads]
Flexible and Scalable Dataflow Programming for Manycores
Reference
Jeronimo Castrillon, "Flexible and Scalable Dataflow Programming for Manycores" , In Tutorial for heterogeneous multicore design automation: current and future, held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk), Jan 2017.
Bibtex
@Misc{castrillon2017hipeactut,
author = {Castrillon, Jeronimo},
title = {Flexible and Scalable Dataflow Programming for Manycores},
howpublished = {Tutorial for heterogeneous multicore design automation: current and future, held in conjunction with the 12th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) (invited talk)},
month = jan,
year = {2017},
location = {Stockholm, Sweden}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
2016
- Norman A. Rink, Jeronimo Castrillon, "Comprehensive Backend Support for Local Memory Fault Tolerance" , Technical report, Technische Universität Dresden, pp. 11, Dec 2016. [Bibtex & Downloads]
Comprehensive Backend Support for Local Memory Fault Tolerance
Reference
Norman A. Rink, Jeronimo Castrillon, "Comprehensive Backend Support for Local Memory Fault Tolerance" , Technical report, Technische Universität Dresden, pp. 11, Dec 2016.
Bibtex
@TechReport{rink_techrep16,
author = {Norman A. Rink and Jeronimo Castrillon},
title = {Comprehensive Backend Support for Local Memory Fault Tolerance},
institution = {Technische Universit{\"a}t Dresden},
year = {2016},
month = dec,
issn = {1430-211X},
pages = {11},
url = {https://cfaed.tu-dresden.de/files/user/nrink/tech-report-ro.pdf}
}
Downloads
tech-report-ro [PDF]
Related Paths
Permalink
- Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andres Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, "The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware" , Proceedings of the 1st International Workshop on Post-Moore s Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, USA, Nov 2016. [Bibtex & Downloads]
The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware
Reference
Marcus Völp, Sascha Klüppelholz, Jeronimo Castrillon, Hermann Härtig, Nils Asmussen, Uwe Assmann, Franz Baader, Christel Baier, Gerhard Fettweis, Jochen Fröhlich, Andres Goens, Sebastian Haas, Dirk Habich, Mattis Hasler, Immo Huismann, Tomas Karnagel, Sven Karol, Wolfgang Lehner, Linda Leuschner, Matthias Lieber, Siqi Ling, Steffen Märcker, Johannes Mey, Wolfgang Nagel, Benedikt Nöthen, Rafael Peñaloza, Michael Raitza, Jörg Stiller, Annett Ungethüm, Axel Voigt, "The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware" , Proceedings of the 1st International Workshop on Post-Moore s Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, USA, Nov 2016.
Abstract
Future systems based on post-CMOS technologies will be wildly heterogeneous, with properties largely unknown today. This paper presents our design of a new hardware/software stack to address the challenge of preparing software development for such systems. It combines well-understood technologies from different areas, e.g., network-on-chips, capability operating systems, flexible programming models and model checking. We describe our approach and provide details on key technologies.
Bibtex
@InProceedings{voelp16_pmes,
author = {Marcus V{\"o}lp and Sascha Kl{\"u}ppelholz and Jeronimo Castrillon and Hermann H{\"a}rtig and Nils Asmussen and Uwe Assmann and Franz Baader and Christel Baier and Gerhard Fettweis and Jochen Fr{\"o}hlich and Andres Goens and Sebastian Haas and Dirk Habich and Mattis Hasler and Immo Huismann and Tomas Karnagel and Sven Karol and Wolfgang Lehner and Linda Leuschner and Matthias Lieber and Siqi Ling and Steffen M{\"a}rcker and Johannes Mey and Wolfgang Nagel and Benedikt N{\"o}then and Rafael Pe{\~n}aloza and Michael Raitza and J{\"o}rg Stiller and Annett Ungeth{\"u}m and Axel Voigt},
title = {The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware},
booktitle = {Proceedings of the 1st International Workshop on Post-Moore's Era Supercomputing (PMES), Co-located with The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)},
year = {2016},
address = {Salt Lake City, USA},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Voelp_PMES.pdf}
abstract = {Future systems based on post-CMOS technologies
will be wildly heterogeneous, with properties largely unknown today.
This paper presents our design of a new hardware/software stack to address the
challenge of preparing software development for such systems.
It combines well-understood technologies from different areas, e.g., network-on-chips,
capability operating systems, flexible programming models and model checking.
We describe our approach and provide details on key technologies.},
}
Downloads
1611_Voelp_PMES [PDF]
Related Paths
Permalink
- Christian Menard, Andrés Goens, Jeronimo Castrillon, "High-Level NoC Model for MPSoC Compilers" , Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS 16), pp. 1-6, Copenhagen, Denmark, Nov 2016. [doi] [Bibtex & Downloads]
High-Level NoC Model for MPSoC Compilers
Reference
Christian Menard, Andrés Goens, Jeronimo Castrillon, "High-Level NoC Model for MPSoC Compilers" , Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS 16), pp. 1-6, Copenhagen, Denmark, Nov 2016. [doi]
Bibtex
@InProceedings{menard_norcas16,
author = {Christian Menard and Andr\'{e}s Goens and Jeronimo Castrillon},
title = {High-Level NoC Model for MPSoC Compilers},
booktitle = {Proceedings of the IEEE Nordic Circuits and Systems Conference (NORCAS'16)},
year = {2016},
pages={1-6},
doi = {10.1109/NORCHIP.2016.7792876},
series = {NORCAS},
address = {Copenhagen, Denmark},
month = nov,
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1611_Menard_NORCAS.pdf}
}
Downloads
1611_Menard_NORCAS [PDF]
Related Paths
Permalink
- Andres Goens, Robert Khasanov, Jeronimo Castrillon, Simon Polstra, Andy Pimentel, "Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study" , Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 281-288, Ecole Centrale de Lyon, Lyon, France, Sep 2016. [doi] [Bibtex & Downloads]
Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study
Reference
Andres Goens, Robert Khasanov, Jeronimo Castrillon, Simon Polstra, Andy Pimentel, "Why Comparing System-level MPSoC Mapping Approaches is Difficult: a Case Study" , Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 281-288, Ecole Centrale de Lyon, Lyon, France, Sep 2016. [doi]
Abstract
Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level are an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.
Bibtex
@InProceedings{goen_mcsoc16,
author= {Andres Goens and Robert Khasanov and Jeronimo Castrillon and Simon Polstra and Andy Pimentel},
title= {Why Comparing System-level {MPSoC} Mapping Approaches is Difficult: a Case Study},
booktitle= {Proceedings of the IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16)},
year= {2016},
address= {Ecole Centrale de Lyon, Lyon, France},
month= sep,
pages = {281-288},
doi = {10.1109/MCSoC.2016.48},
abstract = {Software abstractions are crucial to effectively program heterogeneous Multi-Processor Systems on Chip (MPSoCs). Prime examples of such abstractions are Kahn Process Networks (KPNs) and execution traces. When modeling computation as a KPN, one of the key challenges is to obtain a good mapping, i.e., an assignment of logical computation and communication to physical resources. In this paper we compare two system-level frameworks for solving the mapping problem: Sesame and MAPS. These frameworks, while superficially similar, embody different approaches. Sesame, motivated by modeling and design-space exploration, uses evolutionary algorithms for mapping. MAPS, being a compiler framework, uses simple and fast heuristics instead. In this work we highlight the value of common abstractions, such as KPNs and traces, as a vehicle to enable comparisons between large independent frameworks. These types of comparisons are fundamental for advancing research in the area. At the same time, we illustrate how the lack of formalized models at the hardware level are an obstacle to achieving fair comparisons. Additionally, using a set of applications from the embedded systems domain, we observe that genetic algorithms tend to outperform heuristics by a factor between 1x and 5x, with notable exceptions. This performance comes at the cost of a longer computation time, between 0 and 2 orders of magnitude in our experiments.},
days= {21},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/1609_Goens_MCSoC.pdf}
}
Downloads
1609_Goens_MCSoC [PDF]
Related Paths
Permalink
- Benjamin Schiller, Clemens Deusser, Jeronimo Castrillon, Thorsten Strufe, "Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis" , In Journal of Applied Network Science, vol. 1, no. 9, pp. 1–22, Sep 2016. [doi] [Bibtex & Downloads]
Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis
Reference
Benjamin Schiller, Clemens Deusser, Jeronimo Castrillon, Thorsten Strufe, "Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis" , In Journal of Applied Network Science, vol. 1, no. 9, pp. 1–22, Sep 2016. [doi]
Bibtex
@Article{schiller16_jans,
author = {Benjamin Schiller and Clemens Deusser and Jeronimo Castrillon and Thorsten Strufe},
title = {Compile- and Run-time Approaches for the Selection of Efficient Data Structures for Dynamic Graph Analysis},
journal = {Journal of Applied Network Science},
year = {2016},
volume = {1},
number = {9},
pages = {1--22},
month = sep,
doi = {10.1007/s41109-016-0011-2},
url= {http://dynamic-networks.org/publications/papers/papers/gds-dynamic.pdf}
}
Downloads
1607_Schiller_JANS [PDF]
Related Paths
HAEC, Orchestration Path, Resilience Path
Permalink
- Jeronimo Castrillon, "Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems" , In IEEE 5G Dresden Summit (invited talk), Sep 2016. [Bibtex & Downloads]
Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems
Reference
Jeronimo Castrillon, "Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems" , In IEEE 5G Dresden Summit (invited talk), Sep 2016.
Bibtex
@Misc{castrillon20165gsummit,
author = {Castrillon, Jeronimo},
title = {Compiling for Deeply Embedded and Heterogeneous Signal Processing Systems},
howpublished = {IEEE 5G Dresden Summit (invited talk)},
month = sep,
year = {2016},
location = {Dresden, Germany},
url= {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/160929_castrillon_5G-summit.pdf}
}
Downloads
160929_castrillon_5G-summit [PDF]
Permalink
- Andrés Goens, Jeronimo Castrillon, Maximilian Odendahl, Rainer Leupers, "An Optimal Allocation of Memory Buffers for Complex Multicore Platforms" , In Journal of Systems Architecture, Elsevier, vol. 66-67, pp. 69–83, May 2016. [doi] [Bibtex & Downloads]
An Optimal Allocation of Memory Buffers for Complex Multicore Platforms
Reference
Andrés Goens, Jeronimo Castrillon, Maximilian Odendahl, Rainer Leupers, "An Optimal Allocation of Memory Buffers for Complex Multicore Platforms" , In Journal of Systems Architecture, Elsevier, vol. 66-67, pp. 69–83, May 2016. [doi]
Abstract
In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints. In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear pro- gramming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.
Bibtex
@Article{goens_jsa16,
Title={An Optimal Allocation of Memory Buffers for Complex Multicore Platforms},
Author={Goens, Andr\'{e}s and Castrillon, Jeronimo and Odendahl, Maximilian and Leupers, Rainer},
Journal={Journal of Systems Architecture},
volume={66-67},
pages={69--83},
doi={10.1016/j.sysarc.2016.05.002},
publisher={Elsevier},
Year={2016},
month=may,
abstract={In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints.
In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. Besides, we use an architecture model that allows to perform an allocation that is aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear pro- gramming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that allowed to find an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.}
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Jeronimo Castrillon, "Programming Heterogeneous Embedded Systems for IoT" , In Workshop get-togethers toward a sustainable collaboration in IoT (invited talk), Apr 2016. ([link]) [Bibtex & Downloads]
Programming Heterogeneous Embedded Systems for IoT
Reference
Jeronimo Castrillon, "Programming Heterogeneous Embedded Systems for IoT" , In Workshop get-togethers toward a sustainable collaboration in IoT (invited talk), Apr 2016. ([link])
Bibtex
@Misc{castrillon2016tunis,
author={Castrillon, Jeronimo},
title={Programming Heterogeneous Embedded Systems for IoT},
howpublished={Workshop get-togethers toward a sustainable collaboration in IoT (invited talk)},
month=apr,
year={2016},
location={Tunis, Tunisia},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/160418_castrillon_dataflow4IoT.pdf}
}
Downloads
160418_castrillon_dataflow4IoT [PDF]
Related Paths
Permalink
- Sven Karol, Norman A. Rink, Bálint Gyapjas, Jeronimo Castrillon, "Fault Tolerance with Aspects: a Feasibility Study" , Proceedings of the 15th International Conference on Modularity, ACM, pp. 66–69, New York, NY, USA, Mar 2016. [doi] [Bibtex & Downloads]
Fault Tolerance with Aspects: a Feasibility Study
Reference
Sven Karol, Norman A. Rink, Bálint Gyapjas, Jeronimo Castrillon, "Fault Tolerance with Aspects: a Feasibility Study" , Proceedings of the 15th International Conference on Modularity, ACM, pp. 66–69, New York, NY, USA, Mar 2016. [doi]
Bibtex
@inproceedings{karol2016faulttolerance,
author={Karol, Sven and Rink, Norman A. and Gyapjas, B\'{a}lint and Castrillon, Jeronimo},
title={Fault Tolerance with Aspects: a Feasibility Study},
booktitle={Proceedings of the 15th International Conference on Modularity},
series={MODULARITY 2016},
year={2016},
pages={66--69},
address={New York, NY, USA},
month={mar},
publisher={ACM},
doi={10.1145/2889443.2889453},
isbn={978-1-4503-3995-7/16/03},
location={M{\'a}laga, Spain},
}
Downloads
1603_Karol_Modularity_preprint [PDF]
Related Paths
Permalink
2015
- Andrés Goens, Jeronimo Castrillon, "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs" , In Proceeding: System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523 (Götz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aurélio and Al Faruque, Mohammad Abdullah and Rettberg, Achim) , Springer International Publishing, pp. 116–127, Foz do Iguaçu, Brazil, Nov 2015. [doi] [Bibtex & Downloads]
Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs
Reference
Andrés Goens, Jeronimo Castrillon, "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs" , In Proceeding: System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523 (Götz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aurélio and Al Faruque, Mohammad Abdullah and Rettberg, Achim) , Springer International Publishing, pp. 116–127, Foz do Iguaçu, Brazil, Nov 2015. [doi]
Abstract
Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure help us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.
Bibtex
@InProceedings{goens_iess15,
author = {Goens, Andr\'{e}s and Castrillon, Jeronimo},
title = {Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs},
booktitle = {System Level Design from HW/SW to Memory for Embedded Systems. IESS 2015. IFIP Advances in Information and Communication Technology, vol 523},
year = {2015},
editor = {G{\"o}tz, Marcelo and Schirner, Gunar and Wehrmeister, Marco Aur{\'e}lio and Al Faruque, Mohammad Abdullah and Rettberg, Achim},
pages = {116--127},
address = {Foz do Igua{\c{c}}u, Brazil},
month = nov,
publisher = {Springer International Publishing},
doi = {10.1007/978-3-319-90023-0_10},
url = {https://link.springer.com/chapter/10.1007%2F978-3-319-90023-0_10},
isbn={978-3-319-90023-0},
abstract = {Current approaches for mapping Kahn Process Networks (KPN) and Dynamic Data Flow (DDF) applications rely on assumptions on the program behavior specific to an execution. Thus, a near-optimal mapping, computed for a given input data set, may become sub-optimal at run-time. This happens when a different data set induces a significantly different behavior. We address this problem by leveraging inherent mathematical structures of the dataflow models and the hardware architectures. On the side of the dataflow models, we rely on the monoid structure of histories and traces. This structure help us formalize the behavior of multiple executions of a given dynamic application. By defining metrics we have a formal framework for comparing the executions. On the side of the hardware, we take advantage of symmetries in the architecture to reduce the search space for the mapping problem. We evaluate our implementation on execution variations of a randomly-generated KPN application and on a low-variation JPEG encoder benchmark. Using the described methods we show that trace differences are not sufficient for characterizing performance losses. Additionally, using platform symmetries we manage to reduce the design space in the experiments by two orders of magnitude.},
}
Downloads
1511_Goens_IESS [PDF]
Related Paths
Permalink
- Benjamin Schiller, Jeronimo Castrillon, Thorsten Strufe, "Efficient data structures for dynamic graph analysis" , Proceedings of the 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (Lisa O Conner) , IEEE Computer Society, pp. 497–504, Bangkok, Thailand, Nov 2015. [doi] [Bibtex & Downloads]
Efficient data structures for dynamic graph analysis
Reference
Benjamin Schiller, Jeronimo Castrillon, Thorsten Strufe, "Efficient data structures for dynamic graph analysis" , Proceedings of the 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (Lisa O Conner) , IEEE Computer Society, pp. 497–504, Bangkok, Thailand, Nov 2015. [doi]
Bibtex
@InProceedings{schiller_sitis15,
Title={Efficient data structures for dynamic graph analysis},
Author={Schiller, Benjamin and Castrillon, Jeronimo and Strufe, Thorsten},
Booktitle={Proceedings of the 11th International Conference on Signal-Image Technology \& Internet-Based Systems (SITIS)},
Year={2015},
Address={Bangkok, Thailand},
Editor={Lisa O'Conner},
Month=nov,
Publisher={IEEE Computer Society},
Series={SITIS 2015},
pages={497--504},
doi={10.1109/SITIS.2015.94}
}
Downloads
1511_Schiller_SITIS [PDF]
Related Paths
Orchestration Path, Resilience Path, HAEC
Permalink
- Norman A. Rink, Jeronimo Castrillon, "Improving Code Generation for Software-based Error Detection" , Proceedings of the 1st International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with ESWEEK 2015, pp. 16–30, Amsterdam, The Netherlands, Oct 2015. ([link]) [Bibtex & Downloads]
Improving Code Generation for Software-based Error Detection
Reference
Norman A. Rink, Jeronimo Castrillon, "Improving Code Generation for Software-based Error Detection" , Proceedings of the 1st International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with ESWEEK 2015, pp. 16–30, Amsterdam, The Netherlands, Oct 2015. ([link])
Bibtex
@InProceedings{rink_ress15,
Title={Improving Code Generation for Software-based Error Detection},
Author={Rink, Norman A. and Castrillon, Jeronimo},
Booktitle={Proceedings of the 1st International Workshop on Resiliency in Embedded Electronic Systems (REES), co-located with ESWEEK 2015},
Year={2015},
Series={REES 2015},
Address={Amsterdam, The Netherlands},
Month=oct,
Pages={16--30},
}
Downloads
1510_Rink_REES [PDF]
Related Paths
Orchestration Path, Resilience Path
Permalink
- Jeronimo Castrillon, "Analysis and software synthesis of KPN applications" , In Design of Robotics and Embedded systems, Analysis, and Modeling Seminar (DREAMS) (invited talk), Oct 2015. ([link]) [Bibtex & Downloads]
Analysis and software synthesis of KPN applications
Reference
Jeronimo Castrillon, "Analysis and software synthesis of KPN applications" , In Design of Robotics and Embedded systems, Analysis, and Modeling Seminar (DREAMS) (invited talk), Oct 2015. ([link])
Abstract
Programming models based on dataflow or process networks are a good match for streaming applications, common in the signal processing, multimedia and automotive domains. In such models, parallelism is expressed explicitly which makes them well-suited for programming parallel machines. Since today s applications are no longer static, expressive programming models are needed, such as those based on Kahn Process Networks (KPNs). In these models, tasks cannot be handled as black boxes, but have to be analyzed, profiled and traced to characterize their behavior. This is especially important in the case of heterogenous platforms with many processors of multiple different types. This presentation describes a tool flow to handle KPN applications and gives insights into mapping algorithms for heterogeneous platforms.
Bibtex
@Misc{castrillon15_dreams,
Title={Analysis and software synthesis of KPN applications},
Author={Jeronimo Castrillon},
HowPublished={Design of Robotics and Embedded systems, Analysis, and Modeling Seminar (DREAMS) (invited talk)},
Month=oct,
Year={2015},
Day={22},
Location={Berkeley, CA, USA},
Abstract={Programming models based on dataflow or process
networks are a good match for streaming
applications, common in the signal processing,
multimedia and automotive domains. In such models,
parallelism is expressed explicitly which makes
them well-suited for programming parallel
machines. Since today's applications are no
longer static, expressive programming models are
needed, such as those based on Kahn Process
Networks (KPNs). In these models, tasks cannot be
handled as black boxes, but have to be analyzed,
profiled and traced to characterize their
behavior. This is especially important in the case
of heterogenous platforms with many processors of
multiple different types. This presentation
describes a tool flow to handle KPN applications
and gives insights into mapping algorithms for
heterogeneous platforms.},
url={https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/151022_castrillon_dreams.pdf},
}
Downloads
151022_castrillon_dreams [PDF]
Related Paths
Orchestration Path, Resilience Path
Permalink
- Jeronimo Castrillon, "Dataflow programming for heterogeneous computing systems" , In Tutorial Algorithmic Specification, Tools and Algorithms for Programming Heterogeneous Platforms. Co-located with the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT 15), Oct 2015. ([link]) [Bibtex & Downloads]
Dataflow programming for heterogeneous computing systems
Reference
Jeronimo Castrillon, "Dataflow programming for heterogeneous computing systems" , In Tutorial Algorithmic Specification, Tools and Algorithms for Programming Heterogeneous Platforms. Co-located with the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT 15), Oct 2015. ([link])
Abstract
This tutorial talk starts by introducing new types of heterogeneous systems and their challenges for hardware/software programming stacks. These systems are currently being investigated in the context of the German cluster of excellence Cfaed – Center for Advancing Electronics Dresden . We will then look at dataflow modeling concepts, with emphasis on the dynamic models that are needed to express today s changing workloads. Finally, the talk will introduce methods and algorithms for mapping sets of applications modeled in this way to heterogeneous systems.
Bibtex
@Misc{castrillon15_pacttut,
Title={Dataflow programming for heterogeneous computing systems},
Author={Jeronimo Castrillon},
HowPublished={Tutorial Algorithmic Specification, Tools and Algorithms for Programming Heterogeneous Platforms. Co-located with the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT'15)},
Month=oct,
Year={2015},
Abstract={This tutorial talk starts by introducing new types of heterogeneous systems and their challenges for hardware/software programming stacks. These systems are currently being investigated in the context of the German cluster of excellence Cfaed – ''Center for Advancing Electronics Dresden''. We will then look at dataflow modeling concepts, with emphasis on the dynamic models that are needed to express today's changing workloads. Finally, the talk will introduce methods and algorithms for mapping sets of applications modeled in this way to heterogeneous systems.},
Day={18},
}
Downloads
151018_castrillon_dataflow_pacttut [PDF]
Related Paths
Permalink
- Markus Vogt, Gerald Hempel, Jeronimo Castrillon, Christian Hochberger, "GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs" , Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP), Sep 2015. ([link]) [Bibtex & Downloads]
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
Reference
Markus Vogt, Gerald Hempel, Jeronimo Castrillon, Christian Hochberger, "GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs" , Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP), Sep 2015. ([link])
Abstract
In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.
Bibtex
@InProceedings{vogt15,
Title={GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs},
Author={Vogt , Markus and Hempel, Gerald and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP)},
Year={2015},
Month=sep,
Series={FSP 2015},
archivePrefix={arXiv},
arxivId={1509.00025},
eprint={1509.00025},
abstract={In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.},
}
Downloads
1509_Vogt_FSP [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Orchestration: Turning material breakthroughs into application performance" , In Dresden Microelectronics Academy, (invited talk), Sep 2015. [Bibtex & Downloads]
Orchestration: Turning material breakthroughs into application performance
Reference
Jeronimo Castrillon, "Orchestration: Turning material breakthroughs into application performance" , In Dresden Microelectronics Academy, (invited talk), Sep 2015.
Bibtex
@Misc{castrillon2015dma,
Title={Orchestration: Turning material breakthroughs into application performance},
Author={Castrillon, Jeronimo},
HowPublished={Dresden Microelectronics Academy, (invited talk)},
Month=sep,
Year={2015},
Location={Dresden, Germany},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150918_castrillon_dma.pdf}
}
Downloads
150918_castrillon_dma [PDF]
Related Paths
Permalink
- Norman A. Rink, Dmitrii Kuvaiskii, Jeronimo Castrillon, Christof Fetzer, "Compiling for Resilience: the Performance Gap" , Chapter in Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015) (Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer) , IOS Press, vol. 27, pp. 721–730, Edinburgh, Scotland, Sep 2015. [doi] [Bibtex & Downloads]
Compiling for Resilience: the Performance Gap
Reference
Norman A. Rink, Dmitrii Kuvaiskii, Jeronimo Castrillon, Christof Fetzer, "Compiling for Resilience: the Performance Gap" , Chapter in Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015) (Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer) , IOS Press, vol. 27, pp. 721–730, Edinburgh, Scotland, Sep 2015. [doi]
Abstract
In order to perform reliable computations on unreliable hardware, software-based protection mechanisms have been proposed. In this paper we present a compiler infrastructure for software-based code hardening based on encoding. We analyze the trade-off between performance and fault coverage. We look at different code generation strategies that improve the performance of hardened programs by up to 2x while incurring little fault coverage degradation.
Bibtex
@InCollection{rink_erpp2015,
author={Rink, Norman A. and Kuvaiskii, Dmitrii and Castrillon, Jeronimo and Fetzer, Christof},
title={Compiling for Resilience: the Performance Gap},
booktitle={Parallel Computing: On the Road to Exascale (ParCo 2015). Extended from Proceedings of the Mini-Symposium on Energy and Resilience in Parallel Programming (ERPP 2015)},
publisher={IOS Press},
year={2015},
editor={Gerhard R. Joubert and Hugh Leather and Mark Parsons and Frans Peters and Mark Sawyer},
volume={27},
series={ParCo 2015},
pages={721--730},
address={Edinburgh, Scotland},
month=sep,
abstract={In order to perform reliable computations on unreliable hardware, software-based protection mechanisms have been proposed. In this paper we present a compiler infrastructure for software-based code hardening based on encoding. We analyze the trade-off between performance and fault coverage. We look at different code generation strategies that improve the performance of hardened programs by up to 2x while incurring little fault coverage degradation.},
doi={10.3233/978-1-61499-621-7-721},
}
Downloads
No Downloads available for this publication
Related Paths
Orchestration Path, Resilience Path
Permalink
- Gerald Hempel, Markus Vogt, Jeronimo Castrillon, Christian Hochberger, "Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs" , Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session (Grosspietsch, Erwin and Klöckner, Konrad) , SEA-Publications: SEA-SR-44, Funchal, Madeira (Portugal), August 2015. [Bibtex & Downloads]
Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs
Reference
Gerald Hempel, Markus Vogt, Jeronimo Castrillon, Christian Hochberger, "Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs" , Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session (Grosspietsch, Erwin and Klöckner, Konrad) , SEA-Publications: SEA-SR-44, Funchal, Madeira (Portugal), August 2015.
Bibtex
@InProceedings{hempeldsd15,
Title={Software-Backed Caching and Virtual Addressing for Generated Accelerators in SoC FPGAs},
Author={Hempel, Gerald and Vogt, Markus and Castrillon, Jeronimo and Hochberger, Christian},
Booktitle={Proceedings of 41st EUROMICRO Conference on Software Engineering and Advanced Applications - Work in Progress Session},
Year={2015},
Address={Funchal, Madeira (Portugal)},
Editor={Grosspietsch, Erwin and Kl{\"o}ckner, Konrad},
Month={August},
Publisher={SEA-Publications: SEA-SR-44},
Series={DSD/SEAA 2015},
ISBN={978-3-902457-44-8}
}
Downloads
1508_Hempel_DSD [PDF]
Related Paths
Permalink
- Sven Karol, Pietro Incardona, Yaser Afshar, Ivo Sbalzarini, Jeronimo Castrillon, "Towards a Next-Generation Parallel Particle-Mesh Language" , Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI), pp. 15–18, Jul 2015. ([link]) [Bibtex & Downloads]
Towards a Next-Generation Parallel Particle-Mesh Language
Reference
Sven Karol, Pietro Incardona, Yaser Afshar, Ivo Sbalzarini, Jeronimo Castrillon, "Towards a Next-Generation Parallel Particle-Mesh Language" , Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI), pp. 15–18, Jul 2015. ([link])
Bibtex
@InProceedings{karol15,
Title={Towards a Next-Generation Parallel Particle-Mesh Language},
Author={Karol, Sven and Incardona, Pietro and Afshar, Yaser and Sbalzarini, Ivo and Castrillon, Jeronimo},
Booktitle={Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI)},
series={DSLDI'15},
Year={2015},
Month=jul,
pages={15--18},
}
Downloads
1507_Karol_PPML [PDF]
Related Paths
Permalink
- Diana Göhringer, Michael Hübner, Jeronimo Castrillon, Cristina Silvano, "ViPES 2015-Preface" , Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 347–347, Jul 2015. [doi] [Bibtex & Downloads]
ViPES 2015-Preface
Reference
Diana Göhringer, Michael Hübner, Jeronimo Castrillon, Cristina Silvano, "ViPES 2015-Preface" , Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 347–347, Jul 2015. [doi]
Bibtex
@InProceedings{gohringer2015vipes,
author={G{\"o}hringer, Diana and H{\"u}bner, Michael and Castrillon, Jeronimo and Silvano, Cristina},
title={ViPES 2015-Preface},
booktitle={Proceedings of the 15th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)},
year={2015},
pages={347--347},
organization={IEEE},
month=jul,
doi={10.1109/SAMOS.2015.7363696},
}
Downloads
No Downloads available for this publication
Permalink
- Jeronimo Castrillon, "Portable Libraries and Programming Environments" , In HiPEAC Computing Systems Week, (invited talk), May 2015. [Bibtex & Downloads]
Portable Libraries and Programming Environments
Reference
Jeronimo Castrillon, "Portable Libraries and Programming Environments" , In HiPEAC Computing Systems Week, (invited talk), May 2015.
Bibtex
@Misc{castrillon2015csw,
Title={Portable Libraries and Programming Environments},
Author={Castrillon, Jeronimo},
HowPublished={HiPEAC Computing Systems Week, (invited talk)},
Year={2015},
Month=may,
Location={Oslo, Noway},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150505_castrillon_csw.pdf}
}
Downloads
150505_castrillon_csw [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, Lothar Thiele, Lars Schorr, Weihua Sheng, Ben Juurlink, Mauricio Alvarez-Mesa, Angela Pohl, Ralph Jessenberger, Victor Reyes, Rainer Leupers, "Multi/Many-core Programming: Where Are We Standing?" , Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), EDA Consortium, pp. 1708–1717, San Jose, CA, USA, Mar 2015. ([link]) [Bibtex & Downloads]
Multi/Many-core Programming: Where Are We Standing?
Reference
Jeronimo Castrillon, Lothar Thiele, Lars Schorr, Weihua Sheng, Ben Juurlink, Mauricio Alvarez-Mesa, Angela Pohl, Ralph Jessenberger, Victor Reyes, Rainer Leupers, "Multi/Many-core Programming: Where Are We Standing?" , Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), EDA Consortium, pp. 1708–1717, San Jose, CA, USA, Mar 2015. ([link])
Bibtex
@inproceedings{Castrillon:2015,
author={Castrillon, Jeronimo and Thiele, Lothar and Schorr, Lars and Sheng, Weihua and Juurlink, Ben and Alvarez-Mesa, Mauricio and Pohl, Angela and Jessenberger, Ralph and Reyes, Victor and Leupers, Rainer},
title={Multi/Many-core Programming: Where Are We Standing?},
booktitle={Proceedings of the 2015 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
series={DATE '15},
year={2015},
Month=mar,
location={Grenoble, France},
pages={1708--1717},
numpages={10},
acmid={2757208},
publisher={EDA Consortium},
address={San Jose, CA, USA},
Bdsk-url-1={http://dl.acm.org/citation.cfm?id=2757012.2757208},
}
Downloads
No Downloads available for this publication
Related Paths
Permalink
- Jeronimo Castrillon, "Tools and dataflow-based programming models for heterogeneous MPSoCs" , In Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 15) in conjunction with the HiPEAC Conference (invited talk), Jan 2015. [Bibtex & Downloads]
Tools and dataflow-based programming models for heterogeneous MPSoCs
Reference
Jeronimo Castrillon, "Tools and dataflow-based programming models for heterogeneous MPSoCs" , In Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 15) in conjunction with the HiPEAC Conference (invited talk), Jan 2015.
Bibtex
@Misc{castrillon2015pegpum,
Title={Tools and dataflow-based programming models for heterogeneous MPSoCs},
Author={Castrillon, Jeronimo},
HowPublished={Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM'15) in conjunction with the HiPEAC Conference (invited talk)},
Year={2015},
Month=jan,
Location={Amsterdam, The Netherlands},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150121_castrillon_pegpum.pdf}
}
Downloads
150121_castrillon_pegpum [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, "Simulation and Estimation for MPSoC Programming Tools" , In Proceeding: Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO 15), in conjunction with the HiPEAC Conference (keynote), Jan 2015. [Bibtex & Downloads]
Simulation and Estimation for MPSoC Programming Tools
Reference
Jeronimo Castrillon, "Simulation and Estimation for MPSoC Programming Tools" , In Proceeding: Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO 15), in conjunction with the HiPEAC Conference (keynote), Jan 2015.
Bibtex
@InProceedings{castrillon2015rapido,
Title={Simulation and Estimation for MPSoC Programming Tools},
Author={Castrillon, Jeronimo},
Booktitle={Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'15), in conjunction with the HiPEAC Conference (keynote)},
Year={2015},
Month=jan,
Location={Amsterdam, The Netherlands},
url = {https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/150121_castrillon_rapido.pdf}
}
Downloads
150121_castrillon_rapido [PDF]
Related Paths
Permalink
2014
- Jeronimo Castrillon, "Compiler Flow for Processors and Systems" , In Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk), Nov 2014. [Bibtex & Downloads]
Compiler Flow for Processors and Systems
Reference
Jeronimo Castrillon, "Compiler Flow for Processors and Systems" , In Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk), Nov 2014.
Bibtex
@Misc{castrillon2015tunis,
Title={Compiler Flow for Processors and Systems},
Author={Castrillon, Jeronimo},
HowPublished={Winter School on Design, Programming and Applications of Multi Processor System on Chip (invited talk)},
Year={2014},
Month=nov,
Location={Tunis, Tunisia},
url={https://cfaed.tu-dresden.de/files/user/jcastrillon/publications/141127_castrillon_compilers.pdf}
}
Downloads
141127_castrillon_compilers [PDF]
Related Paths
Permalink
- Jeronimo Castrillon, Rainer Leupers, "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap" , Springer, pp. 258, 2014. ([link]) [Bibtex & Downloads]
Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap
Reference
Jeronimo Castrillon, Rainer Leupers, "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap" , Springer, pp. 258, 2014. ([link])
Bibtex
@Book{castrillon14_springer,
Title={Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap},
Author={Castrillon, Jeronimo and Leupers, Rainer},
Publisher={Springer},
Year={2014},
ISBN={978-3-319-00675-8},
Pages={258}
}
Downloads
No Downloads available for this publication
Permalink
- Diandian Zhang, Jeronimo Castrillon, Stefan Schürmans, Gerd Ascheid, Rainer Leupers, and Bart Vanthournout, "System-Level Analysis of MPSoCs with a Hardware Scheduler" , Hershey: IGI Global, pp. 335–367, 2014. [doi] [Bibtex & Downloads]
System-Level Analysis of MPSoCs with a Hardware Scheduler
Reference
Diandian Zhang, Jeronimo Castrillon, Stefan Schürmans, Gerd Ascheid, Rainer Leupers, and Bart Vanthournout, "System-Level Analysis of MPSoCs with a Hardware Scheduler" , Hershey: IGI Global, pp. 335–367, 2014. [doi]
Bibtex
@InBook{zhang2014_inbook,
Title={System-Level Analysis of MPSoCs with a Hardware Scheduler},
Author={Diandian Zhang and Jeronimo Castrillon and Stefan Schürmans and Gerd Ascheid and Rainer Leupers and and Bart Vanthournout},
Chapter={Advancing Embedded Systems and Real-Time Communications with Emerging Technologies},
Editor={Seppo Virtanen},
Pages={335--367},
Publisher={Hershey: IGI Global},
doi={10.4018/978-1-4666-6034-2.ch014},
Year={2014},
}
Downloads
No Downloads available for this publication
Permalink