Path G - Resilience
Today, reliability issues already lead to diminishing performance returns when transitioning to smaller CMOS gate lengths. Soon the costs of traditional resilience mechanisms will cancel most of the benefits gained from transitioning to a new technology. The goal of the Resilience Path is to keep the costs of resilience as low as possible by focusing on flexible, application-specific, adaptive resilience mechanisms. The Path will research reliable information processing with unreliable and adjustable components, taking into account the projected heterogeneity of future systems and the fault characteristics of new materials-inspired technologies.
Path Leader: Prof. Dr. Akash Kumar
Path Co-Leader: Prof. Dr. Thorsten Strufe
Photos: Katharina Knaut
- Prof. Dr. Uwe Aßmann
- Prof. Dr.-Ing. Franz Baader
- Prof. Dr. Christel Baier
- Dr. Pramod Bhatotia
- Prof. Dr. sc. techn. habil. Dipl. Betriebswissenschaften Frank Ellinger
- Prof. Dr.-Ing. Dr. h.c. Gerhard Fettweis
- Prof. Dr.-Ing. Frank Fitzek
- Prof. Dr. Christof Fetzer
- Prof. Dr. Hermann Härtig
- Prof. Dr.-Ing. Eduard Jorswieck
- Prof. Dr. Akash Kumar
- Prof. Dr.-Ing. Wolfgang Lehner
- Prof. Dr. Wolfgang E. Nagel
- Prof. Dr.-Ing. habil. Christian Georg Mayr
- Prof. Dr. rer. nat. habil. Stefan Siegmund
- Prof. Dr. Thorsten Strufe
- Dr. Marco Zimmerling
Most post-CMOS technologies, including those investigated in cfaed, can be expected to exhibit high error rates. Not only will the rate of single event upsets (e.g., bit flips) increase; aging will also accelerate (e.g., transistor performance degradation) and transistor variability will grow (e.g., in threshold voltage). The result is an increasing rate of both transient and permanent errors. Masking these errors costs energy, speed, and transistor count; we informally refer to this as the resilience cost. If state-of-the-art approaches are extrapolated to future resilience needs, the resilience cost will eventually prevent the use of new technology generations, because a new technology is only worthwhile if its benefits exceed the accompanying increase in resilience cost.
The overall goal of the Resilience Path is to reduce the resilience cost. Depending on the context, different emphasis must be given to the costs of energy, speed, and transistor count. For example, the required balance between speed and energy differs widely between a mobile device and a high-performance server. A system can be viewed as a stack of hardware and software sub-layers. A variety of ideas have been proposed to improve resilience on each sub-layer, so the components that populate these sub-layers come with their own resilience mechanisms. The Resilience Path is driven by the hypothesis that a sufficient cost reduction can be achieved by combining the best ideas that exist on the different sub-layers.
To achieve a substantial cost reduction, we need not only novel mechanisms but also an intelligent orchestration of these mechanisms. Our general approach to reducing the cost is to dynamically adapt the degree of resilience to the current needs of the applications. Consider, for example, a banking application and a gaming application executed within the same browser. The banking application needs to be optimized for integrity, the gaming application for speed. To allow for such optimizations, the resilience requirements of an application must be stated explicitly. In the simplest case, an application selects its current resilience requirements from a set of pre-specified resilience classes. For more fine-grained control, we will investigate the use of resilience contracts, which can express dynamic resilience requirements that are negotiated and orchestrated between all sub-layers.
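The idea of selecting from pre-specified resilience classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the mechanisms attached to them, and the overhead factors are hypothetical placeholders and do not come from the Path's actual design.

```python
from dataclasses import dataclass
from enum import Enum


class ResilienceClass(Enum):
    """Hypothetical pre-specified resilience classes (names are illustrative)."""
    BEST_EFFORT = 0      # no extra protection, optimized for speed
    DETECT = 1           # errors are detected but not masked
    DETECT_AND_MASK = 2  # errors are detected and masked


@dataclass(frozen=True)
class Mechanism:
    """Illustrative cost model: overhead relative to unprotected execution."""
    name: str
    energy_factor: float
    time_factor: float


# Assumed mapping from class to mechanism; all factors are made-up placeholders.
MECHANISMS = {
    ResilienceClass.BEST_EFFORT: Mechanism("none", 1.0, 1.0),
    ResilienceClass.DETECT: Mechanism("dual execution + compare", 2.0, 1.2),
    ResilienceClass.DETECT_AND_MASK: Mechanism("triple execution + vote", 3.0, 1.3),
}


def select_mechanism(requested: ResilienceClass) -> Mechanism:
    """Return the mechanism that the platform would enable for a class."""
    return MECHANISMS[requested]


# The banking application asks for integrity, the gaming application for speed.
banking = select_mechanism(ResilienceClass.DETECT_AND_MASK)
gaming = select_mechanism(ResilienceClass.BEST_EFFORT)
```

In this sketch the gaming application pays no resilience cost at all, while the banking application accepts a higher energy and time overhead in exchange for masked errors. A resilience contract would replace the fixed lookup table with requirements negotiated at runtime.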
Our approach to reducing the cost of resilience is based on the observation that this cost depends not only on the error rate and error types of the underlying technology but also on the resilience requirements and the inherent resilience of the applications, both of which may change at runtime. Hence, our aim is to provide dynamic control of application resilience, so that a system pays only for the degree of resilience it currently needs. We will perform a dynamic cross-layer reconfiguration to tune the resilience mechanisms that are implemented on the various layers of a computer system. Dynamic resilience control will facilitate adaptation not only to changing application requirements but also to fluctuating error rates caused by, for example, environmental changes or aging effects.
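A minimal sketch of such dynamic control, assuming a deliberately simple model that is not taken from the Path itself: a computation is replicated n times with majority voting, replicas fail independently with probability p, and the controller picks the smallest n whose residual failure probability meets the application's target. The function names and the set of replica counts are hypothetical.

```python
from math import comb
from typing import Optional, Sequence


def residual_failure(p: float, n: int) -> float:
    """Probability that a majority of n independent replicas is wrong."""
    majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1)
    )


def choose_replicas(
    p: float, target: float, options: Sequence[int] = (1, 3, 5)
) -> Optional[int]:
    """Pick the cheapest replica count that meets the target, if any.

    The cost is assumed to grow with the number of replicas, so the
    smallest admissible n is also the cheapest one.
    """
    for n in sorted(options):
        if residual_failure(p, n) <= target:
            return n
    return None  # no available mechanism satisfies the requirement


# Same requirement, but the observed error rate rises (e.g., due to aging).
young_device = choose_replicas(p=1e-3, target=2e-5)  # 3 replicas suffice
aged_device = choose_replicas(p=1e-2, target=2e-5)   # 5 replicas are needed
```

The same function also covers changing application requirements: relaxing `target` for a speed-oriented application lets the controller fall back to a single, unprotected execution.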
Our vision is to use the best resilience mechanisms on each sub-layer and combine them into one resilient computing stack as depicted in the scheme above. We also need to orchestrate these layers within one computer system and, potentially, across multiple machines within distributed systems. This Path's Research Modules are divided into horizontal "layer" modules (RM L1-L4) and vertical "orchestration" modules (RM O1-O3). The Path integrates the expertise of the two new Strategic Professorships "Processor Design" and "Compiler Construction" and of the new ZMDI endowed professorship "Circuits for Energy Efficiency". A Research Group Leader (RGL) position "Orchestration of Resilience Mechanisms" will be created.
SGXBounds Paper Wins Best Paper Award at EuroSys'17
cfaed, TU Dresden @ EuroSys 2017
Dr. Marco Zimmerling Wins 2015 ACM SIGBED Paul Caspi Memorial Dissertation Award
Resilience Path: Paper Accepted at WWW 2016
cfaed Paper Accepted at INFOCOM 2016
Inaugural lectures: Prof. Strufe & Prof. Castrillon
cfaed Publications
Automatically tolerating arbitrary faults in non-malicious settings
Reference
Diogo Behrens, Stefan Weigert, Christof Fetzer, "Automatically tolerating arbitrary faults in non-malicious settings", in Proceedings of the 2013 Sixth Latin-American Symposium on Dependable Computing (LADC), IEEE, pp. 114–123, 2013. doi:10.1109/LADC.2013.26
Bibtex
title={Automatically tolerating arbitrary faults in non-malicious settings},
author={Behrens, Diogo and Weigert, Stefan and Fetzer, Christof},
booktitle={Dependable Computing (LADC), 2013 Sixth Latin-American Symposium on},
pages={114--123},
year={2013},
organization={IEEE},
doi={10.1109/LADC.2013.26}
}
Permalink
https://cfaed.tu-dresden.de/resilience?pubId=85