João Paulo Cardoso de Lima

E-mail
joao.lima@tu-dresden.de

Phone
+49 (0)351 463 42336

Visitor's Address
Helmholtzstrasse 18, 3rd floor, BAR III59
01069 Dresden
Germany

Curriculum Vitae

João Paulo received his bachelor's degree in Computer Engineering from the Federal University of Santa Catarina (UFSC) in April 2017, and his master's degree in Computer Science from the Federal University of Rio Grande do Sul (UFRGS) in February 2019. In July 2022, he joined the Chair for Compiler Construction to research and develop code optimizations for emerging artificial intelligence systems in the context of the ScaDS.AI Dresden/Leipzig center.

Student Thesis Topics

My research interests focus on advancing energy-efficient and high-performance computing through approaches such as computing-near-memory (CNM) and computing-in-memory (CIM), especially for machine learning (ML) and data analytics applications. I also work on optimizing ML models for energy efficiency, which is essential for both IoT devices and data centres, where energy use is a growing concern. I can help you with these topics for project work or a Bachelor's or Master's thesis, especially if you are interested in hardware-software co-design, energy-efficient ML, and emerging computing paradigms.

System and Compiler Design for Emerging CNM/CIM Architectures

Our goal is to enable the portability of AI and Big Data applications across existing CNM/CIM systems and novel accelerator designs, prioritizing performance, accuracy, and energy efficiency. Because these systems differ substantially from conventional machines, new compiler abstractions and frameworks are crucial to fully exploit the potential of CIM by providing automatic device-aware and device-agnostic optimizations and facilitating widespread adoption. Visit the ScaDS.AI website for a more detailed description of this project.

Model and Code Optimization Methods for Energy-efficient Machine Learning

Optimizing machine learning models is essential for improving performance and energy efficiency, especially given the resource constraints in IoT devices and the rising energy demands of data centres. Our research focuses on post-training analysis, conversion techniques, and code optimizations to reduce model size and computational complexity without compromising accuracy. So far we have concentrated on quantization, pruning, and bitslicing methods to support alternative execution models and design approaches, aiming at faster and more energy-efficient inference. You will find details of this project on the ScaDS.AI website.

Publications

2025
Yun-Chih Chen, Tristan Seidl, Nils Hölscher, Christian Hakert, Minh Duy Truong, Jian-Jia Chen, João Paulo C. de Lima, Asif Ali Khan, Jeronimo Castrillon, Ali Nezhadi, Lokesh Siddhu, Hassan Nassar, Mahta Mayahinia, Mehdi Baradaran Tahoori, Jörg Henkel, Nils Wilbert, Stefan Wildermann, Jürgen Teich, "Modeling and Simulating Emerging Memory Technologies: A Tutorial", Feb 2025.

Bibtex

@Article{chen2025_sppsim,
author = {Yun-Chih Chen and Tristan Seidl and Nils Hölscher and Christian Hakert and Minh Duy Truong and Jian-Jia Chen and João Paulo C. de Lima and Asif Ali Khan and Jeronimo Castrillon and Ali Nezhadi and Lokesh Siddhu and Hassan Nassar and Mahta Mayahinia and Mehdi Baradaran Tahoori and Jörg Henkel and Nils Wilbert and Stefan Wildermann and Jürgen Teich},
title = {Modeling and Simulating Emerging Memory Technologies: A Tutorial},
eprint = {2502.10167},
url = {https://arxiv.org/abs/2502.10167},
archiveprefix = {arXiv},
primaryclass = {cs.AR},
year = {2025},
month = feb,
}

Downloads

2502_Chen_SPPSim [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3815


2024
João Paulo C. de Lima, Benjamin F. Morris III, Asif Ali Khan, Jeronimo Castrillon, Alex K. Jones, "Count2Multiply: Reliable In-memory High-Radix Counting", Arxiv, pp. 1-14, Sep 2024.

Abstract
Big data processing has exposed the limits of compute-centric hardware acceleration due to the memory-to-processor bandwidth bottleneck. Consequently, there has been a shift towards memory-centric architectures, leveraging substantial compute parallelism by processing using the memory elements directly. Computing-in-memory (CIM) proposals for both conventional and emerging memory technologies often target massively parallel operations. However, current CIM solutions face significant challenges. For emerging data-intensive applications, such as advanced machine learning techniques and bioinformatics, where matrix multiplication is a key primitive, memristor crossbars suffer from limited write endurance and expensive write operations. In contrast, while DRAM-based solutions have successfully demonstrated multiplication using additions, they remain prohibitively slow. This paper introduces Count2Multiply, a technology-agnostic digital-CIM method for performing integer-binary and integer-integer matrix multiplications using high-radix, massively parallel counting implemented with bitwise logic operations. In addition, Count2Multiply is designed with fault tolerance in mind and leverages traditional scalable row-wise error correction codes, such as Hamming and BCH codes, to protect against the high error rates of existing CIM designs. We demonstrate Count2Multiply with a detailed application to CIM in conventional DRAM due to its ubiquity and high endurance. We also explore the acceleration potential of racetrack memories due to their shifting properties, which are natural for Count2Multiply, and their high endurance. Compared to the state-of-the-art in-DRAM method, Count2Multiply achieves up to 10x speedup, 3.8x higher GOPS/Watt, and 1.4x higher GOPS/area, while the RTM counterpart offers gains of 10x, 57x, and 3.8x.

Bibtex

@Misc{delima_count2multiply,
author = {Jo{\~a}o Paulo C. de Lima and Benjamin F. Morris III and Asif Ali Khan and Jeronimo Castrillon and Alex K. Jones},
title = {Count2Multiply: Reliable In-memory High-Radix Counting},
pages = {1-14},
publisher = {Arxiv},
month=sep,
year={2024},
eprint={2409.10136},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2409.10136},
abstract = {Big data processing has exposed the limits of compute-centric hardware acceleration due to the memory-to-processor bandwidth bottleneck. Consequently, there has been a shift towards memory-centric architectures, leveraging substantial compute parallelism by processing using the memory elements directly. Computing-in-memory (CIM) proposals for both conventional and emerging memory technologies often target massively parallel operations. However, current CIM solutions face significant challenges. For emerging data-intensive applications, such as advanced machine learning techniques and bioinformatics, where matrix multiplication is a key primitive, memristor crossbars suffer from limited write endurance and expensive write operations. In contrast, while DRAM-based solutions have successfully demonstrated multiplication using additions, they remain prohibitively slow. This paper introduces Count2Multiply, a technology-agnostic digital-CIM method for performing integer-binary and integer-integer matrix multiplications using high-radix, massively parallel counting implemented with bitwise logic operations. In addition, Count2Multiply is designed with fault tolerance in mind and leverages traditional scalable row-wise error correction codes, such as Hamming and BCH codes, to protect against the high error rates of existing CIM designs. We demonstrate Count2Multiply with a detailed application to CIM in conventional DRAM due to its ubiquity and high endurance. We also explore the acceleration potential of racetrack memories due to their shifting properties, which are natural for Count2Multiply, and their high endurance. Compared to the state-of-the-art in-DRAM method, Count2Multiply achieves up to 10x speedup, 3.8x higher GOPS/Watt, and 1.4x higher GOPS/area, while the RTM counterpart offers gains of 10x, 57x, and 3.8x.},
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3786

João Paulo C. de Lima, Asif Ali Khan, Luigi Carro, Jeronimo Castrillon, "Full-Stack Optimization for CAM-Only DNN Inference", Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1-6, Mar 2024.

Abstract
The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von-Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. This is because, even in CIM systems, data movement and processing still require considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing the arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy.

Bibtex

@InProceedings{delima_date24,
author = {Jo{\~a}o Paulo C. de Lima and Asif Ali Khan and Luigi Carro and Jeronimo Castrillon},
booktitle = {Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)},
title = {Full-Stack Optimization for CAM-Only DNN Inference},
location = {Valencia, Spain},
pages = {1-6},
publisher = {IEEE},
series = {DATE'24},
url = {https://ieeexplore.ieee.org/document/10546805},
abstract = {The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von-Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. This is because, even in CIM systems, data movement and processing still require considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing the arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy},
month = mar,
year = {2024},
}

Downloads

2403_deLima_DATE [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3701

Michael Niemier, Zephan Enciso, Mohammad Mehdi Sharifi, X. Sharon Hu, Ian O'Connor, Alexander Graening, Ravit Sharma, Puneet Gupta, Jeronimo Castrillon, João Paulo C. de Lima, Asif Ali Khan, Hamid Farzaneh, Nashrah Afroze, Asif Islam Khan, Julien Ryckaert, "Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, and Compilers", Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1–10, Mar 2024.

Bibtex

@InProceedings{niemier_date24,
author = {Michael Niemier and Zephan Enciso and Mohammad Mehdi Sharifi and X. Sharon Hu and Ian O'Connor and Alexander Graening and Ravit Sharma and Puneet Gupta and Jeronimo Castrillon and João Paulo C. de Lima and Asif Ali Khan and Hamid Farzaneh and Nashrah Afroze and Asif Islam Khan and Julien Ryckaert},
booktitle = {Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)},
title = {Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, and Compilers},
location = {Valencia, Spain},
url = {https://ieeexplore.ieee.org/document/10546772},
pages = {1--10},
publisher = {IEEE},
series = {DATE'24},
month = mar,
year = {2024},
}

Downloads

2403_Niemier_DATE [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3715

Asif Ali Khan, João Paulo C. De Lima, Hamid Farzaneh, Jeronimo Castrillon, "The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview", Jan 2024.

Bibtex

@Report{khan_cimlandscape_2024,
author = {Asif Ali Khan and João Paulo C. De Lima and Hamid Farzaneh and Jeronimo Castrillon},
title = {The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview},
eprint = {2401.14428},
url = {https://arxiv.org/abs/2401.14428},
archiveprefix = {arXiv},
month = jan,
primaryclass = {cs.AR},
year = {2024},
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3716


2023
Jörg Henkel, Lokesh Siddhu, Lars Bauer, Jürgen Teich, Stefan Wildermann, Mehdi Tahoori, Mahta Mayahinia, Jeronimo Castrillon, Asif Ali Khan, Hamid Farzaneh, João Paulo C. de Lima, Jian-Jia Chen, Christian Hakert, Kuan-Hsun Chen, Chia-Lin Yang, Hsiang-Yun Cheng, "Special Session – Non-Volatile Memories: Challenges and Opportunities for Embedded System Architectures with Focus on Machine Learning Applications", Proceedings of the 2023 International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES), pp. 11–20, Sep 2023. DOI: 10.1145/3607889.3609088

Abstract
This paper explores the challenges and opportunities of integrating non-volatile memories (NVMs) into embedded systems for machine learning. NVMs offer advantages such as increased memory density, lower power consumption, non-volatility, and compute-in-memory capabilities. The paper focuses on integrating NVMs into embedded systems, particularly in intermittent computing, where systems operate during periods of available energy. NVM technologies bring persistence closer to the CPU core, enabling efficient designs for energy-constrained scenarios. Next, computation in resistive NVMs is explored, highlighting its potential for accelerating machine learning algorithms. However, challenges related to reliability and device non-idealities need to be addressed. The paper also discusses memory-centric machine learning, leveraging NVMs to overcome the memory wall challenge. By optimizing memory layouts and utilizing probabilistic decision tree execution and neural network sparsity, NVM-based systems can improve cache behavior and reduce unnecessary computations. In conclusion, the paper emphasizes the need for further research and optimization for the widespread adoption of NVMs in embedded systems presenting relevant challenges, especially for machine learning applications.

Bibtex

@InProceedings{henkel_cases23,
author = {J\"{o}rg Henkel and Lokesh Siddhu and Lars Bauer and J\"{u}rgen Teich and Stefan Wildermann and Mehdi Tahoori and Mahta Mayahinia and Jeronimo Castrillon and Asif Ali Khan and Hamid Farzaneh and Jo\~{a}o Paulo C. de Lima and Jian-Jia Chen and Christian Hakert and Kuan-Hsun Chen and Chia-Lin Yang and Hsiang-Yun Cheng},
booktitle = {Proceedings of the 2023 International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES)},
title = {Special Session -- Non-Volatile Memories: Challenges and Opportunities for Embedded System Architectures with Focus on Machine Learning Applications},
location = {Hamburg, Germany},
abstract = {This paper explores the challenges and opportunities of integrating non-volatile memories (NVMs) into embedded systems for machine learning. NVMs offer advantages such as increased memory density, lower power consumption, non-volatility, and compute-in-memory capabilities. The paper focuses on integrating NVMs into embedded systems, particularly in intermittent computing, where systems operate during periods of available energy. NVM technologies bring persistence closer to the CPU core, enabling efficient designs for energy-constrained scenarios. Next, computation in resistive NVMs is explored, highlighting its potential for accelerating machine learning algorithms. However, challenges related to reliability and device non-idealities need to be addressed. The paper also discusses memory-centric machine learning, leveraging NVMs to overcome the memory wall challenge. By optimizing memory layouts and utilizing probabilistic decision tree execution and neural network sparsity, NVM-based systems can improve cache behavior and reduce unnecessary computations. In conclusion, the paper emphasizes the need for further research and optimization for the widespread adoption of NVMs in embedded systems presenting relevant challenges, especially for machine learning applications.},
pages = {11--20},
url = {https://ieeexplore.ieee.org/abstract/document/10316216},
doi = {10.1145/3607889.3609088},
isbn = {9798400702907},
series = {CASES '23 Companion},
issn = {2643-1726},
month = sep,
numpages = {10},
year = {2023},
}

Downloads

2309_Henkel_CASES [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3654

João Paulo C. de Lima, Asif Ali Khan, Hamid Farzaneh, Jeronimo Castrillon, "Efficient Associative Processing with RTM-TCAMs", In Proceeding: 1st in-Memory Architectures and Computing Applications Workshop (iMACAW), co-located with the 60th Design Automation Conference (DAC'23), 2pp, Jul 2023.

Bibtex

@InProceedings{lima_imacaw23,
author = {Jo{\~a}o Paulo C. de Lima and Asif Ali Khan and Hamid Farzaneh and Jeronimo Castrillon},
booktitle = {1st in-Memory Architectures and Computing Applications Workshop (iMACAW), co-located with the 60th Design Automation Conference (DAC'23)},
title = {Efficient Associative Processing with RTM-TCAMs},
location = {San Francisco, CA, USA},
pages = {2pp},
month = jul,
year = {2023},
}

Downloads

2307_deLima_iMACAW [PDF]

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3566


2022
Rafael Fão de Moura, João Paulo Cardoso de Lima, Luigi Carro, "Data and Computation Reuse in CNNs using Memristor TCAMs", In ACM Transactions on Reconfigurable Technology and Systems, Association for Computing Machinery (ACM), Jul 2022. DOI: 10.1145/3549536

Bibtex

@article{de_Moura_2022,
doi = {10.1145/3549536},
url = {https://doi.org/10.1145%2F3549536},
year = 2022,
month = {jul},
publisher = {Association for Computing Machinery ({ACM})},
author = {Rafael Fao de Moura and Joao Paulo Cardoso de Lima and Luigi Carro},
title = {Data and Computation Reuse in {CNNs} using Memristor {TCAMs}},
journal = {{ACM} Transactions on Reconfigurable Technology and Systems}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3378

João Paulo Cardoso de Lima, Marcelo Brandalero, Michael Hübner, Luigi Carro, "STAP: An Architecture and Design Tool for Automata Processing on Memristor TCAMs", In ACM Journal on Emerging Technologies in Computing Systems, Association for Computing Machinery (ACM), vol. 18, no. 2, pp. 1–22, Apr 2022. DOI: 10.1145/3450769

Bibtex

@article{de_Lima_2022,
doi = {10.1145/3450769},
url = {https://doi.org/10.1145%2F3450769},
year = 2022,
month = {apr},
publisher = {Association for Computing Machinery ({ACM})},
volume = {18},
number = {2},
pages = {1--22},
author = {Jo{\~{a}}o Paulo Cardoso de Lima and Marcelo Brandalero and Michael Hübner and Luigi Carro},
title = {{STAP}: An Architecture and Design Tool for Automata Processing on Memristor {TCAMs}},
journal = {{ACM} Journal on Emerging Technologies in Computing Systems}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3380

Joao Paulo C. de Lima, Luigi Carro, "Quantization-Aware In-situ Training for Reliable and Accurate Edge AI", In Proceeding: 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, Mar 2022. DOI: 10.23919/date54114.2022.9774657

Bibtex

@inproceedings{de_Lima_2022,
doi = {10.23919/date54114.2022.9774657},
url = {https://doi.org/10.23919%2Fdate54114.2022.9774657},
year = 2022,
month = {mar},
publisher = {IEEE},
author = {Joao Paulo C. de Lima and Luigi Carro},
title = {Quantization-Aware In-situ Training for Reliable and Accurate Edge {AI}},
booktitle = {2022 Design, Automation {\&} Test in Europe Conference {\&} Exhibition ({DATE})}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3377


2021
Paulo C. Santos, João P. C. de Lima, Rafael F. de Moura, Marco A. Z. Alves, Antonio C. S. Beck, Luigi Carro, "Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions", In International Journal of Parallel Programming, Springer Science and Business Media LLC, vol. 49, no. 2, pp. 237–252, Jan 2021. DOI: 10.1007/s10766-020-00674-y

Bibtex

@article{Santos_2021,
doi = {10.1007/s10766-020-00674-y},
url = {https://doi.org/10.1007%2Fs10766-020-00674-y},
year = 2021,
month = {jan},
publisher = {Springer Science and Business Media {LLC}},
volume = {49},
number = {2},
pages = {237--252},
author = {Paulo C. Santos and Jo{\~{a}}o P. C. de Lima and Rafael F. de Moura and Marco A. Z. Alves and Antonio C. S. Beck and Luigi Carro},
title = {Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions},
journal = {International Journal of Parallel Programming}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3381


2020
Joao Paulo Cardoso de Lima, Marcelo Brandalero, Luigi Carro, "Endurance-Aware RRAM-Based Reconfigurable Architecture using TCAM Arrays", In Proceeding: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, Aug 2020. DOI: 10.1109/fpl50879.2020.00018

Bibtex

@inproceedings{Cardoso_de_Lima_2020,
doi = {10.1109/fpl50879.2020.00018},
url = {https://doi.org/10.1109%2Ffpl50879.2020.00018},
year = 2020,
month = {aug},
publisher = {IEEE},
author = {Joao Paulo Cardoso de Lima and Marcelo Brandalero and Luigi Carro},
title = {Endurance-Aware {RRAM}-Based Reconfigurable Architecture using {TCAM} Arrays},
booktitle = {2020 30th International Conference on Field-Programmable Logic and Applications ({FPL})}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3379


2019
Hameeza Ahmed, Paulo C. Santos, Joao P. C. Lima, Rafael F. Moura, Marco A. Z. Alves, Antonio C. S. Beck, Luigi Carro, "A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions", In Proceeding: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, Mar 2019. DOI: 10.23919/date.2019.8714956

Bibtex

@inproceedings{Ahmed_2019,
doi = {10.23919/date.2019.8714956},
url = {https://doi.org/10.23919%2Fdate.2019.8714956},
year = 2019,
month = {mar},
publisher = {IEEE},
author = {Hameeza Ahmed and Paulo C. Santos and Joao P. C. Lima and Rafael F. Moura and Marco A. Z. Alves and Antonio C. S. Beck and Luigi Carro},
title = {A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions},
booktitle = {2019 Design, Automation {\&} Test in Europe Conference {\&} Exhibition ({DATE})}
}

Downloads

No Downloads available for this publication

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3382
