cfaed Publications

An online guided tuning approach to run CNN pipelines on edge devices

Reference

Pirah Noor Soomro, Mustafa Abduljabbar, Jeronimo Castrillon, Miquel Pericás, "An online guided tuning approach to run CNN pipelines on edge devices", Proceedings of the 18th ACM International Conference on Computing Frontiers (CF'21), Association for Computing Machinery (ACM), pp. 45–53, New York, NY, USA, May 2021. doi:10.1145/3457388.3458662

Abstract

Modern edge and mobile devices are equipped with powerful computing resources. These are often organized as heterogeneous multi-cores, featuring performance-asymmetric core clusters. This raises the question of how to effectively execute the inference pass of convolutional neural networks (CNNs) on such devices. Existing CNN implementations on edge devices leverage offline profiling data to determine a better schedule for CNN applications. This approach requires a time-consuming phase of generating a performance profile for each type of representative kernel on the various core configurations available on the device, coupled with a search space exploration. We propose an online tuning technique which utilizes compile-time hints and online profiling data to generate high-throughput CNN pipelines. We explore core heterogeneity and compatible core-layer configurations through an online guided search. Instead of an exhaustive search, we adopt an evolutionary approach with a guided starting point in order to find the solution. We show that by pruning and navigating through the complex search space using compile-time hints, 79% of the tested configurations turn out to be near-optimal candidates for a throughput-maximizing pipeline on the NVIDIA Jetson TX2 platform.

Bibtex

@InProceedings{soomro_cf21,
author = {Pirah Noor Soomro and Mustafa Abduljabbar and Jeronimo Castrillon and Miquel Peric{\'a}s},
booktitle = {Proceedings of the 18th ACM International Conference on Computing Frontiers (CF'21)},
title = {An online guided tuning approach to run CNN pipelines on edge devices},
doi = {10.1145/3457388.3458662},
isbn = {9781450384049},
location = {Virtual Event, Italy},
pages = {45--53},
publisher = {Association for Computing Machinery (ACM)},
series = {CF '21},
url = {https://doi.org/10.1145/3457388.3458662},
abstract = {Modern edge and mobile devices are equipped with powerful computing resources. These are often organized as heterogeneous multi-cores, featuring performance-asymmetric core clusters. This raises the question of how to effectively execute the inference pass of convolutional neural networks (CNNs) on such devices. Existing CNN implementations on edge devices leverage offline profiling data to determine a better schedule for CNN applications. This approach requires a time-consuming phase of generating a performance profile for each type of representative kernel on the various core configurations available on the device, coupled with a search space exploration. We propose an online tuning technique which utilizes compile-time hints and online profiling data to generate high-throughput CNN pipelines. We explore core heterogeneity and compatible core-layer configurations through an online guided search. Instead of an exhaustive search, we adopt an evolutionary approach with a guided starting point in order to find the solution. We show that by pruning and navigating through the complex search space using compile-time hints, 79\% of the tested configurations turn out to be near-optimal candidates for a throughput-maximizing pipeline on the NVIDIA Jetson TX2 platform.},
address = {New York, NY, USA},
month = may,
numpages = {9},
year = {2021},
}

Permalink

https://cfaed.tu-dresden.de/publications?pubId=3020

