Mathieu Léonardon

Associate Professor in Electronics

IMT Atlantique

Biography

I conduct research at the IMT Atlantique in Brest on hardware and software implementations of signal processing and AI algorithms. I teach computer engineering and digital electronics.

My PhD thesis focused on the implementation of polar codes decoders. I proposed the fastest software implementation of the Adaptive SC List decoding algorithm to date. This implementation is integrated in the AFF3CT toolbox to which I actively contribute.

I currently focus on efficient hardware and software implementations of Neural Networks, aiming at low latency and energy efficiency, through multiple industrial collaborations, and as the coordinator of a JCJC ANR project, ProPruNN. We also recently won the AMD Open Hardware Competition with the PEFSL project, a pipeline for the training, compilation, hardware synthesis and deployment of a few-shot learning application on an FPGA SoC.

Interests

Neural Networks Compression
Embedded Electronics
Channel Coding
HPC

Education

PhD in Electronics, 2018
Polytechnique Montréal
PhD in Electronics, 2018
University of Bordeaux
MEng in Embedded Electronics, 2015
Enseirb-Matmeca, Bordeaux INP

PEFSL

A Pipeline for Embedded Few-Shot Learning

A modular pipeline for the training, compilation, hardware synthesis and deployment of a few-shot learning application on an FPGA SoC.

Source code on GitHub

AFF3CT

A Fast Forward Error Correction Toolbox

Simulate high-throughput communication chains.

Source code on GitHub Website

Latest release

Featured Publications

Hamoud Younes, Hugo Le Blevec, Mathieu Léonardon, Vincent Gripon

September 2022 SYSINT 2022: International Conference on System-Integrated Intelligence

Inter-Operability of Compression Techniques for Efficient Deployment of CNNs on Microcontrollers

Machine Learning (ML) has become state of the art for various tasks, including classification of accelerometer data. In the world of Internet of Things (IoT), the available hardware with low-power consumption is often microcontrollers. However, one of the challenges for embedding machine learning on microcontrollers is that the available memory space is very limited, and this memory is also occupied by the rest of the software elements needed in the IoT device. The problem is then to design ML architectures that have a very low memory footprint, while maintaining a low error rate. In this paper, a methodology is proposed towards the deployment of efficient machine learning on microcontrollers. Then, such methodology is used to investigate the effect of using compression techniques mainly pruning, quantization, and coding on the memory budget. Indeed, we know that these techniques reduce the model size, but not how these techniques interoperate to reach the best accuracy to memory trade-off. A Convolutional Neural Network (CNN) and a Human Activity Recognition (HAR) application has been adopted for the validation of the study .

DOI

Hugo Tessier, Vincent Gripon, Mathieu Leonardon, Matthieu Arzel, Thomas Hannagan, David Bertrand

March 2022 Journal of Imaging (MDPI)

Rethinking Weight Decay for Efficient Neural Network Pruning

Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.

PDF DOI

Mathieu Leonardon, Adrien Cassagne, Camille Leroux, Christophe Jego, Louis-Philippe Hamelin, Yvon Savaria

January 2019 Journal of Signal Processing Systems (Springer)

Fast and Flexible Software Polar List Decoders

Flexibility is one mandatory aspect of channel coding in modern wireless communication systems. Among other things, the channel decoder has to support several code lengths and code rates. This need for flexibility applies to polar codes that are considered for control channels in the future 5G standard. This paper presents a new generic and flexible implementation of a software Successive Cancellation List (SCL) decoder. A large set of parameters can be fine-tuned dynamically without re-compiling the software source code: the code length, the code rate, the frozen bits set, the puncturing patterns, the cyclic redundancy check, the list size, the type of decoding algorithm, the tree-pruning strategy and the data quantization. This generic and flexible SCL decoder enables to explore tradeoffs between throughput, latency and decoding performance. Several optimizations are proposed to achieve a competitive decoding speed despite the constraints induced by the genericity and the flexibility. The resulting polar list decoder is about 4 times faster than a generic software decoder and only 2 times slower than a non-flexible unrolled decoder. Thanks to the flexibility of the decoder, the fully adaptive SCL algorithm can be easily implemented and achieves higher throughput than any other similar decoder in the literature (up to 425 Mb/s on a single processor core for N = 2048 and K = 1723 at 4.5 dB).

PDF DOI

Mathieu Leonardon, Camille Leroux, Pekka Jaaskelainen, Christophe Jego, Yvon Savaria

December 2018 2018 IEEE 10th International Symposium on Turbo Codes & Iterative Information Processing (ISTC)

Transport Triggered Polar Decoders

In this paper, the first transport triggered architecture (TTA) customized for the decoding of polar codes is proposed. A first version of this programmable processor is optimized for the successive cancellation (SC) decoding of polar codes while a second architecture is further specialized to also support Soft CANcellation (SCAN) decoding. Both architectures were fully validated on FPGA device by prototyping. The first architecture was also synthesized in 28nm ASIC technology. It runs at a frequency of 800 MHz and reaches a throughput of 352 Mbps for a (1024, 512) polar code decoded with the SC algorithm. Compared to previous work, the energy consumption is reduced by one order of magnitude (0.14 nJ / bit) and the throughput is increased fivefold. Compared to an optimized software implementation on a general purpose processor (x86 architecture), the throughput is 37 % higher and the energy consumption is two orders of magnitude lower. TTA can be seen as a way to reduce the gap between programmable and dedicated polar decoders.

PDF DOI