Rasmus Ørtoft Aagaard

DTU & Laerdal Medical


Smaller and Faster AI: A Primer in Model Compression

As AI models grow in size and complexity, their hardware requirements increase accordingly. State-of-the-art models from frontier labs require enormous data centers and dubious (at best) data collection strategies. However, there is a parallel track that often gets overlooked: as the AI capability ceiling is raised, the floor rises with it. This means the baseline performance of smaller models has grown rapidly, enabling otherwise infeasible use cases such as local, offline AI on mobile devices.

One research area dedicated to producing smaller and more efficient models is model compression: the act of starting with a larger, highly capable model and using it to produce a smaller model with similar capabilities. Model compression techniques such as quantization, pruning, and knowledge distillation have become essential to deploying AI models. This presentation explains the core concepts of model compression and showcases why they are necessary and important for the AI adoption we are currently seeing.
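To give a flavor of the first of these techniques, here is a minimal sketch of post-training quantization: mapping 32-bit floating-point weights onto 8-bit integers, shrinking storage roughly fourfold at the cost of a small reconstruction error. All names and values below are illustrative assumptions, not the speaker's method; real toolchains (e.g. PyTorch or TensorFlow Lite) provide their own quantization machinery.

```python
import numpy as np

# Hypothetical fp32 weight matrix standing in for a trained layer.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)

# Symmetric quantization: scale the fp32 range onto int8 [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)   # ~4x smaller storage
dequantized = quantized.astype(np.float32) * scale      # approximate original

# Rounding bounds the per-weight error by half a quantization step.
max_error = np.abs(weights - dequantized).max()
print(f"max reconstruction error: {max_error:.6f}")
```

The symmetric scheme shown here is the simplest variant; practical systems often use per-channel scales or asymmetric ranges to reduce the error further.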

Bio: I'm an industrial PhD student at the Technical University of Denmark (DTU) and Laerdal Medical. My research is on using model compression to enable deployment of AI models on mobile devices, especially targeting areas where internet connectivity is constrained or where data privacy is a determining factor.