Seminar:
Towards Accessible Machine Learning via Unified, Explainable Model Compression and
Performance Forecasting
When: 11:00 am, Tuesday, November 12th, 2024
Where: Room 3107, Patrick F. Taylor Hall
ABSTRACT
Machine learning has made tremendous progress in real-world tasks. However, the computational resources these models require limit their deployment in many scenarios. Faced with this issue, two solutions arise: designing efficient neural network models and compressing existing ones. Neural Architecture Search (NAS) finds optimal neural networks from a search space. Performance predictors reduce the computational cost of NAS but are limited by the constraints of search spaces and benchmark tasks. GENNAPE introduces a computational graph format that represents architectures from different search spaces, while AIO-P uses knowledge injection to expand the scope of predictors to real-world tasks. Further, AutoGO optimizes the primitive operation structure of existing architectures, maximizing performance and hardware-friendliness in deployment scenarios. In contrast, model compression reduces the hardware costs of existing neural networks. We adapt performance predictors to compression by casting the set of all choices, e.g., quantization bit precision, as a search space. However, effective compression should identify which components are crucial to maintaining performance. Thus, we propose block profile sampling and AutoBuild, which identify the model design choices that contribute to performance and hardware-friendliness. This enables us to find pruned Stable Diffusion v1.4 variants. Further, Qua2SeDiMo identifies the individual weight layers that are sensitive to low-bit quantization, enabling us to construct sub-4-bit weight quantization schemes for many diffusion models. Finally, we discuss plans for obtaining neural networks that are optimally compressed at training time and for accurate model performance forecasting. We root these predictions in knowledge of model architectures, compression sensitivity, hardware-friendliness, and task statistics.
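As a rough illustration of the idea in the abstract of casting compression choices as a search space scored by a performance predictor, the minimal Python sketch below assigns a candidate bit width to each weight layer and searches for a sub-4-bit-average assignment. All names here (the layer list, the toy predicted_quality scorer, the random-search loop) are hypothetical stand-ins for the learned predictors and quantization schemes the talk describes, not the speaker's actual method.

    import random

    # Hypothetical illustration: per-layer weight-precision choices form a search space.
    layers = ["conv_in", "attn_qkv", "attn_proj", "ffn_in", "ffn_out", "conv_out"]
    bit_choices = [2, 3, 4, 8]

    def predicted_quality(config):
        # Stand-in for a learned performance predictor; it penalizes aggressive
        # quantization more heavily for (assumed) sensitive attention layers.
        penalty = sum((8 - bits) * (2.0 if "attn" in name else 1.0)
                      for name, bits in config.items())
        return 100.0 - penalty

    def average_bits(config):
        return sum(config.values()) / len(config)

    # Random search over mixed-precision assignments under a sub-4-bit average budget.
    best_score, best_config = float("-inf"), None
    for _ in range(2000):
        config = {name: random.choice(bit_choices) for name in layers}
        if average_bits(config) > 4.0:
            continue
        score = predicted_quality(config)
        if score > best_score:
            best_score, best_config = score, config

    print("best predicted quality:", best_score)
    print("per-layer bit widths:", best_config)

In practice, the predictor would be a trained model and the search would be guided rather than random, but the framing is the same: each compression decision becomes a dimension of a search space that a predictor can score without retraining the network.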
Keith Mills
University of Alberta
Keith G. Mills is a Ph.D. Candidate in the Department of Electrical and Computer Engineering at the University of Alberta under the supervision of Professor Di Niu. He received his M.Sc. in Computer Engineering from the University of Alberta in 2020 and his B.Sc. in Computer Engineering, with Distinction, also from the University of Alberta, in 2018. He holds the Alberta Innovates Graduate Student Scholarship and the William Boytzun Memorial Graduate Scholarship for 2024. His research interest is leveraging methods like Neural Architecture Search to explore the intersection of Efficient Machine Learning and eXplainable AI (XAI).