Seminar:
Towards Accessible Machine Learning via Unified, Explainable Model Compression and
Performance Forecasting
When: 11:00 am, Tuesday, November 12th, 2024
Where: Room 3107, Patrick F. Taylor Hall
ABSTRACT
Machine learning has made tremendous progress in real-world tasks. However, the computational resources these models require limit their deployment in many scenarios. Faced with this issue, two solutions arise: designing efficient neural network models and compressing existing ones. Neural Architecture Search (NAS) finds optimal neural networks from a search space. Performance predictors reduce the computational cost of NAS but are limited by the constraints of search spaces and benchmark tasks. GENNAPE introduces a computational graph format that represents architectures from different search spaces, while AIO-P uses knowledge injection to expand the scope of predictors to real-world tasks. Further, AutoGO optimizes the primitive operation structure of existing architectures, maximizing performance and hardware-friendliness in deployment scenarios. In contrast, model compression reduces the hardware costs of existing neural networks. We adapt performance predictors to compression by casting the set of all choices, e.g., quantization bit precision, as a search space. However, effective compression should identify which components are crucial to maintaining performance. Thus, we propose block profile sampling and AutoBuild, which identify the model design choices that contribute to performance and hardware-friendliness. This enables us to find pruned Stable Diffusion v1.4 variants. Further, Qua2SeDiMo identifies the individual weight layers that are sensitive to low-bit quantization, enabling us to construct sub-4-bit weight quantization schemes for many diffusion models. Finally, we discuss plans for obtaining neural networks that are optimally compressed at training time and for accurate model performance forecasting. We root these predictions in knowledge of model architectures, compression sensitivity, hardware-friendliness, and task statistics.
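As a rough illustration of the idea in the abstract of casting compression choices as a search space scored by a performance predictor, the minimal Python sketch below assigns a candidate bit width to each weight layer and searches for a sub-4-bit-average assignment. All names here (the layer list, the toy predicted_quality scorer, the random-search loop) are hypothetical stand-ins for the learned predictors and quantization schemes the talk describes, not the speaker's actual method.

    import random

    # Hypothetical illustration: per-layer weight-precision choices form a search space.
    layers = ["conv_in", "attn_qkv", "attn_proj", "ffn_in", "ffn_out", "conv_out"]
    bit_choices = [2, 3, 4, 8]

    def predicted_quality(config):
        # Stand-in for a learned performance predictor; it penalizes aggressive
        # quantization more heavily for (assumed) sensitive attention layers.
        penalty = sum((8 - bits) * (2.0 if "attn" in name else 1.0)
                      for name, bits in config.items())
        return 100.0 - penalty

    def average_bits(config):
        return sum(config.values()) / len(config)

    # Random search over mixed-precision assignments under a sub-4-bit average budget.
    best_score, best_config = float("-inf"), None
    for _ in range(2000):
        config = {name: random.choice(bit_choices) for name in layers}
        if average_bits(config) > 4.0:
            continue
        score = predicted_quality(config)
        if score > best_score:
            best_score, best_config = score, config

    print("best predicted quality:", best_score)
    print("per-layer bit widths:", best_config)

In practice, the predictor would be a trained model and the search would be guided rather than random, but the framing is the same: each compression decision becomes a dimension of a search space that a predictor can score without retraining the network.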
Keith Mills
University of Alberta
Keith G. Mills is a Ph.D. Candidate in the Department of Electrical and Computer Engineering at the University of Alberta under the supervision of Professor Di Niu. He received his M.Sc. in Computer Engineering from the University of Alberta in 2020 and his B.Sc. in Computer Engineering, with Distinction, also from the University of Alberta, in 2018. He holds the Alberta Innovates Graduate Student Scholarship and the William Boytzun Memorial Graduate Scholarship for 2024. His research interest is leveraging methods like Neural Architecture Search to explore the intersection of Efficient Machine Learning and eXplainable AI (XAI).