Departments | Computer Science |
Description | This project will explore techniques to compress the layers of large language models (LLMs) while preserving performance. By leveraging methods such as low-rank decomposition, quantization, and pruning, the goal is to reduce model size and computational demands without significantly affecting accuracy. The project involves benchmarking different compression strategies in terms of inference speed, memory usage, and output quality. Potential applications include deploying LLMs on resource-constrained devices or optimizing cloud-based inference for cost efficiency.
This project requires significant computational power, so a computer with a capable graphics card (GPU) is recommended. Our lab can provide access to specialized GPU hardware if needed. A minimal sketch of one of the techniques mentioned above follows.
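As a rough illustration only (not the project's prescribed method), the following PyTorch sketch shows one of the techniques named above, low-rank decomposition, applied to a single linear layer via truncated SVD. The function name low_rank_linear and the chosen rank are purely illustrative assumptions.

import torch
import torch.nn as nn

def low_rank_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    # Approximate the weight matrix W (out_features x in_features) by the
    # rank-r product U_r diag(S_r) Vh_r, then split it into two smaller layers.
    W = layer.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S_r) @ Vh_r    # (rank, in_features)
    second.weight.data = U_r                      # (out_features, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

# Example: a 4096x4096 layer (~16.8M weights) reduced to rank 256 (~2.1M weights).
layer = nn.Linear(4096, 4096)
compressed = low_rank_linear(layer, rank=256)
x = torch.randn(1, 4096)
print(torch.dist(layer(x), compressed(x)))  # approximation error introduced

Measuring how such an approximation trades parameter count and speed against output error is exactly the kind of benchmarking the project involves. |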
Preparation | Please read:
https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
and have a look at this very nice website:
https://bbycroft.net/llm |
Project Categories | Artificial Intelligence (AI), Data Science |
Project Keywords | Machine Learning, Neural Networks, Optimisation |
Level of Studies |
Level 6 (Undergraduate Year 3) | yes |
Level 7 (Masters) | yes |
Level 8 (PhD) | yes |