
Understand pruning and fine-tuning of LLMs through different pipelines


PRUNE-FINETUNE-LLM

Refining Intelligence, Optimizing Performance Seamlessly

Developed with PyTorch and Python.


Table of Contents

  • Overview
  • Pipelines
  • Results
  • Repository Structure
  • Modules
  • Getting Started
  • Acknowledgments

Overview

Large Language Models (LLMs) have proven remarkably accurate and effective at tasks such as summarisation, language translation, and question answering. However, to expand their capabilities and performance, these models have progressively grown in size. This growth has prompted research in two key areas: model compression and fine-tuning. Compression techniques such as pruning trim redundant parameters and connections, decreasing both memory usage and inference time. Fine-tuning then tailors the model's parameters to excel in designated domains or tasks, leveraging pre-trained natural language knowledge. This synergy optimises efficiency while minimising the impact on performance, addressing the challenges of computational demand and task-specific proficiency. We seek the optimal ordering of these two operations, reporting our results on well-known LLM benchmarks. This study presents a methodology for model compression and performance regeneration via Wanda pruning and LoRA fine-tuning, and quantifies how the ordering of pruning and fine-tuning affects a compressed model's performance on task-specific metrics, showing that 'Order Matters'.
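As background on the pruning criterion: Wanda scores each weight by the product of its magnitude and the norm of the corresponding input activation, then drops the lowest-scoring weights within each output row. Below is a minimal sketch of that criterion for a single linear layer, assuming the per-feature activation norms have already been collected from a calibration set; the actual implementation lives in the wanda submodule.

    import torch

    def wanda_prune_layer(weight: torch.Tensor, act_norms: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
        # weight:    (out_features, in_features) matrix of a linear layer
        # act_norms: (in_features,) L2 norms of each input feature, gathered
        #            from a calibration set (an assumption of this sketch)
        # Wanda score for each weight: |W_ij| * ||X_j||_2
        scores = weight.abs() * act_norms.unsqueeze(0)
        # Zero out the lowest-scoring fraction of weights within each output row
        k = int(weight.shape[1] * sparsity)
        _, prune_idx = torch.topk(scores, k, dim=1, largest=False)
        mask = torch.zeros_like(weight, dtype=torch.bool)
        mask.scatter_(1, prune_idx, True)
        return weight.masked_fill(mask, 0.0)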

Pipelines

[Figures: the single-round pipelines (left figure) and the iterative pipelines (right figure).]

  • prune: prune the model once.
  • prune_finetune: fine-tune -> prune (the left pipeline in the left figure, i.e. L = 1).
  • finetune_prune: prune -> fine-tune (the right pipeline in the left figure, i.e. L = 1).
  • iter_pf: (prune -> fine-tune) x L (the left pipeline in the right figure).
  • iter_fp: (fine-tune -> prune) x L (the right pipeline in the right figure); both iterative variants are sketched below.
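The sketch below illustrates how the iterative pipelines differ only in the order of the operations applied each round. Here prune_step and finetune_step are hypothetical stand-ins for the routines defined in process.py, not the repository's exact interface.

    from typing import Callable, List

    def run_pipeline(model, steps: List[Callable], L: int = 1):
        # Apply the given sequence of steps L times.
        # iter_pf corresponds to steps=[prune_step, finetune_step];
        # iter_fp corresponds to steps=[finetune_step, prune_step];
        # L = 1 recovers the single-round pipelines.
        for _ in range(L):
            for step in steps:
                model = step(model)  # each step returns the updated model
        return model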

Results

[Result plots: performance on the BBH, Belebele, MMLU, and Factoid QA benchmarks across the different pipelines (see the plots directory).]

Repository Structure

└── Prune-Finetune-LLM/
    ├── wanda
    ├── factoid_qa
    │   ├── __init__.py
    │   ├── freebase_qa.py
    │   └── FreebaseQA-eval.json
    ├── plots
    │   ├── [plot1].png
    │   ├── [plot2].png
    │   └── ...
    ├── main.py
    ├── eval.py
    ├── process.py
    ├── utils.py
    ├── constant.py
    ├── experiments.py
    ├── run.sh
    ├── plot_comparison_weights.py
    ├── plots_comparison_metrics.py
    ├── README.md
    └── requirements.txt

Modules

  • main.py: Coordinates model operations, managing the pruning, fine-tuning, and assessment of a large language model (LLM). Serves as the main interface for executing model operations.
  • eval.py: Evaluates large language models (LLMs) across various datasets and metrics.
  • process.py: Defines pruning and fine-tuning of the model.
  • utils.py: Utility module providing functions for model layer identification, language model response generation, response parsing, model loading, response validation, and results management.
  • constant.py: Defines critical paths for the various pipeline stages, as well as the path of the Python interpreter.
  • experiments.py: Contains a series of experiments combining pruning and fine-tuning operations on pre-trained models under different pipelines, by executing main.py.
  • plot_comparison_weights.py: Visualizes statistical distributions and differences of LLM weights in order to compare different pipelines.
  • plots_comparison_metrics.py: Generates comparative visualizations of performance metrics across different pruning and fine-tuning pipelines.
  • requirements.txt: Dependencies for this repository.
  • run.sh: Executes multiple experiment pipelines, leveraging the experiments.py script.
  • wanda: Directory containing the Wanda pruning method, based on https://github.com/locuslab/wanda.
  • factoid_qa: Directory containing the factoid QA metric, an accuracy measure assessing the model's ability to store factual knowledge, based on https://github.com/kelvin-jiang/FreebaseQA.
  • plots: Directory containing the visualization results.

Getting Started

System Requirements:

  • Python: version 3.10.12

Installation

From source

  1. Clone the Prune-Finetune-LLM repository and its submodules:
$ git clone --recurse-submodules https://github.com/kangchengX/Prune-Finetune-LLM.git
  2. Install venv:
$ apt install python3-venv
  3. Create a virtual environment:
$ python -m venv venv
  4. Activate the virtual environment:
$ source venv/bin/activate
  5. Change to the project directory:
$ cd Prune-Finetune-LLM
  6. Install the dependencies:
$ pip install -r requirements.txt
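
Once installed, the experiment pipelines described above can be launched via the provided script, which drives experiments.py (see run.sh for the exact configurations):

$ bash run.sh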

Acknowledgments

Thanks to all five members of our team:

Alvaro, Fernandez; Aung, Htet; Carlos, Diez; Filippo, Fiocchi; Xu, Kangcheng
