🔥 An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks

RSR logo

This repository implements Redundant Segment Reduction (RSR), a fast matrix multiplication algorithm designed for the binary and ternary weight matrices used in low-bit networks. RSR improves computational efficiency by a factor of log(n), making it particularly useful for low-bit deep learning and efficient inference.

The repository includes:

  • A native C++ implementation of the RSR method for performance comparison.
  • NumPy-based implementations for ease of experimentation and integration into Python workflows.
  • PyTorch implementations with both CPU and GPU support, enabling scalable and optimized matrix operations in deep learning environments.

This project aims to provide a fast and efficient approach to low-bit matrix multiplication.

🧠 Large Language Models | 🧮 NumPy | 🔥 PyTorch | 💻 C++

A visualization of the algorithm.
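For intuition, the core segment-reduction idea can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the general technique (precompute each possible column-segment pattern's partial dot product once, then reuse it across all rows sharing that pattern), not the repository's optimized implementation; all names here are made up for exposition.

```python
import numpy as np

def rsr_matvec_sketch(A, x, k):
    """Segment-reduction matrix-vector product for a binary (0/1) matrix A.

    Columns are split into blocks of width k. Within a block each row holds
    one of only 2**k possible bit patterns, so each pattern's partial dot
    product with the matching slice of x is computed once and shared by
    every row carrying that pattern.
    """
    n_rows, n_cols = A.shape
    assert n_cols % k == 0, "for simplicity, assume k divides the width"
    y = np.zeros(n_rows)
    bit_weights = 1 << np.arange(k)                  # k-bit segment -> int code
    all_patterns = (np.arange(1 << k)[:, None] >> np.arange(k)) & 1
    for b in range(0, n_cols, k):
        codes = A[:, b:b + k] @ bit_weights          # pattern id per row
        table = all_patterns @ x[b:b + k]            # 2**k partial products
        y += table[codes]                            # one O(1) lookup per row
    return y

# sanity check against the ground truth
A = np.random.randint(0, 2, (64, 64))
x = np.random.randn(64)
assert np.allclose(rsr_matvec_sketch(A, x, k=8), A @ x)
```

In the full method the per-block table construction is amortized so that choosing k on the order of log(n) yields the log(n) saving; the sketch builds each table naively for clarity.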


🧮 NumPy Implementations

The NumPy implementations of the matrix multipliers (Naive, RSR, and RSR++) are in the numpy_impl directory. You can use these multipliers by instantiating a Multiplier object with a weight matrix A (required) and an optional parameter k. Initialization automatically performs any necessary preprocessing, and inference on input vectors is done via the multiply method.
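A minimal usage sketch (the import path and class name are assumptions based on the description above; check numpy_impl for the actual names):

```python
import numpy as np

# Hypothetical import; the actual module and class names live in numpy_impl.
from multipliers import RSRMultiplier

n = 1024
A = np.random.randint(0, 2, (n, n))   # binary weight matrix
x = np.random.randn(n)                # input vector

mult = RSRMultiplier(A, k=10)         # preprocessing runs at initialization
y = mult.multiply(x)                  # inference on the input vector

assert np.allclose(y, A @ x)          # compare against the ground truth
```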

⚙️ Requirements

Ensure you have Python >= 3.6 installed, along with all packages listed in requirements.txt.

✅ Testing the Multipliers

To validate the correctness of the RSR and RSR++ multipliers, run rsr_test.py. This script randomly generates a weight matrix and an input vector, then compares the results of the multiplication with the ground truth. To run the tests, use ./run_test.sh inside the numpy_impl directory.


💻 Native C++ Implementations

Native C++ implementations for the matrix multipliers are available in the native directory.

⚙️ Requirements

To compile and run the C++ code, you’ll need clang++ installed.

⏱️ Run Time Comparison

To compare run times for different values of n across algorithms, use the script ./run_time_compare.sh [algorithm], where [algorithm] can be one of naive, rsr, or rsrpp.

🔧 k Optimization

To test various values of k for runtime optimization, run ./run_k_optimization.sh. This script benchmarks the run times for different k values, with the target n value specified in k_optimization.cpp.

🧪 Running Tests

Several tests are provided to ensure algorithmic correctness. Run them by executing ./run_test.sh inside the native directory.


🔥 Torch Implementations

The Torch implementations are in the torch_impl directory. They are built entirely from torch tensor operations, so the same code runs on both CPU and GPU. To run the tests and examples, execute ./run_tests.sh inside the torch_impl directory.
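As a rough usage sketch (class and module names here are hypothetical; see torch_impl for the real API), the workflow mirrors the NumPy version but operates on torch tensors:

```python
import torch

# Hypothetical import; the actual names are defined in torch_impl.
from torch_multipliers import TorchRSRMultiplier

device = "cuda" if torch.cuda.is_available() else "cpu"
n = 1024
A = torch.randint(0, 2, (n, n), device=device, dtype=torch.float32)
x = torch.randn(n, device=device)

mult = TorchRSRMultiplier(A, k=10)    # preprocessing on the target device
y = mult.multiply(x)
assert torch.allclose(y, A @ x, atol=1e-4)
```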


🧠 LLM Experiments

The LLM experiments are in the llm directory, which contains code for inference on both CPU and GPU in the corresponding subdirectories. The patch_[cpu/gpu].py file contains the patched preprocessing functions and the new forward function for the BitLinear module (a sketch of this patching pattern follows the list below).

  • The notebook [cpu/gpu]_impl/profile.ipynb contains example code that applies the patch and profiles the running time.
  • Currently, this example works with these 1.58-bit models: Llama3-8B-1.58bit, Falcon3-10B-1.58bit, and Falcon3-3B-1.58bit.
  • The notebook gpu_impl/matrix_mult_compare.ipynb contains code comparing pure torch matrix multiplication against RSR.
  • The GPU experiment results are reported in the Experiments section of the paper.
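The patching itself is plain monkey-patching of the model's BitLinear layers. A minimal sketch, assuming hypothetical helper names (the real functions are defined in patch_[cpu/gpu].py):

```python
# Hypothetical stand-ins for the functions defined in patch_[cpu/gpu].py.
from patch_gpu import rsr_preprocess, rsr_forward

def patch_bitlinear_modules(model):
    """Swap each BitLinear forward for an RSR-based one (sketch only)."""
    for module in model.modules():
        if type(module).__name__ == "BitLinear":
            rsr_preprocess(module)                        # build RSR tables from the weights
            module.forward = rsr_forward.__get__(module)  # bind the new forward method
    return model
```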
