An extension of the Llama2.java implementation, accelerated on GPUs using TornadoVM and Level Zero JNI.
This repository extends https://github.com/mikepapadim/llama2.tornadovm.java and llama2.java with Level Zero JNI support to run on GPUs.
It has been tested on Intel HD Graphics (integrated GPUs) and Intel Arc (discrete GPUs).
## Prerequisites

- JDK 21+: required because the project uses Project Panama (the Foreign Function & Memory API) for native memory allocation.
- TornadoVM: see the TornadoVM documentation for detailed installation instructions.
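Before building, it can help to confirm that the `java` on your `PATH` meets the JDK 21+ requirement. The snippet below is a minimal sketch, not part of this repository: the helper names `java_major` and `check_jdk` are hypothetical, and it assumes `java -version` prints a banner line such as `openjdk version "21.0.2" 2024-01-16`.

```bash
# Hypothetical helpers (not part of this repository).
# Assumes `java -version` prints a line like: openjdk version "21.0.2" 2024-01-16

# Extract the major version number from a `java -version` banner line.
java_major() {
  local line="$1" ver
  # Keep only the quoted version string, e.g. 21.0.2 or 1.8.0_392
  ver=$(printf '%s\n' "$line" | sed -E 's/.*version "([^"]+)".*/\1/')
  # Releases before JDK 9 use the 1.x scheme (1.8 means 8)
  case "$ver" in
    1.*) ver=${ver#1.} ;;
  esac
  # Drop everything from the first non-digit onwards
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Succeed if the JDK on PATH is version 21 or newer.
check_jdk() {
  local major
  major=$(java_major "$(java -version 2>&1 | grep 'version' | head -n 1)")
  if [ -n "$major" ] && [ "$major" -ge 21 ]; then
    echo "JDK $major found"
  else
    echo "JDK 21+ required (found: ${major:-none})" >&2
    return 1
  fi
}
```

Source the snippet and run `check_jdk`; it prints the detected major version, or returns a non-zero status when no suitable JDK is found.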
## Build

First, build TornadoVM with the SPIR-V backend, which dispatches kernels through Level Zero:

```bash
cd tornadovm
./bin/tornadovm-installer --jdk jdk21 --backend=spirv
```

Then, copy TornadoVM's `setvars.sh` into the local folder for llama2.tornadovm with Level Zero, and source it:

```bash
cp <tornadovm>/setvars.sh .
source setvars.sh
```

Finally, build this project with Maven:

```bash
mvn clean package
```
## Download the models

Just like the original Java implementation, the program requires a `tokenizer.bin` file and one of the model checkpoints from TinyLlamas:

```bash
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```
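To sanity-check a downloaded checkpoint such as `stories15M.bin`, you can decode its header. This sketch assumes the llama2.c checkpoint layout, in which the file begins with seven 32-bit signed integers (`dim`, `hidden_dim`, `n_layers`, `n_heads`, `n_kv_heads`, `vocab_size`, `seq_len`) stored little-endian. `read_config` is a hypothetical helper, and because `od` decodes in the host's byte order, it is only correct on little-endian machines such as x86-64. In llama2.c, a negative `vocab_size` marks a checkpoint whose classifier weights are not shared with the token embedding.

```bash
# Hypothetical helper (not part of this repository): print the header of a
# llama2.c-style checkpoint. Assumes the file starts with seven 32-bit signed
# integers; `od` decodes them in the host's byte order, so the output is only
# correct on little-endian machines (e.g. x86-64).
read_config() {
  local file="$1" size name
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 1
  fi
  # File size in bytes; tr strips the padding some wc implementations add
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt 28 ]; then
    echo "error: $file is smaller than the 28-byte header" >&2
    return 1
  fi
  # -A n: no offsets, -v: no duplicate suppression,
  # -t d4: signed 4-byte decimals, -N 28: read only the header
  set -- $(od -A n -v -t d4 -N 28 "$file")
  for name in dim hidden_dim n_layers n_heads n_kv_heads vocab_size seq_len; do
    printf '%s=%s\n' "$name" "$1"
    shift
  done
}
```

For example, `read_config stories15M.bin` prints one `name=value` line per header field; a missing or truncated download is reported as an error instead.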
## Run

The repository contains a `run.sh` script for running the program. It takes the following arguments:

- `-v`: the version to run (`java`, `levelzero`, or `tornadovm`)
- `-d`: the index of the device to run on (optional; see "Change device" below)
- the `.bin` model file
```bash
# Run with Level Zero
./run.sh -v levelzero stories15M.bin
# Run in pure Java, without TornadoVM
./run.sh -v java stories15M.bin
# Run with TornadoVM
./run.sh -v tornadovm stories15M.bin
```
## Change device
```bash
# Run with Level Zero on device 1
./run.sh -v levelzero -d 1 stories15M.bin
# Run with TornadoVM on device 1
./run.sh -v tornadovm -d 1 stories15M.bin
```
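If you script several runs, for example to compare the three versions, a small wrapper can validate the `-v` and `-d` values before calling `run.sh`. The sketch below is hypothetical and not part of this repository: `run_cmd` only prints the command line (a dry run), assuming the argument order used in the examples above (`-v VERSION`, an optional `-d INDEX`, then the model file).

```bash
# Hypothetical helper (not part of this repository): validate run.sh options
# and print the resulting command instead of executing it (dry run).
# Usage: run_cmd VERSION MODEL [DEVICE_INDEX]
run_cmd() {
  local version="$1" model="$2" device="${3-}"
  case "$version" in
    java|levelzero|tornadovm) ;;
    *)
      echo "unknown version: $version (expected java, levelzero or tornadovm)" >&2
      return 1
      ;;
  esac
  if [ -n "$device" ]; then
    # Reject anything that is not a non-negative integer
    case "$device" in
      *[!0-9]*)
        echo "device index must be a non-negative integer: $device" >&2
        return 1
        ;;
    esac
    printf './run.sh -v %s -d %s %s\n' "$version" "$device" "$model"
  else
    printf './run.sh -v %s %s\n' "$version" "$model"
  fi
}
```

For instance, `for v in java levelzero tornadovm; do run_cmd "$v" stories15M.bin; done` prints one command per version. Keeping the helper as a dry run means it can be checked on a machine without a GPU.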
## License

MIT