0.6.0
This version renames the main application to `dllama`. From now on, to run the root node or a worker, you need to compile `dllama` and run the `dllama` application.
```
make dllama
./dllama inference --model ... --tokenizer ...
```
This version also introduces an early-stage HTTP API compatible with the OpenAI API (only the `/v1/chat/completions` endpoint). You can find how to run the API here. A big shout out to @DifferentialityDevelopment for implementing this feature. #39