A small toy gRPC service for interacting with Llama or any other AI model exposed through the kalosm library, or through Ollama using the bindings provided by ollama-rs.
The Cargo.toml specifies the metal feature for improved performance on macOS.
By default, couscous uses Kalosm to run the Llama model. If you wish to use Ollama instead, run the project with the following command:
cargo run --features ollama
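For reference, the feature wiring in Cargo.toml might look roughly like the sketch below. This is only an illustration: apart from the ollama feature and the metal flag mentioned above, the exact dependency layout and versions are assumptions.

[features]
default = []
ollama = ["dep:ollama-rs"]

[dependencies]
# metal enables Apple's GPU backend on macOS; version numbers are placeholders
kalosm = { version = "*", features = ["metal"] }
ollama-rs = { version = "*", optional = true }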
You can set environment variables through an .env file. Below is a configuration example (an external model is only supported with Ollama):
OLLAMA_HOST="https://{pod-id}-11434.proxy.runpod.net/"
OLLAMA_PORT="443"
GRPC_SERVER_ADDRESS="127.0.0.1:50051"
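If you're running Ollama locally instead of on a remote pod, the values would typically look like this (a default local Ollama install listens on port 11434; the exact URL scheme expected for the host variable is an assumption):

OLLAMA_HOST="http://127.0.0.1"
OLLAMA_PORT="11434"
GRPC_SERVER_ADDRESS="127.0.0.1:50051"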
To use Ollama, install Ollama and pull the llama3.1 model. For example, once Ollama is installed:
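ollama pull llama3.1

If Ollama isn't already running as a background service, start it with ollama serve before launching couscous.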
The API allows you to create multiple chats. You can create a chat by querying the gRPC endpoint:
grpcurl -plaintext 127.0.0.1:50051 couscous.Couscous/NewChannel
The response contains the id of the new chat:
{
"id": "4f17ed18-b34d-43bf-b865-1db64ab926b9"
}
You can then send messages to an existing chat by passing its id as chat_id to the Discuss endpoint:

grpcurl -plaintext -d '{"chat_id": "25af8332-a15c-4962-abdb-924dce5c4a0d", "message": "Hello how are you today ?"}' 127.0.0.1:50051 couscous.Couscous/Discuss
The response contains the model's reply:
{
"message": "Hello! I'm just an AI, so I don't have feelings like humans do, but thank you for asking! *smiles* It's nice to chat with you. How about you? Is there something on your mind that you'd like to talk about or ask me?"
}
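Since the grpcurl calls above work without a .proto file, the server presumably exposes gRPC reflection, in which case you can also list and inspect the available methods:

grpcurl -plaintext 127.0.0.1:50051 list
grpcurl -plaintext 127.0.0.1:50051 describe couscous.Couscous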
Since the library allows loading history, all chats are saved and restored when the couscous binary is relaunched.
Note
If you decide to switch from Kalosm to Ollama (or the other way around), you'll need to remove the cache. You can do that by deleting the chats.json file:
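rm chats.json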
A minimal UI is available if you want to interact through your browser instead of using Postman or grpcurl. You can run the UI by going to the ui folder and running:
npm run dev
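Before the first run, install the UI dependencies from inside the ui folder (the project uses the standard npm workflow, as implied by the command above):

npm install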
Warning
You must start the server first, as a new channel will be created automatically.