Added gif and switch to Llama 3.1
williamw committed Feb 28, 2025
1 parent ea86250 commit 8bd2b44
Showing 3 changed files with 12 additions and 8 deletions.
18 changes: 11 additions & 7 deletions max-serve-anythingllm/README.md
@@ -1,10 +1,12 @@
-# Use AnythingLLM and DeepSeek R1 with MAX Serve
+# Use AnythingLLM with MAX Serve

Building on the solid foundation MAX provides, adding a robust user interface is a natural next step.

+![AnythingLLM with MAX Serve Demo](demo.gif)

In this recipe you will:

-- Use MAX Serve to provide an OpenAI-compatible endpoint for [DeepSeek R1](https://api-docs.deepseek.com/news/news250120)
+- Use MAX Serve to provide an OpenAI-compatible endpoint for [Llama 3.1](https://ai.meta.com/blog/meta-llama-3-1/)
- Set up [AnythingLLM](https://github.com/Mintplex-Labs/anything-llm) to provide a robust chat interface
- Learn how to orchestrate multiple services in pure Python, without tools like Kubernetes or docker-compose

@@ -88,18 +90,20 @@ The first time you [launch AnythingLLM in your browser](http://localhost:3001),
1. Select *Generic OpenAI* as the LLM provider, then enter:
- Base URL = `http://host.docker.internal:3002/v1`
- API Key = `local` (MAX doesn't require an API key, but this field can't be blank)
-- Chat Model Name = `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`
+- Chat Model Name = `modularai/Llama-3.1-8B-Instruct-GGUF`
- Token Context Window = `16384` (Must match `MAX_CONTEXT_LENGTH` from `pyproject.toml`)
- Max Tokens = `1024`
2. Next, for User Setup, choose *Just me* or *My team*, and set an admin password.
3. If asked to fill in a survey, you may participate or skip this step. (The survey data goes to the AnythingLLM project, not Modular.)
4. Finally, enter a workspace name.

+Note: Don't let the `modularai` in the Chat Model Name field limit you. MAX supports any PyTorch model on Hugging Face, and includes special acceleration for the most common architectures. Modular simply hosts weights for Llama 3.1 to get you up and running quickly. (Access to the official [meta-llama repo](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) is gated and requires waiting for approval.)
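
Because the endpoint is OpenAI-compatible, you can also sanity-check these settings from the host with any OpenAI client before wiring them into AnythingLLM. This is a quick sketch rather than part of the recipe; it assumes the `openai` Python package is installed and that MAX Serve is reachable at `http://localhost:3002/v1` from the host (the same endpoint the container reaches via `host.docker.internal`):

```python
from openai import OpenAI

# MAX doesn't check the API key, but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:3002/v1", api_key="local")

response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

If this prints a reply, AnythingLLM should work with the same Base URL, API Key, and Chat Model Name.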

## Understand the project

Let's explore how the key components of this recipe work together.

-### Configuration with `pyproject.toml`
+### Configuration: pyproject.toml

The recipe is configured in the `pyproject.toml` file, which defines:

@@ -116,15 +120,15 @@ The recipe is configured in the `pyproject.toml` file, which defines:
- `app`: Runs the main Python script that coordinates both services
- `setup`: Sets up persistent storage for AnythingLLM
- `ui`: Launches the AnythingLLM Docker container
-- `llm`: Starts MAX Serve with DeepSeek R1
+- `llm`: Starts MAX Serve with Llama 3.1
- `clean`: Cleans up network resources for both services

3. **Dependencies** for running both services:
- MAX Serve runs via the `max-pipelines` CLI
- AnythingLLM runs in a Docker container, keeping its dependencies isolated
- Additional dependencies to orchestrate both services

-### Setup with `setup.py`
+### Initial Setup: setup.py

The `setup.py` script handles the initial setup for AnythingLLM:

@@ -133,7 +137,7 @@ The `setup.py` script handles the initial setup for AnythingLLM:
- Ensures an empty `.env` file is present for AnythingLLM settings
- This script is automatically run as a pre-task when you execute `magic run app`
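
The script itself isn't shown here, but the steps above boil down to a little filesystem bootstrapping. As a rough sketch of that logic (not the recipe's actual `setup.py`; the storage path and its fallback are assumptions for illustration):

```python
import os
from pathlib import Path

# Storage location comes from the project configuration; the fallback is illustrative.
storage = Path(os.environ.get("UI_STORAGE_LOCATION", "./anythingllm-storage"))

# Make sure the persistent storage directory exists...
storage.mkdir(parents=True, exist_ok=True)

# ...and that an (empty) .env file is present for AnythingLLM's settings,
# since the `ui` task mounts $UI_STORAGE_LOCATION/.env into the container.
(storage / ".env").touch(exist_ok=True)
```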

-### Orchestration with `main.py`
+### Orchestration: main.py

When you run `magic run app`, the `main.py` script coordinates everything necessary to start and shut down both services:

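The orchestration logic isn't shown above, but the pattern it follows is simple: start both long-running services, wait, and make sure they're torn down on exit. Below is a minimal sketch of that pattern, not the recipe's actual `main.py`; it assumes each service is launched through its `magic run` task and leaves the heavier cleanup to the `clean` task:

```python
import signal
import subprocess

# Assumption for illustration: each service is started via its task from
# pyproject.toml ("llm" for MAX Serve, "ui" for the AnythingLLM container).
SERVICES = ["llm", "ui"]


def main() -> None:
    procs = [subprocess.Popen(["magic", "run", task]) for task in SERVICES]
    try:
        signal.pause()  # keep both services running until Ctrl+C
    except KeyboardInterrupt:
        pass
    finally:
        # Stop both services; the recipe's `clean` task does a more thorough job.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    main()
```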
Binary file added max-serve-anythingllm/demo.gif
2 changes: 1 addition & 1 deletion max-serve-anythingllm/pyproject.toml
@@ -36,7 +36,7 @@ UI_CONTAINER_NAME = "anythingllm-max"
[tool.pixi.tasks]
app = "python main.py llm ui --pre setup --post clean"
setup = "python setup.py"
llm = "max-pipelines serve --max-length=$MAX_CONTEXT_LENGTH --max-batch-size=$MAX_BATCH_SIZE --model-path=deepseek-ai/DeepSeek-R1-Distill-Llama-8B --weight-path=lmstudio-community/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf"
llm = "max-pipelines serve --max-length=$MAX_CONTEXT_LENGTH --max-batch-size=$MAX_BATCH_SIZE --model-path=modularai/Llama-3.1-8B-Instruct-GGUF"
ui = "docker run -p $UI_PORT:3001 --name $UI_CONTAINER_NAME --cap-add SYS_ADMIN -v $UI_STORAGE_LOCATION:/app/server/storage -v $UI_STORAGE_LOCATION/.env:/app/server/.env -e STORAGE_DIR=\"/app/server/storage\" mintplexlabs/anythingllm"
clean = "pkill -f \"max-pipelines serve\" || true && lsof -ti:$MAX_SERVE_PORT,$UI_PORT | xargs -r kill -9 2>/dev/null || true && docker rm -f $UI_CONTAINER_NAME 2>/dev/null || true"

