Commit 9e73dbb

Merge pull request #215 from moxin-org/dev
Main update 12 Aug

2 parents: 6342a9d + 245d104

34 files changed: +1027, -593 lines

Cargo.lock (+28, -7)

(Generated file; diff not rendered.)

Cargo.toml (+1, -1)

@@ -30,7 +30,7 @@ moxin-fake-backend = { path = "moxin-fake-backend" }
 
 makepad-widgets = { git = "https://github.com/jmbejar/makepad", branch = "moxin-release-v1" }
 
-robius-open = "0.1.0"
+robius-open = "0.1.1"
 robius-url-handler = { git = "https://github.com/project-robius/robius-url-handler" }
 
 chrono = "0.4"

README.md (+27, -9)

@@ -9,6 +9,7 @@ The following table shows which host systems can currently be used to build Moxin
 | ------- | --------------- | ------- | ----- | -------------------------------------------- |
 | macOS   | macOS           | ✅      | ✅    | `.app`, [`.dmg`]                             |
 | Linux   | Linux           | ✅      | ✅    | [`.deb` (Debian dpkg)], [AppImage], [pacman] |
+| Windows | Windows (10+)   | ✅      | ✅    | `.exe` (NSIS)                                |
 
 ## Building and Running
 
@@ -41,6 +42,9 @@ curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/insta
 source $HOME/.wasmedge/env
 ```
 
+> [!IMPORTANT]
+> If your CPU does not support AVX512, then you should append the `--noavx` option onto the above command.
+
 To build Moxin on Linux, you must install the following dependencies:
 `openssl`, `clang`/`libclang`, `binfmt`, `Xcursor`/`X11`, `asound`/`pulse`.
 
@@ -64,6 +68,15 @@ cargo run --release
 
 2. Restart your PC, or log out and log back in, which allows the LLVM path to be properly recognized.
    * Alternatively you can add the LLVM path `C:\Program Files\LLVM\bin` to your system PATH.
+
+   > [!TIP]
+   > To automatically handle Steps 3 and 4, simply run:
+   > ```sh
+   > cargo run -p moxin-runner -- --install
+   > ```
+
 3. Download the [WasmEdge-0.14.0-windows.zip](https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-0.14.0-windows.zip) file from [the WasmEdge v0.14.0 release page](https://github.com/WasmEdge/WasmEdge/releases/tag/0.14.0),
    and then extract it into a directory of your choice.
    We recommend using your home directory (e.g., `C:\Users\<USERNAME>\`), represented by `$home` in powershell and `%homedrive%%homepath%` in batch-cmd.
@@ -78,18 +91,23 @@ cargo run --release
    $ProgressPreference = 'Continue' ## restore default progress bars
    ```
 
-4. Download the WasmEdge WASI-NN plugin here: [WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip](https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip) (15.5MB) and extract it to the same directory as above, e.g., `C:\Users\<USERNAME>\WasmEdge-0.14.0-Windows`.
+4. Download [the appropriate WasmEdge WASI-NN plugin](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/tag/b3499) (see below for details), extract/unzip it, and copy the `lib\wasmedge` directory from the .zip archive into the `lib\` directory of the above WasmEdge installation directory, e.g., `C:\Users\<USERNAME>\WasmEdge-0.14.0-Windows\lib`.
+
    > [!IMPORTANT]
-   > You will be asked whether you want to replace the files that already exist; select `Replace the files in the destination` when doing so.
-   * To do this quickly in powershell:
-     ```powershell
-     $ProgressPreference = 'SilentlyContinue' ## makes downloads much faster
-     Invoke-WebRequest -Uri "https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip" -OutFile "WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip"
-     Expand-Archive -Force -LiteralPath "WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip" -DestinationPath "$home\WasmEdge-0.14.0-Windows"
-     $ProgressPreference = 'Continue' ## restore default progress bars
-     ```
+   > The only file that matters is the plugin file, which must exist at the path `WasmEdge-0.14.0-Windows\lib\wasmedge\wasmedgePluginWasiNN.dll`.
 
+   * If your computer has a CUDA v12-capable GPU, select [WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/download/b3499/WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.0-windows_x86_64.zip).
+     * Note that **CUDA version 12** is required.
+   * If your computer doesn't have CUDA 12, then select either:
+     * [WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/download/b3499/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip) if your CPU supports AVX-512, or
+     * [WasmEdge-plugin-wasi_nn-ggml-noavx-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/tag/b3499#:~:text=WasmEdge%2Dplugin%2Dwasi_nn%2Dggml%2Dnoavx%2D0.14.0%2Dwindows_x86_64.zip) if your CPU does *not* support AVX-512.
+
 5. Set the `WASMEDGE_DIR` and `WASMEDGE_PLUGIN_PATH` environment variables to point to the `WasmEdge-0.14.0-Windows` directory that you extracted above, and then build Moxin.
+
+   > [!IMPORTANT]
+   > You may also need to add the `WasmEdge-0.14.0-Windows\bin` directory to your `PATH` environment variable (on some versions of Windows).
+
    In powershell, you can do this like so:
   ```powershell
   $env:WASMEDGE_DIR="$home\WasmEdge-0.14.0-Windows\"
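Steps 4 and 5 above pin down a concrete on-disk layout. As a quick illustration of how those pieces fit together, here is a hedged Rust sketch (not moxin-runner's actual code; `check_wasmedge_layout` is a made-up helper name) that checks the `WASMEDGE_DIR` variable from Step 5 and the plugin path from Step 4:

```rust
use std::{env, path::PathBuf};

// Hedged sketch only (not moxin-runner's real logic): verify the layout
// that Steps 3-5 produce before launching.
fn check_wasmedge_layout() -> Result<(), String> {
    // Step 5: WASMEDGE_DIR must point at the extracted WasmEdge-0.14.0-Windows directory.
    let dir = PathBuf::from(
        env::var("WASMEDGE_DIR").map_err(|_| "WASMEDGE_DIR is not set".to_string())?,
    );
    // Step 4: the one file that matters is the WASI-NN plugin DLL.
    let plugin = dir.join("lib").join("wasmedge").join("wasmedgePluginWasiNN.dll");
    if !plugin.is_file() {
        return Err(format!("WASI-NN plugin not found at {}", plugin.display()));
    }
    Ok(())
}

fn main() {
    match check_wasmedge_layout() {
        Ok(()) => println!("WasmEdge layout looks good"),
        Err(e) => eprintln!("setup problem: {e}"),
    }
}
```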

moxin-backend/src/backend_impls/api_server.rs (+27, -8)

@@ -23,6 +23,7 @@ static WASM: &[u8] = include_bytes!("../../wasm/llama-api-server.wasm");
 pub struct LLamaEdgeApiServer {
     id: String,
     listen_addr: SocketAddr,
+    load_model_options: LoadModelOptions,
     wasm_module: Module,
     running_controller: tokio::sync::broadcast::Sender<()>,
     #[allow(dead_code)]
@@ -35,15 +36,23 @@ fn create_wasi(
     load_model: &LoadModelOptions,
 ) -> wasmedge_sdk::WasmEdgeResult<WasiModule> {
     // use model metadata context size
-    let ctx_size = Some(format!("{}", file.context_size.min(8 * 1024)));
+    let ctx_size = if let Some(n_ctx) = load_model.n_ctx {
+        Some(format!("{}", n_ctx))
+    } else {
+        Some(format!("{}", file.context_size.min(8 * 1024)))
+    };
 
     let n_gpu_layers = match load_model.gpu_layers {
         moxin_protocol::protocol::GPULayers::Specific(n) => Some(n.to_string()),
         moxin_protocol::protocol::GPULayers::Max => None,
     };
 
     // Set n_batch to a fixed value of 128.
-    let batch_size = Some(format!("128"));
+    let batch_size = if let Some(n_batch) = load_model.n_batch {
+        Some(format!("{}", n_batch))
+    } else {
+        Some("128".to_string())
+    };
 
     let mut prompt_template = load_model.prompt_template.clone();
     if prompt_template.is_none() && !file.prompt_template.is_empty() {
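Both hunks in `create_wasi` apply the same override-with-fallback pattern: a caller-supplied `Option` wins, otherwise the previous default applies (the `// Set n_batch to a fixed value of 128.` comment now describes only the fallback). A standalone sketch of the pattern, using illustrative stand-in types rather than the crate's real ones:

```rust
// Illustrative stand-ins; the real code uses LoadModelOptions and the
// model file's metadata.
struct LoadOpts {
    n_ctx: Option<u32>,
    n_batch: Option<u32>,
}

struct FileMeta {
    context_size: u32,
}

/// Caller override if present, else the old defaults: metadata context
/// size capped at 8k, and a fixed batch size of 128.
fn resolve_sizes(opts: &LoadOpts, file: &FileMeta) -> (String, String) {
    let ctx = opts
        .n_ctx
        .unwrap_or_else(|| file.context_size.min(8 * 1024))
        .to_string();
    let batch = opts.n_batch.unwrap_or(128).to_string();
    (ctx, batch)
}

fn main() {
    let opts = LoadOpts { n_ctx: Some(1024), n_batch: None };
    let file = FileMeta { context_size: 32 * 1024 };
    // n_ctx override wins; n_batch falls back to 128.
    assert_eq!(
        resolve_sizes(&opts, &file),
        ("1024".to_string(), "128".to_string())
    );
}
```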
@@ -133,17 +142,23 @@ impl BackendModel for LLamaEdgeApiServer {
         options: moxin_protocol::protocol::LoadModelOptions,
         tx: std::sync::mpsc::Sender<anyhow::Result<moxin_protocol::protocol::LoadModelResponse>>,
     ) -> Self {
+        let load_model_options = options.clone();
         let mut need_reload = true;
         let (wasm_module, listen_addr) = if let Some(old_model) = &old_model {
-            if old_model.id == file.id.as_str() {
+            if old_model.id == file.id.as_str()
+                && old_model.load_model_options.n_ctx == options.n_ctx
+                && old_model.load_model_options.n_batch == options.n_batch
+            {
                 need_reload = false;
             }
             (old_model.wasm_module.clone(), old_model.listen_addr)
         } else {
-            (
-                Module::from_bytes(None, WASM).unwrap(),
-                ([0, 0, 0, 0], 8080).into(),
-            )
+            let new_addr = std::net::TcpListener::bind("localhost:0")
+                .unwrap()
+                .local_addr()
+                .unwrap();
+
+            (Module::from_bytes(None, WASM).unwrap(), new_addr)
         };
 
         if !need_reload {
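The hard-coded `0.0.0.0:8080` listen address is gone; the server now asks the OS for a free ephemeral port by binding a throwaway listener to port 0. A minimal sketch of that technique; one caveat worth knowing is that the probe listener is dropped before the real server binds, so another process could in principle grab the port in between:

```rust
use std::net::TcpListener;

/// Ask the OS for a currently free port: binding to port 0 makes the
/// kernel pick one, and local_addr() reads the choice back.
fn pick_free_port() -> std::io::Result<u16> {
    let probe = TcpListener::bind("localhost:0")?;
    Ok(probe.local_addr()?.port())
    // `probe` is dropped here, freeing the port for the real server.
}

fn main() -> std::io::Result<()> {
    println!("OS-assigned port: {}", pick_free_port()?);
    Ok(())
}
```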
@@ -152,6 +167,7 @@ impl BackendModel for LLamaEdgeApiServer {
                 file_id: file.id.to_string(),
                 model_id: file.model_id,
                 information: "".to_string(),
+                listen_port: listen_addr.port(),
             },
         )));
         return old_model.unwrap();
@@ -165,7 +181,8 @@ impl BackendModel for LLamaEdgeApiServer {
 
         let file_id = file.id.to_string();
 
-        let url = format!("http://localhost:{}/echo", listen_addr.port());
+        let listen_port = listen_addr.port();
+        let url = format!("http://localhost:{}/echo", listen_port);
 
         let file_ = file.clone();
 
@@ -197,6 +214,7 @@ impl BackendModel for LLamaEdgeApiServer {
                 file_id: file_.id.to_string(),
                 model_id: file_.model_id,
                 information: "".to_string(),
+                listen_port,
             },
         )));
     } else {
@@ -212,6 +230,7 @@ impl BackendModel for LLamaEdgeApiServer {
             listen_addr,
             running_controller,
             model_thread,
+            load_model_options,
         };
 
         new_model
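Storing `load_model_options` on the struct is what enables the reuse check above: a running server is kept only when the same file is requested with the same `n_ctx` and `n_batch`. A hedged sketch of that decision, with a hypothetical `CachedServer` standing in for `LLamaEdgeApiServer`:

```rust
/// Hypothetical stand-in for the cached server state; the real struct
/// also holds the wasm module, listen address, and worker thread.
struct CachedServer {
    id: String,
    n_ctx: Option<u32>,
    n_batch: Option<u32>,
}

/// Reuse only when the same model file is requested with identical
/// context and batch sizes; any difference forces a reload so the new
/// options actually take effect.
fn can_reuse(old: &CachedServer, file_id: &str, n_ctx: Option<u32>, n_batch: Option<u32>) -> bool {
    old.id == file_id && old.n_ctx == n_ctx && old.n_batch == n_batch
}

fn main() {
    let old = CachedServer { id: "file-1".to_string(), n_ctx: Some(1024), n_batch: Some(128) };
    assert!(can_reuse(&old, "file-1", Some(1024), Some(128)));
    assert!(!can_reuse(&old, "file-1", Some(4096), Some(128))); // n_ctx changed => reload
}
```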

moxin-backend/src/backend_impls/chat_ui.rs (+2, -0)

@@ -228,6 +228,7 @@ fn get_input(
         file_id,
         model_id,
         information: String::new(),
+        listen_port: 0,
     })));
 }
 
@@ -430,6 +431,7 @@ impl super::BackendModel for ChatBotModel {
             file_id: file.id.to_string(),
             model_id: file.model_id,
             information: "".to_string(),
+            listen_port: 0,
         })));
         return old_model.unwrap();
     }

moxin-backend/src/backend_impls/mod.rs (+4, -0)

@@ -139,6 +139,8 @@ fn test_chat() {
             rope_freq_scale: 0.0,
             rope_freq_base: 0.0,
             context_overflow_policy: moxin_protocol::protocol::ContextOverflowPolicy::StopAtLimit,
+            n_batch: Some(128),
+            n_ctx: Some(1024),
         },
         tx,
     );
@@ -209,6 +211,8 @@ fn test_chat_stop() {
             prompt_template: None,
             gpu_layers: moxin_protocol::protocol::GPULayers::Max,
             use_mlock: false,
+            n_batch: Some(128),
+            n_ctx: Some(1024),
             rope_freq_scale: 0.0,
             rope_freq_base: 0.0,
             context_overflow_policy: moxin_protocol::protocol::ContextOverflowPolicy::StopAtLimit,

moxin-protocol/src/open_ai.rs (+1, -0)

@@ -106,6 +106,7 @@ pub struct ChatResponseData {
     pub choices: Vec<ChoiceData>,
     pub created: u32,
     pub model: ModelID,
+    #[serde(default)]
     pub system_fingerprint: String,
     pub usage: UsageData,
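`#[serde(default)]` lets deserialization succeed when the `system_fingerprint` key is absent (some OpenAI-compatible servers omit it), substituting `String::default()`. A small self-contained demonstration with a cut-down struct, assuming `serde` (with the `derive` feature) and `serde_json` as dependencies:

```rust
use serde::Deserialize;

// Cut-down stand-in for ChatResponseData; only the relevant field shown.
#[derive(Debug, Deserialize)]
struct Response {
    model: String,
    #[serde(default)] // absent key becomes "" instead of a deserialize error
    system_fingerprint: String,
}

fn main() {
    // No "system_fingerprint" key at all: still deserializes.
    let r: Response = serde_json::from_str(r#"{ "model": "llama-3" }"#).unwrap();
    assert_eq!(r.system_fingerprint, "");
    println!("{r:?}");
}
```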

moxin-protocol/src/protocol.rs (+6, -1)

@@ -28,9 +28,10 @@ pub struct LoadModelOptions {
     pub prompt_template: Option<String>,
     pub gpu_layers: GPULayers,
     pub use_mlock: bool,
+    pub n_batch: Option<u32>,
+    pub n_ctx: Option<u32>,
     pub rope_freq_scale: f32,
     pub rope_freq_base: f32,
-
     // TBD Not really sure if this is something backend manages or if it is matter of
     // the client (if it is done by tweaking the JSON payload for the chat completition)
     pub context_overflow_policy: ContextOverflowPolicy,
@@ -41,6 +42,10 @@ pub struct LoadedModelInfo {
     pub file_id: FileID,
     pub model_id: ModelID,
 
+    // The port where the local server is listening for the model.
+    // if 0, the server is not running.
+    pub listen_port: u16,
+
     // JSON formatted string with the model information. See "Model Inspector" in LMStudio.
     pub information: String,
 }
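The 0 sentinel (used by the `chat_ui` backend above) gives clients one uniform field for discovering the per-model server endpoint. A hedged sketch of how a client might consume it; the helper is hypothetical, not part of moxin-protocol:

```rust
/// Hypothetical client-side helper: build the base URL of the model's
/// local server, treating 0 as "no server running" per the field docs.
fn server_url(listen_port: u16) -> Option<String> {
    (listen_port != 0).then(|| format!("http://localhost:{listen_port}"))
}

fn main() {
    assert_eq!(server_url(0), None);
    assert_eq!(server_url(8080).as_deref(), Some("http://localhost:8080"));
}
```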
