
Commit 9db5eae

Merge pull request #284 from sayjeyhi/article/webllm-and-how-to-use-them
feat: add what is webllm article
2 parents 1708932 + 93b31f9 commit 9db5eae

File tree

7 files changed: +174 -5 lines changed


data/authors/jafar-rezaei.md

+1-1
@@ -1,7 +1,7 @@
 ---
 name: Jafar Rezaei
 avatar: /authors/jafar-rezaei.jpeg
-occupation: Senior Frontend Developer
+occupation: Fullstack Software Engineer
 twitter: https://twitter.com/sayjeyhi
 linkedin: https://www.linkedin.com/in/jafar-rezaei/
 github: https://github.com/sayjeyhi

data/blog/what-is-webllm.md

+163
@@ -0,0 +1,163 @@
---
title: 'What is WebLLM'
date: '2025-02-21'
tags: ['frontend', 'web-llm', 'llm', 'large-language-models']
images: ['/articles/what-is-webllm/header.jpg']
summary: WebLLM makes large language models accessible inside the browser, fully client side! As AI technologies evolve, WebLLM is expected to become an important part of future applications.
authors: ['jafar-rezaei']
theme: 'rouge'
---

## WebLLM: Running LLMs in the Browser

Large Language Models (LLMs) changed the game for natural language processing (NLP), making it much easier to build chatbots, code generation tools, and more.
As we move forward, models keep getting smaller and faster, and this opens up a lot of opportunities.
Traditionally, LLMs had to be served from powerful cloud servers with dedicated GPUs, but recent progress has made it possible to run them even in a local, client-side browser.
In this article, we will cover how WebLLM helps serve a model fully on the client side.

## What is WebLLM?

WebLLM is an approach implemented by the [MLC-AI](https://github.com/mlc-ai) team that allows LLMs to run fully locally within a browser using
[WebAssembly (WASM)](https://developer.mozilla.org/en-US/docs/WebAssembly), [WebGPU](https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API), and other modern web technologies.
When WebLLM is used, it first downloads the chosen model and stores it locally in the [CacheStorage](https://developer.mozilla.org/en-US/docs/Web/API/CacheStorage). From that moment on, it can be used fully offline. Downloading and running a model locally takes a bit of extra work, but with growing internet speeds and device capabilities, it may be a good fit for many applications in the near future.
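
Because the weights live in CacheStorage, the standard storage APIs can tell you how much space cached models take up. A minimal sketch using only standard browser APIs (the exact cache names WebLLM uses are an implementation detail and may change between versions):

```javascript
// Inspect browser storage to see how much space cached model weights occupy.
// Only standard APIs are used here; WebLLM's cache names are an implementation detail.
const { usage, quota } = await navigator.storage.estimate();
console.log(`Using ${(usage / 1e6).toFixed(1)} MB of ${(quota / 1e6).toFixed(0)} MB`);

// List the CacheStorage entries; model weights appear here after the first download.
console.log(await caches.keys());
```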

## How WebLLM Works

The technology behind WebLLM allows it to work fully in the browser without any server-side infrastructure. But how does it work?

- **Use of WebAssembly (WASM) and WebGPU**: It compiles the model computations into a format that runs efficiently as WebAssembly modules and uses WebGPU to accelerate them (a feature-detection sketch follows after this list).
- **Running on the Client Side**: Unlike traditional cloud-based LLMs, WebLLM works fully locally in the browser, so it does not send data to remote servers, which ensures privacy and security.
- **Optimized Model Size**: WebLLM uses quantized and optimized versions of models so they can run faster and more efficiently in a browser environment.
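
Since the acceleration path depends on WebGPU, it is worth checking that the browser actually exposes it before trying to load a model. A small sketch using standard WebGPU feature detection (this is plain browser API, not part of WebLLM):

```javascript
// Standard WebGPU feature detection before attempting to load a model.
async function canRunWebLLM() {
  if (!("gpu" in navigator)) return false; // WebGPU not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no suitable GPU adapter was found
}

canRunWebLLM().then((ok) => {
  console.log(ok ? "WebGPU is available, WebLLM can run" : "WebGPU is not available");
});
```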

## Cloud vs. In-Browser LLMs

Comparing cloud-served LLMs with web-based, in-browser LLMs is a good way to understand the differences between the two. Running LLMs in the browser might not be ideal for every project, but for many projects it is a great advantage.

| Feature | LLM served from Cloud | WebLLM (In-Browser) |
|---------------------|---------------------------------------------|---------------------------------------------|
| **Offline Support** | Limited (requires internet) | Can run offline once loaded |
| **Performance** | Faster (dedicated hardware) | Slower (limited by browser capabilities) |
| **Privacy** | Limited (data is sent to and from a server) | Fully private (runs locally) |
| **Installation** | Requires dedicated servers (GPUs, memory) | Open the website and download the model |
| **Portability** | Limited to specific OS/hardware | Cross-platform (any modern browser) |
| **Latency** | Lower latency (powerful hardware) | Higher latency (browser execution overhead) |

Keep in mind that using WebLLM requires the model to be downloaded and stored locally on the client,
so it is important to consider the size of the model and the amount of data that needs to be downloaded.
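
One way to pick an appropriately sized model is to look at the list of prebuilt models that ships with the package. A rough sketch, assuming the `prebuiltAppConfig` export and its `model_list` field look like they do in current versions (field names such as `vram_required_MB` may vary):

```javascript
import * as webllm from "@mlc-ai/web-llm";

// Print the prebuilt model ids and, where available, their approximate VRAM needs,
// to help pick a model that fits the target devices.
for (const m of webllm.prebuiltAppConfig.model_list) {
  console.log(m.model_id, m.vram_required_MB ? `~${m.vram_required_MB} MB VRAM` : "");
}
```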

## How Can I Implement It in My Website?

WebLLM is published as an npm package ([@mlc-ai/web-llm](https://www.npmjs.com/package/@mlc-ai/web-llm)) and can be installed easily:

```bash
npm i @mlc-ai/web-llm
```

Then import the module in your code and use it.
It can also be dynamically imported:

```javascript
const webllm = await import("https://esm.run/@mlc-ai/web-llm");
```

### Create MLCEngine

Most operations in WebLLM are done through the MLCEngine.
Here is a sample code snippet that creates your LLM engine:

```typescript
import * as webllm from "@mlc-ai/web-llm";

const selectedModel = "Llama-3.1-8B-q4f32_1-MLC";

const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
  selectedModel,
  {
    initProgressCallback: (initProgress) => {
      console.log(initProgress);
    },
    logLevel: "INFO",
  },
);
```

As soon as the engine is loaded and ready to use, you can start calling its APIs.
Note that the engine creation is asynchronous, so you need to wait until it finishes loading the model before you can use it.

```typescript
const messages = [
  { role: "system", content: "You are a helpful AI assistant." },
  { role: "user", content: "Hello!" },
];

const reply = await engine.chat.completions.create({
  messages,
});
console.log(reply.choices[0].message);
console.log(reply.usage);
```

It also supports streaming, which can be enabled simply by passing the `stream: true` property to the create method.

```javascript
const chunks = await engine.chat.completions.create({
  messages,
  temperature: 1,
  stream: true, // <-- Enable streaming
  stream_options: { include_usage: true },
});

let reply = "";
for await (const chunk of chunks) {
  reply += chunk.choices[0]?.delta.content || "";
  console.log(reply);
  if (chunk.usage) {
    console.log(chunk.usage); // only the last chunk has usage
  }
}
```

Since these operations run on the browser's main thread, using them directly in an application can be tricky and can hurt its performance.
So instead of using the engine directly, it is recommended to run it in a separate thread (a worker).

In the browser environment, we have [Web Workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers) and [Service Workers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API).
Fortunately, WebLLM provides wrappers for these APIs, so you can use them in the same way as the regular engine.

```javascript
import * as webllm from "@mlc-ai/web-llm";

const engine = new webllm.ServiceWorkerMLCEngine();
await engine.reload("Llama-3-8B-Instruct-q4f16_1-MLC");

async function main() {
  const stream = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  });

  for await (const chunk of stream) {
    updateUI(chunk.choices[0]?.delta?.content || ""); // update the UI with each chunk
  }
}
```
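
The snippet above only shows the page side. The service worker script itself also has to be registered (for example with `navigator.serviceWorker.register`) and has to host the engine. A rough sketch of that worker script, assuming the `ServiceWorkerMLCEngineHandler` export described in the WebLLM docs:

```javascript
// sw.js — the service worker side (a sketch; check the WebLLM docs for the exact setup).
import { ServiceWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

let handler;

self.addEventListener("activate", () => {
  // The handler receives requests from the page's ServiceWorkerMLCEngine
  // and runs the actual model inside the service worker.
  handler = new ServiceWorkerMLCEngineHandler();
});
```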

If you have worked with the OpenAI library before, the code structure will probably look familiar. That is intentional: the API was kept similar, so it feels like interacting with a server by sending and receiving JSON.

It also supports media inputs such as image URLs (see the sketch below).
There are also examples of how to use WebLLM in different projects and frameworks in the [examples](https://github.com/mlc-ai/web-llm/tree/main/examples) folder of the mlc-ai web-llm repository.
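
For image inputs, the message `content` can be an array of OpenAI-style parts instead of a plain string. A sketch, assuming a vision-capable model from the prebuilt list and the `image_url` part format mirrored from the OpenAI API (check the vision examples in the repository for the exact model ids and format):

```javascript
// A sketch of passing an image URL; the model id and the exact part format
// should be verified against the vision examples in the web-llm repository.
const reply = await engine.chat.completions.create({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is in this picture?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
      ],
    },
  ],
});
console.log(reply.choices[0].message.content);
```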

### Try it out

<figure class="flex flex-col items-center justify-center">
  <img alt="WebLLM in action" src="/articles/what-is-webllm/webllm-in-action.gif"/>
  <figcaption>WebLLM in action</figcaption>
</figure>

The MLC-AI team has built the website https://chat.webllm.ai/, which lets you download and try a wide range of LLMs locally in the browser without any installation or configuration.
It gives a quick overview of how WebLLM works and how to use it.

data/talks/3d-rendering-in-react.md

+7
@@ -0,0 +1,7 @@
---
title: 3D Rendering in React
summary: Three.js is a powerful JavaScript library for creating and displaying 3D graphics in a web browser. It provides a wide range of features for creating 3D scenes, including support for lighting, cameras, and animations. With React Three Fiber, developers can use Three.js within React components, allowing them to create dynamic and interactive 3D scenes in their applications.
tags: ['frontend', 'threejs', 'react', 'webgl', '3d']
authors: ['jafar-rezaei']
slides: 'https://r3f-threejs.sayjeyhi.com/'
---

package-lock.json

+3-4
Some generated files are not rendered by default.
1.58 MB

public/authors/jafar-rezaei.jpeg

352 KB

0 commit comments
