Update README.md
imoneoi authored Dec 12, 2023
1 parent 85466f5 commit 46c5d6b
107 changes: 63 additions & 44 deletions README.md

# ✨ News

- [2023/12/10] We released the first update of [OpenChat 3.5](https://huggingface.co/openchat/openchat-3.5-1210), which improves coding by 15 points 🚀.

- [2023/11/01] We released the [OpenChat-3.5-7B](https://huggingface.co/openchat/openchat_3.5) model, surpassing ChatGPT on various benchmarks 🔥.

- [2023/09/21] We released our paper [OpenChat: Advancing Open-source Language Models with Mixed-Quality Data](https://arxiv.org/pdf/2309.11235.pdf).


<details>
<summary>Read more</summary>

- [2023/09/03] We released the [OpenChat V3.2 SUPER](https://huggingface.co/openchat/openchat_v3.2_super) model.

- [2023/08/04] We have launched an [Online Demo](https://openchat.team) featuring the latest version, OpenChat 3.2.
- [2023/07/07] We released the [OpenChat V2 model series](#legacy-models).

- [2023/07/01] We released the [OpenChat V1 model series](#legacy-models).
</details>

# 🏷️ Benchmarks

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
|--------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
| OpenChat-3.5-1210 | **7B** | **63.8** | 7.76 | **68.9** | **49.5** | **48.0** | **61.8** | 65.3 | **77.3** | 61.8 |
| OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | **77.3** | 63.5 |
| ChatGPT (March)* | ? | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** |
| | | | | | | | | | | |
| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
| OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
| Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
| Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
| Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 73.2 | 49.7 | 41.7 | 62.3 | 63.7 | 82.3 | 41.4 |
| | | | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |

<details>
<summary>Evaluation details</summary>
All models are evaluated in chat mode (e.g. with the respective conversation template applied).
<details>
<summary>Reproducing benchmarks</summary>

Reasoning and Coding:

Note: Please run the following commands at the base directory of this repository.

```bash
python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat-3.5-1210 --eval_sets coding fs_cothub/bbh zs/agieval zs/bbh_mc_orca zs/truthfulqa_orca
python ochat/evaluation/view_results.py
python ochat/evaluation/convert_to_evalplus.py
```

HumanEval:

The generated HumanEval code samples are placed in `ochat/evaluation/evalplus_codegen`. Use the following command to evaluate an individual sample file named `samples.jsonl`, using Docker as a sandbox.

```bash
docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples samples.jsonl
```

Mathematical Reasoning:

Note: Please run the following commands at the base directory of this repository.

```bash
python -m ochat.evaluation.run_eval --condition "Math Correct" --model openchat/openchat-3.5-1210 --eval_sets fs_cothub/gsm8k zs/math
python ochat/evaluation/view_results.py
```

MT-Bench:
Note: Due to non-zero temperature and GPT-4 API changes over time, there might be variations in the MT-Bench results.

```bash
cd fastchat/llm_judge
python gen_api_answer.py --model openchat_3.5_1210 --max-tokens 4096 --parallel 128 --openai-api-base http://localhost:18888/v1
python gen_judgment.py --model-list openchat_3.5_1210 --parallel 8 --mode single
```

</details>

## 🎇 Comparison with [X.AI Grok](https://x.ai/)

| | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
|-------------------|-------------|---------|----------|------|-----------|----------|----------|
| OpenChat 3.5 1210 | Apache-2.0 | **7B** | **60.1** | 65.3 | **68.9** | **28.9** | **77.3** |
| OpenChat 3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | **77.3** |
| Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
| Grok-1 | Proprietary | ???B | 55.8 | 73 | 63.2 | 23.9 | 62.9 |

# ⬇️ Installation
> [!NOTE]
```bash
pip3 install -e .
```
### For a single GPU (e.g. RTX 3090, 4090)

```bash
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210
```

### For multiple GPUs (tensor parallel)

```bash
# N is the number of tensor parallel GPUs
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210 --engine-use-ray --worker-use-ray --tensor-parallel-size N
```

Use `-h` to see more settings:
```bash
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210 -h
```

<details>
If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys.

Once started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat).

💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

```bash
curl http://localhost:18888/v1/chat/completions \
-H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
}'
```

🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

```bash
curl http://localhost:18888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openchat_3.5",
"condition": "Math Correct",
"messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
}'
```

</details>
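Since the server speaks the OpenAI ChatCompletion protocol, any HTTP client works, not just `curl`. Below is a minimal Python sketch using only the standard library that mirrors the requests above; `make_request` and `chat` are illustrative helper names, not part of the ochat package, and the snippet assumes the server from this section is listening on `localhost:18888`.

```python
import json
from urllib import request

def make_request(content, condition=None):
    """Build a ChatCompletion payload like the curl examples above.

    `condition` selects an OpenChat mode, e.g. "Math Correct";
    omit it for the default (GPT4 Correct) mode.
    """
    payload = {
        "model": "openchat_3.5",
        "messages": [{"role": "user", "content": content}],
    }
    if condition is not None:
        payload["condition"] = condition
    return payload

def chat(content, condition=None,
         url="http://localhost:18888/v1/chat/completions"):
    """POST a request to the local OpenChat server and return the reply text."""
    req = request.Request(
        url,
        data=json.dumps(make_request(content, condition)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("10.3 − 7988.8133 = ", condition="Math Correct"))
```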

# <a id="web-ui"></a> 🌐 Web UI - [OpenChat-UI](https://github.com/imoneoi/openchat-ui)

After launching the API server, OpenChat provides a user interface that is easy to interact with. [Click here to open the Web UI](https://github.com/imoneoi/openchat-ui)
> [!WARNING]
> It's recommended to use our optimized API server for deployment. Inferencing with Transformers will be slower.
💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

```
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
```

🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

```
Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
```

⚠️ **Notice:** Remember to set `<|end_of_turn|>` as the end-of-generation token.

The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`, which can be used instead of manually specifying the template.
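For reference, both templates above can be reproduced in a few lines of Python. `build_prompt` below is a hypothetical helper for illustration only, not part of the ochat package; in practice, prefer the integrated `tokenizer.chat_template`.

```python
def build_prompt(messages, mode="GPT4 Correct"):
    """Flatten a chat into the OpenChat template, ending with the assistant tag.

    `mode` is the template prefix: "GPT4 Correct" (default) or "Math Correct".
    """
    prompt = ""
    for message in messages:
        role = "User" if message["role"] == "user" else "Assistant"
        prompt += f"{mode} {role}: {message['content']}<|end_of_turn|>"
    return prompt + f"{mode} Assistant:"

# Reproduces the default-mode template shown above.
print(build_prompt([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]))
```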

# <a id="training"></a> 🛠️ Training

The OpenChat training system utilizes padding-free training and the [Multipack Sampler](https://github.com/imoneoi/multipack_sampler), achieving a **3~10x** speedup compared to the conventional padded training.
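To give a feel for where the speedup comes from, here is a toy first-fit-decreasing packing sketch. This illustrates the general idea only and is not the actual Multipack Sampler algorithm: variable-length sequences are packed into fixed-capacity token bins so that few slots are wasted on padding.

```python
def pack_sequences(lengths, bin_capacity):
    """Greedily pack sequence lengths into bins of at most bin_capacity tokens."""
    bins = []  # each bin is a list of sequence lengths sharing one batch slot
    for length in sorted(lengths, reverse=True):
        for b in bins:  # first fit: reuse the first bin with enough room
            if sum(b) + length <= bin_capacity:
                b.append(length)
                break
        else:
            bins.append([length])  # no bin fits; open a new one
    return bins

lengths = [900, 700, 600, 400, 300, 100]
bins = pack_sequences(lengths, bin_capacity=1024)
print(bins)  # 3 bins instead of 6 padded rows
```

In this toy example, padded batching would reserve `len(lengths) * max(lengths)` = 5400 token slots, while packing needs only `len(bins) * bin_capacity` = 3072.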
OpenChat supports Llama 2 and Mistral models. Please first choose a base model to fit your needs.

| Base Model | Size | Weights (with EOT token) | Model Type | Recommended Batch Size per GPU (8xA100 80GB) |
|------------|------|-----------------------------------|-------------------------|--------------------------------------|
| Mistral | 7B | `imone/Mistral_7B_with_EOT_token` | `openchat_v3.2_mistral` | 77824 |
| Llama 2 | 7B | `imone/LLaMA2_7B_with_EOT_token` | `openchat_v3.2` | 77824 |
| Llama 2 | 13B | `imone/Llama2_13B_with_EOT_token` | `openchat_v3.2` | 36864 |

Note: The OpenChat conversation template requires an `<|end_of_turn|>` special token, so the specified base model must include it. Our provided weights are the original base weights with this token added. If you want to add it manually, use the `convert_llama_weights_to_hf_add_tokens.py` or `mistral_add_tokens.py` script in the `scripts` directory.

## Installing DeepSpeed and Flash Attention

First, ensure that the CUDA `nvcc` compiler is available in your environment. If it is not, install the CUDA toolkit that matches the version used by PyTorch.

Next, install the build dependencies:

```bash
pip install packaging ninja
```

Finally, install the packages:

```bash
pip install deepspeed flash-attn
```

### Preparing Your Data
