Welcome to the Skywork-R1V repository! Here you'll find the model weights and inference code for our state-of-the-art open-source multimodal reasoning model, which enables advanced visual and logical thinking.
April 9, 2025: Our technical report is now available on arXiv: [Skywork-R1V: Pioneering Multimodal Reasoning with CoT](https://arxiv.org/abs/2504.05599).
April 1, 2025: Skywork-R1V now supports inference with [vLLM](https://github.com/vllm-project/vllm). On 4×L20Y GPUs, vLLM generates 1k tokens in ~12.3s, at least 5× faster than transformers.
Mar 26, 2025: We released an AWQ-quantized version of Skywork-R1V [🤗 Skywork-R1V-38B-AWQ], supporting single-GPU inference on cards with more than 30 GB of memory.
Mar 18, 2025: We are thrilled to introduce Skywork-R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀
- Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
- Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
- Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.
Evaluation of Skywork-R1V-38B against open-source LLM and VLM baselines:

| Category | Benchmark | QwQ-32B-Preview (LLM) | QwenVL-2-72B (VLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) |
|---|---|---|---|---|---|---|---|
| Reasoning | MATH-500 | 90.6 | - | - | - | - | 94.0 |
| Reasoning | AIME 2024 | 50.0 | - | - | - | - | 72.0 |
| Reasoning | GPQA | 54.5 | - | - | - | - | 61.6 |
| Vision | MathVista(mini) | - | 70.5 | 71.9 | 49.5 | 63.7 | 67.5 |
| Vision | MMMU(Val) | - | 64.5 | 63.9 | 55.1 | 55.2 | 69.0 |
Comparison with larger-scale open-source and proprietary models (all scores are pass@1; "Vision" indicates whether the model accepts image input):

| Model | Size | Vision | MATH-500 | AIME 2024 | GPQA | MathVista(mini) | MMMU(Val) |
|---|---|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 72B | ❌ | 80.0 | 23.3 | 49.0 | - | - |
| DeepSeek V3 | 671B | ❌ | 90.2 | 39.2 | 59.1 | - | - |
| DeepSeek R1 | 671B | ❌ | 97.3 | 79.8 | 71.5 | - | - |
| Claude 3.5 Sonnet | - | ✅ | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
| GPT-4o | - | ✅ | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
| Kimi k1.5 | - | ✅ | 96.2 | 77.5 | - | 74.9 | 70.0 |
| Qwen2.5-VL-72B-Instruct | 72B | ✅ | - | - | - | 74.8 | 70.2 |
| LLaVA-Onevision-72B | 72B | ✅ | - | - | - | 67.5 | 56.8 |
| InternVL2-Llama3-76B | 76B | ✅ | - | - | - | 65.5 | 62.7 |
| InternVL2.5-78B | 78B | ✅ | - | - | - | 72.3 | 70.1 |
| Skywork-R1V-38B | 38B | ✅ | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |
Clone the repository and set up the environment:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

conda create -n r1-v python=3.10
conda activate r1-v
bash setup.sh
```
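Before launching inference, it can help to confirm that PyTorch sees your GPUs (this assumes `setup.sh` installs a CUDA-enabled PyTorch build):

```python
# Quick sanity check that the environment is ready for multi-GPU inference.
# Assumes setup.sh installed a CUDA-enabled PyTorch build.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if CUDA is usable
print(torch.cuda.device_count())  # number of visible GPUs
```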
Run inference with transformers:

```shell
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
```
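For programmatic use, a minimal sketch is shown below. It is hedged on two assumptions: that the checkpoint loads through transformers' `AutoModel` with `trust_remote_code=True` and exposes an InternVL-style `chat()` method, and that a single 448×448 image tile with ImageNet normalization is acceptable preprocessing. The repository's `inference_with_transformers.py` remains the authoritative reference.

```python
# Hedged sketch of programmatic inference with transformers.
# Assumptions: the checkpoint loads via AutoModel with trust_remote_code
# and exposes an InternVL-style chat() method; a single 448x448 tile with
# ImageNet normalization is used for preprocessing.
import torch
from PIL import Image
from torchvision import transforms as T
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path"  # same placeholder as --model_path above

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # shard the 38B model across visible GPUs
).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = preprocess(Image.open("image1_path").convert("RGB"))
pixel_values = pixel_values.unsqueeze(0).to(torch.bfloat16).cuda()

# InternVL-style chat call (an assumption about the remote-code interface).
response = model.chat(
    tokenizer,
    pixel_values,
    "<image>\nyour question",
    generation_config=dict(max_new_tokens=1024, do_sample=False),
)
print(response)
```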
To install vLLM from source, refer to vLLM's installation guide: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html
```shell
conda create -n r1v-vllm python=3.12
conda activate r1v-vllm

pip install pillow==11.1.0

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```
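After the editable install completes, a quick import check confirms that vLLM is usable from Python:

```python
# Confirm the editable vLLM install is importable and print its version.
import vllm

print(vllm.__version__)
```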
Then clone this repository and run inference with vLLM:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

python inference_with_vllm.py \
    --model_path path \
    --image_paths image1_path image2_path \
    --question "your question" \
    --tensor_parallel_size 4
```
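Under the hood, the script uses vLLM's offline Python API. The sketch below shows the general shape of such a call; the prompt template and the `limit_mm_per_prompt` value are illustrative assumptions, and `inference_with_vllm.py` is the authoritative reference.

```python
# Hedged sketch of offline multimodal generation with vLLM's Python API.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="path",                      # same placeholder as --model_path above
    tensor_parallel_size=4,            # matches --tensor_parallel_size above
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

sampling = SamplingParams(temperature=0.0, max_tokens=1024)

# The exact prompt template is model-specific; the "<image>" placeholders
# here are an assumption -- check inference_with_vllm.py for the real one.
outputs = llm.generate(
    {
        "prompt": "<image>\n<image>\nyour question",
        "multi_modal_data": {
            "image": [Image.open("image1_path"), Image.open("image2_path")],
        },
    },
    sampling,
)
print(outputs[0].outputs[0].text)
```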
This code repository is licensed under the MIT License.

- ✅ Commercial use permitted
- ✅ Modification allowed
- ✅ Distribution allowed
- ❌ No liability
If you use Skywork-R1V in your research, please cite:
```bibtex
@misc{peng2025skyworkr1vpioneeringmultimodal,
      title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
      author={Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
      year={2025},
      eprint={2504.05599},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.05599},
}
```