Welcome to the Skywork-R1V repository! Here you'll find the model weights and inference code for our state-of-the-art open-source multimodal reasoning model, which enables advanced visual and logical thinking.
April 9, 2025: Our technical report is now available on arXiv: [Skywork-R1V: Pioneering Multimodal Reasoning with CoT](https://arxiv.org/abs/2504.05599).
April 1, 2025: Skywork-R1V now supports inference with [vLLM](https://github.com/vllm-project/vllm). On 4×L20Y GPUs, vLLM generates 1k tokens in ~12.3s, at least 5× faster than transformers.
Mar 26, 2025: We released an AWQ-quantized version of Skywork-R1V [🤗 Skywork-R1V-38B-AWQ], supporting single-GPU inference on cards with more than 30 GB of memory.
Mar 18, 2025: We are thrilled to introduce Skywork-R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀
- Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
- Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
- Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.
Evaluation of Skywork-R1V-38B against open-source LLM and VLM baselines:

| Category | Benchmark | QwQ-32B-Preview (LLM) | QwenVL-2-72B (VLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) |
|---|---|---|---|---|---|---|---|
| Reasoning | MATH-500 | 90.6 | - | - | - | - | 94.0 |
| Reasoning | AIME 2024 | 50.0 | - | - | - | - | 72.0 |
| Reasoning | GPQA | 54.5 | - | - | - | - | 61.6 |
| Vision | MathVista(mini) | - | 70.5 | 71.9 | 49.5 | 63.7 | 67.5 |
| Vision | MMMU(Val) | - | 64.5 | 63.9 | 55.1 | 55.2 | 69.0 |
Comparison with larger-scale open-source and proprietary models (all scores are pass@1; "Vision" indicates whether the model accepts image input):

| Model | Size | Vision | MATH-500 | AIME 2024 | GPQA | MathVista(mini) | MMMU(Val) |
|---|---|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 72B | ❌ | 80.0 | 23.3 | 49.0 | - | - |
| DeepSeek V3 | 671B | ❌ | 90.2 | 39.2 | 59.1 | - | - |
| DeepSeek R1 | 671B | ❌ | 97.3 | 79.8 | 71.5 | - | - |
| Claude 3.5 Sonnet | - | ✅ | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
| GPT-4o | - | ✅ | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
| Kimi k1.5 | - | ✅ | 96.2 | 77.5 | - | 74.9 | 70.0 |
| Qwen2.5-VL-72B-Instruct | 72B | ✅ | - | - | - | 74.8 | 70.2 |
| LLaVA-Onevision-72B | 72B | ✅ | - | - | - | 67.5 | 56.8 |
| InternVL2-Llama3-76B | 76B | ✅ | - | - | - | 65.5 | 62.7 |
| InternVL2.5-78B | 78B | ✅ | - | - | - | 72.3 | 70.1 |
| Skywork-R1V-38B | 38B | ✅ | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |
Clone the repository and set up the environment:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

conda create -n r1-v python=3.10
conda activate r1-v
bash setup.sh
```
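Before launching inference, it can help to confirm that PyTorch sees your GPUs (this assumes `setup.sh` installs a CUDA-enabled PyTorch build):

```python
# Quick sanity check that the environment is ready for multi-GPU inference.
# Assumes setup.sh installed a CUDA-enabled PyTorch build.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if CUDA is usable
print(torch.cuda.device_count())  # number of visible GPUs
```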
Run inference with transformers:

```shell
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
```
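For programmatic use, a minimal sketch is shown below. It is hedged on two assumptions: that the checkpoint loads through transformers' `AutoModel` with `trust_remote_code=True` and exposes an InternVL-style `chat()` method, and that a single 448×448 image tile with ImageNet normalization is acceptable preprocessing. The repository's `inference_with_transformers.py` remains the authoritative reference.

```python
# Hedged sketch of programmatic inference with transformers.
# Assumptions: the checkpoint loads via AutoModel with trust_remote_code
# and exposes an InternVL-style chat() method; a single 448x448 tile with
# ImageNet normalization is used for preprocessing.
import torch
from PIL import Image
from torchvision import transforms as T
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path"  # same placeholder as --model_path above

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # shard the 38B model across visible GPUs
).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = preprocess(Image.open("image1_path").convert("RGB"))
pixel_values = pixel_values.unsqueeze(0).to(torch.bfloat16).cuda()

# InternVL-style chat call (an assumption about the remote-code interface).
response = model.chat(
    tokenizer,
    pixel_values,
    "<image>\nyour question",
    generation_config=dict(max_new_tokens=1024, do_sample=False),
)
print(response)
```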
To install vLLM from source, refer to vLLM's installation guide: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html
```shell
conda create -n r1v-vllm python=3.12
conda activate r1v-vllm

pip install pillow==11.1.0

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```
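After the editable install completes, a quick import check confirms that vLLM is usable from Python:

```python
# Confirm the editable vLLM install is importable and print its version.
import vllm

print(vllm.__version__)
```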
Then clone this repository and run inference with vLLM:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

python inference_with_vllm.py \
    --model_path path \
    --image_paths image1_path image2_path \
    --question "your question" \
    --tensor_parallel_size 4
```
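Under the hood, the script uses vLLM's offline Python API. The sketch below shows the general shape of such a call; the prompt template and the `limit_mm_per_prompt` value are illustrative assumptions, and `inference_with_vllm.py` is the authoritative reference.

```python
# Hedged sketch of offline multimodal generation with vLLM's Python API.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="path",                      # same placeholder as --model_path above
    tensor_parallel_size=4,            # matches --tensor_parallel_size above
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

sampling = SamplingParams(temperature=0.0, max_tokens=1024)

# The exact prompt template is model-specific; the "<image>" placeholders
# here are an assumption -- check inference_with_vllm.py for the real one.
outputs = llm.generate(
    {
        "prompt": "<image>\n<image>\nyour question",
        "multi_modal_data": {
            "image": [Image.open("image1_path"), Image.open("image2_path")],
        },
    },
    sampling,
)
print(outputs[0].outputs[0].text)
```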
This code repository is licensed under the MIT License.

- ✅ Commercial use permitted
- ✅ Modification allowed
- ✅ Distribution allowed
- ❌ No liability
If you use Skywork-R1V in your research, please cite:
```bibtex
@misc{peng2025skyworkr1vpioneeringmultimodal,
      title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
      author={Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
      year={2025},
      eprint={2504.05599},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.05599},
}
```