
MindONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.04.10] We release MindONE v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, Open-Sora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here.
  • [2024.11.06] MindONE v0.2.0 is released.

Quick tour

To install MindONE v0.3.0, please install MindSpore 2.5.0 and run `pip install mindone`.

Alternatively, to install the latest version from the master branch, please run:

git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
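
To verify the installation, here is a minimal sanity check. It assumes both packages import cleanly and that the printed MindSpore version matches the 2.5.0 requirement above:

import mindspore
import mindone  # raises ImportError if the installation is broken

# MindONE v0.3.0 targets MindSpore 2.5.0, so expect "2.5.0" here.
print(mindspore.__version__)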

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 medium checkpoint in half precision.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple of image batches; take the first image.
image = pipe(prompt)[0][0]
image.save("sd3.png")
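
The pipelines keep the HF diffusers call signature (see the compatibility notes below), so the usual generation knobs should carry over. Here is a sketch reusing pipe and prompt from the example above; the parameter values are illustrative only:

# Standard diffusers-style generation parameters on the same pipeline.
image = pipe(
    prompt,
    negative_prompt="blurry, low quality",  # steer away from common artifacts
    num_inference_steps=28,                 # more steps: slower but often sharper
    guidance_scale=7.0,                     # classifier-free guidance strength
)[0][0]
image.save("sd3_tuned.png")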

run hf diffusers on mindspore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.5.0 on Ascend Atlas 800T A2 machines.
  • compatible with HF diffusers 0.32.2
| component  | features                                                                          |
|------------|-----------------------------------------------------------------------------------|
| pipeline   | support text-to-image, text-to-video, text-to-audio tasks (160+)                   |
| models     | support autoencoder & transformers base models, same as HF diffusers (50+)          |
| schedulers | support diffusion schedulers (e.g., DDPM and DPM-Solver), same as HF diffusers (35+) |
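
Schedulers can be swapped exactly as in HF diffusers. A minimal sketch, assuming mindone.diffusers re-exports DPMSolverMultistepScheduler the way HF diffusers 0.32.2 does (shown on Stable Diffusion XL, one of the supported models listed below):

import mindspore
from mindone.diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    mindspore_dtype=mindspore.float16,
)
# Rebuild a DPM-Solver scheduler from the current scheduler's config.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)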

supported models under mindone/examples

| task                | model                    | inference | finetune | pretrain | institute       |
|---------------------|--------------------------|-----------|----------|----------|-----------------|
| Image-to-Video      | hunyuanvideo-i2v         | 🔥🔥      | ✖️       | ✖️       | Tencent         |
| Text/Image-to-Video | wan2.1                   | 🔥🔥🔥    | ✖️       | ✖️       | Alibaba         |
| Text-to-Image       | cogview4                 | 🔥🔥🔥    | ✖️       | ✖️       | Zhipu AI        |
| Text-to-Video       | step_video_t2v           | 🔥🔥      | ✖️       | ✖️       | StepFun         |
| Image-Text-to-Text  | qwen2_vl                 | 🔥🔥🔥    | ✖️       | ✖️       | Alibaba         |
| Any-to-Any          | janus                    | 🔥🔥🔥    |          |          | DeepSeek        |
| Any-to-Any          | emu3                     | 🔥🔥      |          |          | BAAI            |
| Class-to-Image      | var                      | 🔥🔥      |          |          | ByteDance       |
| Text/Image-to-Video | hpcai open sora 1.2/2.0  | 🔥🔥      |          |          | HPC-AI Tech     |
| Text/Image-to-Video | cogvideox 1.5 5B~30B     | 🔥🔥      |          |          | Zhipu AI        |
| Text-to-Video       | open sora plan 1.3       | 🔥🔥      |          |          | PKU             |
| Text-to-Video       | hunyuanvideo             | 🔥🔥      |          |          | Tencent         |
| Text-to-Video       | movie gen 30B            | 🔥🔥      |          |          | Meta            |
| Video-Encode-Decode | magvit                   |           |          |          | Google          |
| Text-to-Image       | story_diffusion          |           | ✖️       | ✖️       | ByteDance       |
| Image-to-Video      | dynamicrafter            |           | ✖️       | ✖️       | Tencent         |
| Video-to-Video      | venhancer                |           | ✖️       | ✖️       | Shanghai AI Lab |
| Text-to-Video       | t2v_turbo                |           |          |          | Google          |
| Image-to-Video      | svd                      |           |          |          | Stability AI    |
| Text-to-Video       | animate diff             |           |          |          | CUHK            |
| Text/Image-to-Video | video composer           |           |          |          | Alibaba         |
| Text-to-Image       | flux                     | 🔥        |          | ✖️       | Black Forest Lab |
| Text-to-Image       | stable diffusion 3       | 🔥        |          | ✖️       | Stability AI    |
| Text-to-Image       | kohya_sd_scripts         | ✖️        |          |          | kohya           |
| Text-to-Image       | stable diffusion xl      |           |          |          | Stability AI    |
| Text-to-Image       | stable diffusion         |           |          |          | Stability AI    |
| Text-to-Image       | hunyuan_dit              |           |          |          | Tencent         |
| Text-to-Image       | pixart_sigma             |           |          |          | Huawei          |
| Text-to-Image       | fit                      |           |          |          | Shanghai AI Lab |
| Class-to-Video      | latte                    |           |          |          | Shanghai AI Lab |
| Class-to-Image      | dit                      |           |          |          | Meta            |
| Text-to-Image       | t2i-adapter              |           |          |          | Shanghai AI Lab |
| Text-to-Image       | ip adapter               |           |          |          | Tencent         |
| Text-to-3D          | mvdream                  |           |          |          | ByteDance       |
| Image-to-3D         | instantmesh              |           |          |          | Tencent         |
| Image-to-3D         | sv3d                     |           |          |          | Stability AI    |
| Text/Image-to-3D    | hunyuan3d-1.0            |           |          |          | Tencent         |

supported captioner

| task               | model  | inference | finetune | pretrain | features                            |
|--------------------|--------|-----------|----------|----------|-------------------------------------|
| Image-Text-to-Text | pllava | 🔥        | ✖️       | ✖️       | support video and image captioning  |