
Add OpenVINO backend for torch.compile node #6638

Open · wants to merge 5 commits into master
Conversation

@openvino-dev-samples commented Jan 29, 2025

To support both .safetensors models and LoRA weights with the OpenVINO runtime.

[screenshot attached]

pip install openvino
python3 main.py --cpu --use-pytorch-cross-attention

#2473
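
For readers unfamiliar with the OpenVINO backend for torch.compile, here is a minimal sketch of the mechanism this node builds on (the model and device below are placeholders, not the node's actual code; which devices are available depends on the system):

```python
import torch
import openvino.torch  # noqa: F401 -- registers the "openvino" backend for torch.compile

# Minimal sketch: compile an arbitrary module with the OpenVINO backend.
# The "device" option ("CPU", "GPU", "NPU") is an assumption and depends on
# which devices OpenVINO detects on the system.
model = torch.nn.Linear(16, 16).eval()
compiled = torch.compile(model, backend="openvino", options={"device": "CPU"})

with torch.no_grad():
    out = compiled(torch.randn(1, 16))
print(out.shape)  # torch.Size([1, 16])
```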

@simonlui (Contributor) commented:

Some questions, as an Intel Arc owner who has contributed to the repository.

1.) I have used both the Triton (inductor) and OpenVINO backends via custom nodes, and Triton is faster in both compilation time and inference speed in my testing. How do these backends differ, and how do you plan to support them? The benefit of adding OpenVINO seems minimal to me outside of supporting NPU devices.

2.) Are there sensible errors if you don't have a suitable OpenVINO GPU or NPU and try to run this node, and ways to diagnose and fix them if users run into them? This can be an issue with both device types, but the NPU in particular requires drivers to function properly at this time, and I can't even get my LNL laptop to use its NPU on Linux yet, so I also have questions about maturity.

@openvino-dev-samples (Author) commented Feb 1, 2025

> [quoting @simonlui's two questions above]

Hi @simonlui, many thanks for your quick feedback.

  1. I believe the benefit of adding OpenVINO is to enable inference on Intel GPUs and NPUs.
  2. You are right, so I added a method that first detects the devices OpenVINO supports on the system and lists them in the UI (see the sketch below). As for verifying that the hardware is set up correctly, I will update the documentation to cover this once the approach in this PR is accepted. You can also find this information on OpenVINO's documentation site: https://docs.openvino.ai/2024/get-started/configurations.html
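
A minimal sketch of the kind of device query described in point 2, assuming the standard OpenVINO Python API (not the PR's exact code):

```python
import openvino as ov

# List the devices OpenVINO can see on this system (e.g. CPU, GPU, NPU)
# so they can be offered in the node's device dropdown.
core = ov.Core()
for device in core.available_devices:
    full_name = core.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {full_name}")
```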

Panchovix added a commit to Panchovix/stable-diffusion-webui-reForge that referenced this pull request Feb 9, 2025
Finally.
Thanks to comfyanonymous/ComfyUI#6638, used as a guide for the add_patches function
@Panchovix commented Feb 9, 2025

Hi there, just passing by and wanted to say: many, many thanks! With your add_patches modifications (and others) I finally managed to make LoRAs work with torch.compile. Really appreciated!

@Panchovix commented Feb 9, 2025

Sorry for the double post, but I'm wondering: does loading a LoRA, then disabling it, and then enabling it again work fine for you?

Maybe some unpatching or recompiling is needed?

I think on the first inference with a LoRA, it will patch the keys before compiling, and it will work.

If you then disable and re-enable the LoRA, it will compile without the LoRA and add _orig_mod prefixes to the keys, so when the LoRA keys are applied again on a third inference to the compiled model, the keys will not match and the LoRA won't load.

Correct me if I'm wrong though.
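
For illustration, a minimal sketch of the `_orig_mod` behaviour described above (generic names, not ComfyUI's patcher code): `torch.compile` wraps the module, so its state_dict keys gain a `_orig_mod.` prefix, which is why keys recorded against the uncompiled model stop matching.

```python
import torch

model = torch.nn.Linear(4, 4)
compiled = torch.compile(model)

print(list(model.state_dict().keys()))     # ['weight', 'bias']
print(list(compiled.state_dict().keys()))  # ['_orig_mod.weight', '_orig_mod.bias']

# One way a patcher could normalize keys before matching LoRA weights
# (hypothetical helper, not ComfyUI's actual implementation):
def strip_prefix(key: str, prefix: str = "_orig_mod.") -> str:
    return key[len(prefix):] if key.startswith(prefix) else key

normalized = {strip_prefix(k): v for k, v in compiled.state_dict().items()}
assert normalized.keys() == model.state_dict().keys()
```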

@openvino-dev-samples (Author) commented Feb 10, 2025


> [quoting @Panchovix's question above]

Hi, when your workflow starts from a checkpoint without a LoRA, everything works. However, if it starts from a checkpoint with a LoRA, enabling and disabling the LoRA does not work. That is:

  • model without LoRA -> LoRA enabled -> LoRA disabled (works)
  • model with LoRA -> LoRA disabled -> LoRA enabled (does not work)

In the second case, my new patch is not triggered, so I believe it is a general issue with the torch.compile node, and I will investigate further.

@openvino-dev-samples (Author) commented:
> [quoting the exchange above]

I have updated the PR; however, it may need two warm-up inferences for the first generation with LoRA weights.
