Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DML] Olive generated adapters not working with OrtGenAi #1544

Open
ambroser53 opened this issue Jan 10, 2025 · 1 comment
Open

[DML] Olive generated adapters not working with OrtGenAi #1544

ambroser53 opened this issue Jan 10, 2025 · 1 comment

Comments

@ambroser53
Copy link

I am using the method of creating adapters depicted here which I have got to work when using the CPU EP, however when using DML I get the following error when calling adapters.LoadAdapter:

Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator

I have tested the olive auto-opt call both with and without the --use_model_builder option but they both get the same result. I have also tried using the convert-adapters olive call instead but the resulting adapters do not work with CPU EP either (see aside).

If I run the model without the adapter on CPU EP it runs fine as well, whereas when I run the model without the adapter on DML I get the following error when calling AppendTokenSequences:

Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.

The same does not happen when using ORTGenAi's model_builder.py and passing in an adapter path, but then you cannot use multiple LoRA weights as it is tied into the onnx model permanently.

OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B

(Aside) The adapters (when used via CPU EP) appear to have significant quality degradation. I can see that convert-adapters does lora scaling (alpha/rank) but I cannot find whether the auto-opt call is doing the same. Creating adapters via convert-adapters does not work with CPU EP either as the keys are not being renamed appropriately getting an invalid key/name/parameter error (.layers.0.self_attn. rather than .layers.0.attn.).

@ambroser53 ambroser53 changed the title [DML] Olive generated adapters not working [DML] Olive generated adapters not working with OrtGenAi Jan 10, 2025
@jambayk
Copy link
Contributor

jambayk commented Jan 21, 2025

Hi, thanks for opening this issue.

The lora workflow with DML has not been fully tested. From the olive side, we only verified the example with CPU and CUDA EP. I see that you opened a related issue on the onnxruntime-genai repo which I think is a good idea.
For running model without adapter on DML, only the model creating using model builder is supported (as far as I am aware).

With regards to the aside

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants