I am creating adapters using the method depicted here, which works with the CPU EP. With DML, however, I get the following error when calling adapters.LoadAdapter:
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator
I have tested the olive auto-opt call both with and without the --use_model_builder option, and both give the same result. I have also tried the olive convert-adapters call instead, but the resulting adapters do not work with the CPU EP either (see aside).
The model without the adapter also runs fine on the CPU EP, whereas running the model without the adapter on DML gives the following error when calling AppendTokenSequences:
Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.
The same does not happen when using ORTGenAi's model_builder.py and passing in an adapter path, but then you cannot use multiple LoRA weights because the adapter is tied into the ONNX model permanently.
(Aside) The adapters, when used via the CPU EP, appear to suffer significant quality degradation. I can see that convert-adapters applies LoRA scaling (alpha/rank), but I cannot find whether the auto-opt call does the same. Adapters created via convert-adapters do not work with the CPU EP either: the keys are not renamed appropriately, which produces an invalid key/name/parameter error (.layers.0.self_attn. rather than .layers.0.attn.).
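The key mismatch described in the aside can be illustrated with a small sketch. This is not Olive's or ort-genai's conversion code: the function name `rename_adapter_keys` and the example key names are hypothetical, and the thread does not confirm that renaming alone yields a compatible adapter file. It only shows what mapping `.self_attn.` to `.attn.` in a dictionary of adapter weights would look like.

```python
def rename_adapter_keys(weights, old=".self_attn.", new=".attn."):
    """Return a copy of `weights` with `old` replaced by `new` in every key.

    Hypothetical helper for illustration; not part of Olive or onnxruntime-genai.
    """
    renamed = {}
    for key, value in weights.items():
        new_key = key.replace(old, new)
        # Guard against two source keys collapsing onto the same target key.
        if new_key in renamed:
            raise ValueError(f"duplicate key after renaming: {new_key}")
        renamed[new_key] = value
    return renamed


# Made-up key names and values; a real adapter file may use different names.
example = {
    "model.layers.0.self_attn.q_proj.lora_A.weight": [[0.1, 0.2]],
    "model.layers.0.self_attn.q_proj.lora_B.weight": [[0.3], [0.4]],
    "model.layers.0.mlp.up_proj.lora_A.weight": [[0.5, 0.6]],
}
renamed = rename_adapter_keys(example)
for key in renamed:
    print(key)
```

Keys that do not contain `.self_attn.` (the `mlp` entry here) pass through unchanged.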
ambroser53 changed the title from "[DML] Olive generated adapters not working" to "[DML] Olive generated adapters not working with OrtGenAi" on Jan 10, 2025
The lora workflow with DML has not been fully tested. From the olive side, we only verified the example with CPU and CUDA EP. I see that you opened a related issue on the onnxruntime-genai repo which I think is a good idea.
For running the model without an adapter on DML, only a model created using the model builder is supported (as far as I am aware).
With regard to the aside:
The parameter name mismatch (self_attn vs attn) when using the model builder is a known issue to us (Olive and ort-genai devs). We haven't come to a solution for this yet. For now, you would need to run the whole auto-opt workflow to get a compatible adapter file.
This issue should not be present when the model is generated using torch.onnx.export (i.e., without the --use_model_builder option).
I am not sure why you are seeing quality degradation. For all paths, the scaling is absorbed into the weights.
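As a rough illustration of what "absorbed into the weights" means, here is a minimal sketch in plain Python. It assumes the usual LoRA formulation, where the weight delta is (alpha/rank) * B @ A; the helper names and the numbers are made up for the example and are not taken from Olive. Folding the scale into B once gives the same delta as scaling the product at run time.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale_matrix(m, s):
    """Multiply every entry of matrix m by the scalar s."""
    return [[v * s for v in row] for row in m]


# Made-up values: rank 2, output dim 3, input dim 3.
alpha, rank = 16, 2
scale = alpha / rank  # LoRA scaling factor (alpha/rank)

lora_b = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]  # shape (out, rank)
lora_a = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]    # shape (rank, in)

# Scaling applied at run time: scale * (B @ A)
delta_runtime = scale_matrix(matmul(lora_b, lora_a), scale)

# Scaling absorbed into the stored weights: (scale * B) @ A
delta_absorbed = matmul(scale_matrix(lora_b, scale), lora_a)

print(delta_runtime == delta_absorbed)
```

If an exported adapter's weights already include the scale, applying alpha/rank again at load time would multiply the delta by the scale a second time, so it may be worth checking that the scale is applied exactly once on whichever path is used.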
Environment reported in the original issue:
OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B