I'm trying to convert a fine-tuned SmolVLM to ONNX and then quantize it to int8 so it can run on CPU. Two questions:
1. Has anyone tried this? I'd love to hear whether I'm actually going about it the right way.
2. My ONNX conversion code is shown below (with the int8 quantization step I'm planning sketched after it). It takes forever to run and eventually crashes with a RAM overflow on Colab, and I don't think the architecture is supported by optimum.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image
# Load images
image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
# Initialize processor and model
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
# Create input messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe the image?"},
        ],
    },
]
# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1], return_tensors="pt")
# Create a dictionary of regular tensors
tensor_inputs = {
    "pixel_values": inputs["pixel_values"].to(torch.float32),
    "pixel_attention_mask": inputs["pixel_attention_mask"].to(torch.float32),
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.float32  # float32: the ONNX exporter handles bfloat16 poorly
)
model.eval()
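# (Assumption on my part: disabling the KV cache should keep torch.onnx.export
# from tracing past_key_values outputs, which I suspect is part of the RAM blowup.)
model.config.use_cache = False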
# Dynamic axes for variable batch size and sequence length
dynamic_axes = {
    "pixel_values": {0: "batch_size"},
    "pixel_attention_mask": {0: "batch_size"},
    "input_ids": {0: "batch_size", 1: "sequence_length"},
    "attention_mask": {0: "batch_size", 1: "sequence_length"},
    "output": {0: "batch_size", 1: "sequence_length"},
}
# Export to ONNX
torch.onnx.export(
    model,
    (tensor_inputs,),  # a trailing dict is unpacked as keyword arguments to forward
    "smolvlm.onnx",
    input_names=list(tensor_inputs.keys()),
    output_names=["output"],
    dynamic_axes=dynamic_axes,
    opset_version=13,
    do_constant_folding=True,
    export_params=True
)
print("ONNX model saved as smolvlm.onnx")