I'm trying to convert a fine-tuned SmolVLM to ONNX and then quantize it to int8 so it can run on CPU. Two questions:
1. Has anyone tried this? I'd love to hear whether I'm actually going about it the right way.
2. My ONNX conversion code is shown below (with the int8 quantization step I'm planning sketched after it). It takes forever to run and eventually crashes with a RAM overflow on Colab, and I don't think the architecture is supported by optimum.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image
# Load images
image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
# Initialize processor and model
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
# Create input messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe the image?"},
        ],
    },
]
# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1], return_tensors="pt")
# Create a dictionary of regular tensors
tensor_inputs = {
    "pixel_values": inputs["pixel_values"].to(torch.float32),
    "pixel_attention_mask": inputs["pixel_attention_mask"].to(torch.float32),
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.float32  # float32: the ONNX exporter handles bfloat16 poorly
)
model.eval()
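# (Assumption on my part: disabling the KV cache should keep torch.onnx.export
# from tracing past_key_values outputs, which I suspect is part of the RAM blowup.)
model.config.use_cache = False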
# Dynamic axes for variable batch size and sequence length
dynamic_axes = {
    "pixel_values": {0: "batch_size"},
    "pixel_attention_mask": {0: "batch_size"},
    "input_ids": {0: "batch_size", 1: "sequence_length"},
    "attention_mask": {0: "batch_size", 1: "sequence_length"},
    "output": {0: "batch_size", 1: "sequence_length"},
}
# Export to ONNX
torch.onnx.export(
    model,
    (tensor_inputs,),  # a trailing dict is unpacked as keyword arguments to forward
    "smolvlm.onnx",
    input_names=list(tensor_inputs.keys()),
    output_names=["output"],
    dynamic_axes=dynamic_axes,
    opset_version=13,
    do_constant_folding=True,
    export_params=True
)
print("ONNX model saved as smolvlm.onnx")