Data-dependent VaeFluxDecoderTest fail on HEAD #982

Closed
marbre opened this issue Feb 19, 2025 · 3 comments

@marbre
Collaborator

marbre commented Feb 19, 2025

The data-dependent tests currently fail at HEAD with

FAILED sharktank/tests/models/vae/vae_test.py::VaeFluxDecoderTest::testCompareBF16EagerVsHuggingface - RuntimeError: Given groups=1, weight of size [512, 16, 3, 3], expected input[1, 1, 4096, 64] to have 16 channels, but got 1 channels instead
FAILED sharktank/tests/models/vae/vae_test.py::VaeFluxDecoderTest::testCompareF32EagerVsHuggingface - RuntimeError: Given groups=1, weight of size [512, 16, 3, 3], expected input[1, 1, 4096, 64] to have 16 channels, but got 1 channels instead
FAILED sharktank/tests/models/vae/vae_test.py::VaeFluxDecoderTest::testVaeIreeVsHuggingFace - RuntimeError: Given groups=1, weight of size [512, 16, 3, 3], expected input[1, 1, 4096, 64] to have 16 channels, but got 1 channels instead
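
For readers unfamiliar with this message, here is a minimal sketch of the shape check behind it. It assumes the usual PyTorch conventions: a Conv2d weight laid out as (out_channels, in_channels / groups, kH, kW) and an input laid out as (N, C, H, W). The helper name `conv2d_channel_error` is hypothetical and is not part of sharktank or PyTorch; it only mirrors the wording of the error above.

```python
def conv2d_channel_error(weight_shape, input_shape, groups=1):
    """Return a channel-mismatch message, or None if the shapes agree.

    Hypothetical helper for illustration only. Assumes weight layout
    (out_channels, in_channels // groups, kH, kW) and NCHW input.
    """
    _out_channels, in_per_group, _kh, _kw = weight_shape
    expected = in_per_group * groups  # channels the convolution needs
    actual = input_shape[1]           # channel dimension of an NCHW input
    if actual != expected:
        return (
            f"expected input{list(input_shape)} to have {expected} channels, "
            f"but got {actual} channels instead"
        )
    return None


# Shapes taken from the failing tests: the input has 1 channel, not 16.
print(conv2d_channel_error((512, 16, 3, 3), (1, 1, 4096, 64)))
```

With the shapes from the log, the second dimension of the input is 1 while the weight expects 16 input channels, so the check fails. Any input whose second dimension is 16 would pass it.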

This was first reported by @renxida, who commented on #973. However, it seems that the test started failing this way when commit ab42f0c / PR #876 was landed by @KyleHerndon; see the logs. Unfortunately, the test was already red before that, but with a different error:

FAILED sharktank/tests/models/vae/vae_test.py::VaeFluxDecoderTest::testVaeIreeVsHuggingFace - RuntimeError: Error invoking function: c/runtime/src/iree/hal/drivers/hip/event_semaphore.c:677: ABORTED; the semaphore was aborted; while invoking native function hal.fence.await; while calling import; 
[ 0] bytecode module.decode:84 {self._temp_dir}/flux_vae_bf16.mlir:140:3

(logs here). That earlier failure seems to have been introduced with commit 9b829cd by @rsuderman, but the test now fails with the different error mentioned above.

@KyleHerndon
Contributor

Looking at this now.

@KyleHerndon
Contributor

Created this PR; I think it should fix it.

@marbre
Collaborator Author

marbre commented Feb 21, 2025

Fixed with 5250392. The still-failing check PkgCI / Test shark-ai / Integration Test (cpu) seems unrelated and looks like a machine issue instead.
