Pallas build duplicates work #467

Closed
olupton opened this issue Jan 11, 2024 · 5 comments
Labels: bug (a bug of JAX-Toolbox itself instead of the JAX ecosystem software that it ships), good first issue (good for newcomers)

Comments

olupton (Collaborator) commented Jan 11, 2024

When building wheels for Pallas, the CI jobs duplicate work.
For example, in https://github.com/NVIDIA/JAX-Toolbox/actions/runs/7475866231/job/20344957851#step:10:428 parts of the Dockerfile are re-executed even though they should be cached from the "Build mealkit image" part of the same job.
By contrast, in the PAX equivalent this caching works: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/7475866243/job/20344960164#step:10:181
I have not been able to reproduce this locally: there, the run with --target final does hit the cache.
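
One way to compare the two CI logs, or a local run against CI, is to list which build steps were served from cache and which were re-executed. The sketch below is not part of JAX-Toolbox. It assumes the log was produced with `docker buildx build --progress=plain`, that any CI timestamp prefix has already been stripped, and that each step appears as `#N [stage i/n] ...` followed by either `#N CACHED` or `#N DONE <time>`. The function name `classify_steps` and the sample log are hypothetical.

```python
import re

# Assumed line shapes in a BuildKit plain-progress log:
#   "#7 [mealkit 2/5] RUN ..."  -> step header
#   "#7 CACHED"                 -> step served from cache
#   "#7 DONE 12.3s"             -> step actually executed
STEP_HEADER = re.compile(r"^#(\d+) (\[.*)$")
STEP_STATUS = re.compile(r"^#(\d+) (CACHED|DONE)\b")


def classify_steps(log_text):
    """Return (cached, executed) lists of step names found in the log."""
    names = {}   # step id -> first header line seen for that step
    states = {}  # step id -> "CACHED" or "DONE"
    for raw in log_text.splitlines():
        line = raw.strip()
        status = STEP_STATUS.match(line)
        if status:
            states[status.group(1)] = status.group(2)
            continue
        header = STEP_HEADER.match(line)
        if header:
            names.setdefault(header.group(1), header.group(2))

    cached, executed = [], []
    for step_id, name in names.items():
        # Skip BuildKit bookkeeping steps such as "[internal] load ...".
        if name.startswith("[internal]"):
            continue
        state = states.get(step_id)
        if state == "CACHED":
            cached.append(name)
        elif state == "DONE":
            executed.append(name)
    return cached, executed


# Hypothetical sample log, made up for illustration only.
SAMPLE_LOG = """\
#1 [internal] load build definition from Dockerfile
#1 DONE 0.0s
#5 [mealkit 1/2] RUN echo one
#5 CACHED
#6 [final 1/1] RUN echo two
#6 0.301 two
#6 DONE 0.4s
"""

if __name__ == "__main__":
    cached, executed = classify_steps(SAMPLE_LOG)
    print("cached:  ", cached)
    print("executed:", executed)
```

Running this on the "Build mealkit image" and `--target final` logs from the same job would show whether any mealkit-stage step lands in the executed list the second time around.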

olupton added the bug and good first issue labels on Jan 11, 2024
yhtang (Collaborator) commented Jan 11, 2024

When you tried to reproduce that locally, did you use the same BuildKit version as in the GitHub Action?

olupton (Collaborator, Author) commented Jan 11, 2024

On the machine I tried locally I had

$ docker --version
Docker version 24.0.7, build afdd53b
$ docker buildx version
github.com/docker/buildx v0.11.2 9872040

whereas it looks like the CI has Docker 24.0.0 and buildx v0.12.0 542e5d810e4a1a155684f5f3c5bd7e797632a12f.
So no, they're close but they don't match.

olupton (Collaborator, Author) commented Jan 21, 2025

We don't build a dedicated Pallas container anymore: #1161

olupton closed this as completed on Jan 21, 2025
nouiz (Collaborator) commented Jan 21, 2025

Note that Pallas is included in the JAX container.

olupton (Collaborator, Author) commented Jan 21, 2025

Yes. There is still a separate triton container containing JAX-Triton, which is arguably where we would expect this bug to show up if it were still valid. However, the caching seems to be working fine in the triton container build. I should have cited #798 for why.
