slow perf with object_store #723

Open
dcherian opened this issue Feb 11, 2025 · 2 comments
Comments

@dcherian
Contributor

dcherian commented Feb 11, 2025

Here are two dask performance reports for writing the "public" ERA5 dataset using Coiled:

  1. S3
  2. GCS

For an identical amount of data, the write took 15 minutes with S3 and 2.5 hours with Google Cloud.

On S3, individual write tasks took 11-12 s, while on Google Cloud they took 55 s.
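
As a quick arithmetic check on the figures above, the sketch below compares the wall-clock slowdown with the per-task slowdown. The variable names are illustrative, and the comparison assumes both runs used the same number of tasks and workers, which the report does not state.

```python
# Figures quoted in the report above.
s3_total_min = 15.0
gcs_total_min = 2.5 * 60      # 2.5 hours expressed in minutes
s3_task_s = (11.0, 12.0)      # reported range for one write task on S3
gcs_task_s = 55.0             # reported time for one write task on GCS

wall_clock_ratio = gcs_total_min / s3_total_min
task_ratio_low = gcs_task_s / s3_task_s[1]
task_ratio_high = gcs_task_s / s3_task_s[0]

print(f"wall-clock slowdown: {wall_clock_ratio:.1f}x")
print(f"per-task slowdown:   {task_ratio_low:.1f}x to {task_ratio_high:.1f}x")
```

Under that assumption, the per-task slowdown (roughly 4.6x to 5x) is about half the 10x wall-clock slowdown, so task duration alone may not account for the whole gap.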

@mpiannucci
Contributor

mpiannucci commented Feb 11, 2025

> On S3, individual write tasks took 11-12sec. While it takes 55s on Google Cloud

Is this on a single worker, or across many processes that icechunk has been pickled to? We have a hunch this is the overhead of recreating the object_store and fetching a Google Cloud bearer token on the fly.
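
To make the hunch concrete, here is a minimal sketch of how a handle that drops its live client when pickled pays the setup cost again on every unpickle. `FakeStore` and its methods are hypothetical stand-ins, not icechunk's or object_store's actual classes, and the "expensive" step is only counted, not timed.

```python
import pickle


class FakeStore:
    """Hypothetical stand-in for a store handle; not icechunk's real class."""

    client_builds = 0  # counts how often the "expensive" client was constructed

    def __init__(self, bucket):
        self.bucket = bucket
        self._client = self._build_client()

    def _build_client(self):
        # Stand-in for creating the client and fetching a bearer token.
        FakeStore.client_builds += 1
        return object()

    def __getstate__(self):
        # Only the configuration is serialized; the live client is dropped.
        return {"bucket": self.bucket}

    def __setstate__(self, state):
        # Unpickling skips __init__, so the client is rebuilt here every time.
        self.bucket = state["bucket"]
        self._client = self._build_client()


store = FakeStore("example-bucket")
payload = pickle.dumps(store)
copies = [pickle.loads(payload) for _ in range(3)]

# 1 build for the original object + 3 builds for the unpickled copies
print(FakeStore.client_builds)
```

If something like this happened once per task rather than once per worker, each task would carry the client setup and token fetch on top of the write itself.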

@dcherian
Contributor Author

Multiple workers, but in theory they should be getting unpickled once per worker, so later tasks should not see this overhead?
