
Shutdown in asynchronous context leaves session and connector open #943

Open · gbip opened this issue Mar 5, 2025 · 3 comments

@gbip commented Mar 5, 2025

When opening an asynchronous S3FileSystem, it seems that some aiohttp resources are not cleaned up properly.

I can provide an example that uses xarray, fsspec, s3fs and zarr to reproduce the issue; however, I could not find any freely available zarr dataset. Here are the warnings raised upon leaving the Python interpreter:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fe2071f6c60>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x7fe2071ca6f0>, 24766.009516954), (<aiohttp.client_proto.ResponseHandler object at 0x7fe2071cac30>, 24766.015895832)])']
connector: <aiohttp.connector.TCPConnector object at 0x7fe2071ab800>

It seems that a similar issue existed with gcsfs; here is the PR that patched it: https://github.com/fsspec/gcsfs/pull/657/files. HTTPFileSystem might have a similar issue: zarr-developers/zarr-python#2674 (comment).

I tried to implement something similar, but after trying for a few hours I gave up because I am not familiar with Python async programming :(

@martindurant (Member)

I believe there may already be an issue referencing this, although I can't immediately see it.

The documentation recommends explicitly stashing and closing the async session yourself, since automatic cleanup is indeed tricky: https://s3fs.readthedocs.io/en/latest/#async
How are you in fact calling s3fs in your code?
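
For reference, the pattern those docs describe looks roughly like this (a sketch; credentials and endpoint are omitted):

import asyncio

import s3fs


async def main():
    fs = s3fs.S3FileSystem(asynchronous=True)
    session = await fs.set_session()  # start the aiohttp session on the running loop
    # ... do S3 work here ...
    await session.close()  # close it explicitly before the loop shuts down


asyncio.run(main())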

@gbip (Author) commented Mar 6, 2025

Thank you for your answer!

I do something like this:

import fsspec
import xarray
import zarr

s3_endpoint = "http://localhost:19900"

fs = fsspec.filesystem(
    "s3",
    key="ACCESS",
    secret="S3CRET",
    endpoint_url=s3_endpoint,
    asynchronous=True,
)
store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
dataset = xarray.open_zarr(store, zarr_format=3)

I'll try to make a context manager to perform the resource cleanup, something like the sketch below, as this is for a library and I don't want to put this burden on the final user.
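
An untested sketch of what I have in mind; the set_session()/close() calls follow the pattern from the s3fs docs linked above, and skip_instance_cache is there so fsspec doesn't hand back a cached filesystem whose session was already closed:

import contextlib

import fsspec


@contextlib.asynccontextmanager
async def s3_filesystem(endpoint_url, key, secret):
    # Create the filesystem while an event loop is running
    fs = fsspec.filesystem(
        "s3",
        key=key,
        secret=secret,
        endpoint_url=endpoint_url,
        asynchronous=True,
        skip_instance_cache=True,  # fresh instance per context, not a cached one
    )
    session = await fs.set_session()
    try:
        yield fs
    finally:
        await session.close()  # close while the loop is still alive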

EDIT: Note that I do not wish to perform async operations at the moment; however, the zarr-python library raises a warning if I do not create an asynchronous object:

UserWarning: fs (<s3fs.core.S3FileSystem object at 0x7f654795dfd0>) was not created with `asynchronous=True`, this may lead to surprising behavior

@martindurant (Member)

Zarr should be able to do this for you automatically, I think.

The problem is that the filesystem you make is created outside of an asynchronous context. This means that by the time cleanup happens (probably at interpreter shutdown), there may be no event loop running anymore.
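
Letting zarr construct the store itself would look something like this (a sketch, assuming zarr-python 3's FsspecStore.from_url; the storage options mirror the ones passed to fsspec.filesystem above):

import xarray
import zarr

# Let zarr build the async filesystem for the URL itself
store = zarr.storage.FsspecStore.from_url(
    "s3://mybucket/myzarr.zarr",
    storage_options={
        "key": "ACCESS",
        "secret": "S3CRET",
        "endpoint_url": "http://localhost:19900",
    },
)
dataset = xarray.open_zarr(store, zarr_format=3)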
