
The examples of the docs raise IcechunkError: repository error: repositories can only be created in clean prefixes #665

Open
josephnowak opened this issue Feb 1, 2025 · 10 comments
Labels
bug 🐛 Something isn't working windows

Comments

@josephnowak

josephnowak commented Feb 1, 2025

Hi,

I'm very excited to start using Icechunk, and since you released the first non-alpha version, I decided to try it for the first time. However, some of the examples fail with the error in the title (IcechunkError: repository error: repositories can only be created in clean prefixes). I'd like to know if I'm doing something wrong, but I'm mostly copying and pasting the example code. (I understand that such errors are normal in such a new product, but I thought it would be good to report it; if you already know about them, feel free to close this issue.)

import tempfile
import icechunk as ic

storage = ic.local_filesystem_storage(tempfile.mkdtemp())
icechunk_repo = ic.Repository.create(storage)

I also tried it with a non-temporary folder and got the same error.

Traceback:

---------------------------------------------------------------------------
IcechunkError                             Traceback (most recent call last)
Cell In[10], line 5
      2 import icechunk as ic
      4 storage = ic.local_filesystem_storage(tempfile.mkdtemp())
----> 5 icechunk_repo = ic.Repository.create(storage)
      6 # icechunk_session = icechunk_repo.writable_session("main")

File ~\miniconda3\envs\test-icechunk\Lib\site-packages\icechunk\repository.py:50, in Repository.create(cls, storage, config, virtual_chunk_credentials)
     24 @classmethod
     25 def create(
     26     cls,
   (...)
     29     virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
     30 ) -> Self:
     31     """
     32     Create a new Icechunk repository.
     33     If one already exists at the given store location, an error will be raised.
   (...)
     47         An instance of the Repository class.
     48     """
     49     return cls(
---> 50         PyRepository.create(
     51             storage,
     52             config=config,
     53             virtual_chunk_credentials=virtual_chunk_credentials,
     54         )
     55     )

IcechunkError: repository error: repositories can only be created in clean prefixes
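Because `tempfile.mkdtemp()` returns a freshly created, empty directory, the target should already be a clean prefix from Python's point of view. The sketch below checks that. `is_clean_local_prefix` is a hypothetical helper, and it assumes that for local filesystem storage "clean" means the path is missing or is an empty directory, which may not match Icechunk's exact internal check.

```python
import os
import tempfile


def is_clean_local_prefix(path: str) -> bool:
    """Return True if `path` does not exist or is an empty directory.

    Assumption: for local filesystem storage, a "clean prefix" means there
    are no files or subdirectories at the target location yet.
    """
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)


# tempfile.mkdtemp() creates a brand-new, empty directory and returns its path.
tmp_dir = tempfile.mkdtemp()
print(tmp_dir, is_clean_local_prefix(tmp_dir))
```

If this prints `True` while `Repository.create` still fails, leftover files are unlikely to be the cause.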

Additionally, I noticed that some of the examples have small mistakes in variable names, like this one:

# initialize a distributed Client
from distributed import Client

client = Client()

# initialize the icechunk store
import icechunk

storage = icechunk.local_filesystem_storage("./icechunk-xarray")
icechunk_repo = icechunk.Repository.create(storage_config)
icechunk_session = icechunk_repo.writable_session("main")

The storage_config variable is never defined, so the snippet fails when copied and pasted directly.
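A corrected version of that snippet passes the `storage` object that was actually created. This sketch drops the distributed `Client`, which is unrelated to the error, and wraps the calls in a hypothetical `open_main_session` function; it only uses the `local_filesystem_storage`, `Repository.create`, and `writable_session` calls that already appear in the examples above.

```python
def open_main_session(path: str = "./icechunk-xarray"):
    """Create an Icechunk repo at `path` and return a writable session on "main"."""
    # Imported inside the function so defining it has no side effects.
    import icechunk

    # Create the storage object first, then pass that same variable to create().
    storage = icechunk.local_filesystem_storage(path)
    icechunk_repo = icechunk.Repository.create(storage)
    return icechunk_repo.writable_session("main")
```

Per the `Repository.create` docstring shown in the traceback, an error is raised if a repository already exists at the location, so rerunning this against the same `./icechunk-xarray` directory is expected to fail.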

Additional information about my environment:

INSTALLED VERSIONS

commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 11
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Spanish_Venezuela', '1252')
libhdf5: None
libnetcdf: None

xarray: 2025.1.2
pandas: 2.2.3
numpy: 2.2.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.1.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0
conda: None
pytest: None
mypy: None
IPython: 8.32.0
sphinx: None
icechunk: 0.1.0

@josephnowak josephnowak changed the title IcechunkError: repository error: repositories can only be created in clean prefixes using examples of the docs Using the examples of the docs I get IcechunkError: repository error: repositories can only be created in clean prefixes Feb 1, 2025
@josephnowak josephnowak changed the title Using the examples of the docs I get IcechunkError: repository error: repositories can only be created in clean prefixes The examples of the docs raise IcechunkError: repository error: repositories can only be created in clean prefixes Feb 1, 2025
@paraseba
Collaborator

paraseba commented Feb 1, 2025

Thanks for reporting this, @josephnowak. I'm not able to reproduce this locally, which makes me think this is probably a bug affecting Windows users. We still don't have a good test suite for Windows.

We'll ship a fix on Monday.

@paraseba paraseba added the bug 🐛 Something isn't working label Feb 1, 2025
@paraseba
Collaborator

paraseba commented Feb 1, 2025

In the meantime, could you try creating your repo in a nonexistent directory?

storage = ic.local_filesystem_storage("some-dir-that-doesnt-exist")
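A rerunnable sketch of that workaround: `fresh_repo_path` (a hypothetical helper) builds a path to a subdirectory that does not exist yet inside a new temporary directory, and `create_repo` (also hypothetical) hands it to Icechunk. As the reporter found later in this thread, this still raised the same error on Windows, so it only helps where the local storage backend handles paths correctly.

```python
import os
import tempfile


def fresh_repo_path(name: str = "icechunk-repo") -> str:
    """Return a path inside a new temporary directory that does not exist yet."""
    parent = tempfile.mkdtemp()
    return os.path.join(parent, name)


def create_repo(path: str):
    """Create an Icechunk repository on local filesystem storage at `path`."""
    # Deferred import so the path helper can be used without icechunk installed.
    import icechunk as ic

    storage = ic.local_filesystem_storage(path)
    return ic.Repository.create(storage)


repo_path = fresh_repo_path()
print(repo_path, os.path.exists(repo_path))
```

Calling `create_repo(repo_path)` then attempts to create the repository at the not-yet-existing path.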

@josephnowak
Author

josephnowak commented Feb 1, 2025

Thanks for the fast response. I ran the same code on Linux (same machine) and, as you expected, it did not raise the error.

I tried running the code with "some-dir-that-doesnt-exist" and it raised the same error on Windows, but I can test directly on Linux for now. Thanks!

@paraseba
Collaborator

paraseba commented Feb 1, 2025

Thank you @josephnowak , that's valuable input. We'll fix this and get back to you.

@TomNicholas
Contributor

I'm seeing this error in VirtualiZarr CI, which runs on Linux, e.g. here

@TomNicholas
Contributor

TomNicholas commented Feb 2, 2025

could you try creating your repo in a nonexistent directory?

But our test fixtures might be guilty of this (cc @mpiannucci)

@abarciauskas-bgse
Contributor

@TomNicholas @paraseba thanks for the hint, it was indeed that the test fixtures for the append feature were not using a repo fixture, so the repo object was not being torn down after every test (fixed in zarr-developers/VirtualiZarr#417). I'm not entirely sure why it would matter since the tmp_path of the storage config should have been different for each of these tests, so if anyone can enlighten me on that I would appreciate it.

@paraseba
Copy link
Collaborator

paraseba commented Feb 3, 2025

@josephnowak I tried to fix the local filesystem Storage on Windows; here is my initial attempt, but I failed. It's a bit tricky for me in particular, since I have very little Windows experience, and that was more than 20 years ago. The issue seems to be something simple around paths: it cannot form an absolute path from the temporary directory we are giving it. I printed the directory and it looks reasonable, so I'm not sure what's going wrong.
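One way to repeat that check from Python is to print several forms of the temporary path side by side. The hypothetical `describe_path` helper below uses only the standard library. On Windows, comparing the raw, resolved, and URI forms may hint at which representation fails to become an absolute path on the Rust side, but this is a diagnostic sketch and does not identify the bug.

```python
import os
import tempfile
from pathlib import Path


def describe_path(raw: str) -> dict:
    """Collect several forms of the same directory path for side-by-side comparison."""
    p = Path(raw)
    # resolve() makes the path absolute and follows any symlinks.
    resolved = p.resolve()
    return {
        "raw": raw,
        "is_absolute": p.is_absolute(),
        "abspath": os.path.abspath(raw),
        "resolved": str(resolved),
        "uri": resolved.as_uri(),
    }


info = describe_path(tempfile.mkdtemp())
for key, value in info.items():
    print(f"{key:12} {value}")
```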

I suspect Icechunk will work on Windows with storage other than the local filesystem. Have you tried hitting S3 or in-memory storage?
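A minimal way to try the in-memory suggestion, assuming icechunk 0.1.x exposes `in_memory_storage()` alongside `local_filesystem_storage()`; `create_in_memory_session` is a hypothetical name. Because no filesystem path is involved, this sidesteps local path handling entirely, so it can show whether the rest of Icechunk works on that machine.

```python
def create_in_memory_session():
    """Create an in-memory Icechunk repository and return a writable "main" session."""
    # Deferred import; assumes icechunk 0.1.x, which provides in_memory_storage().
    import icechunk as ic

    # No filesystem path is involved, so local path handling is never exercised.
    storage = ic.in_memory_storage()
    repo = ic.Repository.create(storage)
    return repo.writable_session("main")
```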

Any chance you could pick up my attempt at fixing this? I can help with any Icechunk stuff you may need.

@josephnowak
Author

Thanks for spending time on a fix. Sure, I'll take a look at that branch in a couple of days to see if I can find the problem.

I already tested S3 with standard buckets on Windows, and I was at least able to create the repository. I'll do some additional tests later to see if I can write some datasets.

@paraseba
Collaborator

paraseba commented Feb 4, 2025

Sounds great @josephnowak, thank you! Glad to hear you made it work against S3 from Windows.

Projects
Status: No status
Development

No branches or pull requests

4 participants