Error writing object to object store service - JASMIN OS. #743
Comments
Welcome @oj-tooth! In order for Icechunk to provide safe transactional guarantees around updates, the object store itself must support certain transactional operations. Specifically, it must support "conditional writes", i.e. it must fail to create an object if it already exists. Here's the AWS S3 documentation on this. The conditional write is implemented by passing a specific header in the PUT request to the object store. My guess is that your object store does not support this header. Can you say more about the object store? E.g. is it Ceph-based, MinIO, etc.?
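One way to check this guess against a given endpoint is a small probe that writes the same key twice with `If-None-Match: *`. The following is a minimal sketch, not part of Icechunk: the function names `probe_conditional_create` and `error_code` are hypothetical, and it assumes a boto3-style S3 client whose `put_object` accepts the `IfNoneMatch` argument (available in recent boto3 releases) and whose exceptions carry a `response["Error"]["Code"]` field.

```python
import uuid


def error_code(exc):
    """Return the S3 error code carried by a botocore-style exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def probe_conditional_create(client, bucket, prefix="probe"):
    """Classify how an S3-compatible store handles `If-None-Match: *` on PUT.

    Returns "supported", "ignored", or "rejected:<code>".
    """
    key = f"{prefix}/{uuid.uuid4().hex}"
    try:
        # First write: the key does not exist yet, so this should succeed.
        client.put_object(Bucket=bucket, Key=key, Body=b"first", IfNoneMatch="*")
    except Exception as exc:
        # For example NotImplemented, when the store does not understand the header.
        return f"rejected:{error_code(exc)}"
    try:
        # Second write to the same key: a conforming store must refuse it.
        client.put_object(Bucket=bucket, Key=key, Body=b"second", IfNoneMatch="*")
    except Exception as exc:
        code = error_code(exc)
        return "supported" if code == "PreconditionFailed" else f"rejected:{code}"
    else:
        # The overwrite went through, so the header was silently ignored.
        return "ignored"
    finally:
        # Clean up the probe object once it has been created.
        client.delete_object(Bucket=bucket, Key=key)
```

With boto3 installed, the client would typically be built with `boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=...)` and passed in together with a bucket you are allowed to write to. A result of `rejected:NotImplemented` would be consistent with the error reported in this issue.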
Thanks for the quick response @rabernat. I'm afraid there is little publicly available information on the details of the JASMIN OS backend. I have asked the team at CEDA for more information based on your comments and will get back to you ASAP.
Just a quick update on this issue: on requesting further information from CEDA, they have confirmed with the current vendor, DataCore, that the object store doesn't currently support the If-None-Match header over S3, although the underlying layer has this functionality. The vendor has yet to add this functionality, but will discuss adding it going forward. Happy to drop another update when we get started with Icechunk on JASMIN - this would be incredibly impactful for our distribution of ocean-climate data. Thanks for all your work @rabernat & co.
Sounds good! In the interim, I believe we could create an option which allows you to use Icechunk in "unsafe" mode. Basically, in this mode, there would be a risk of different sessions overwriting each other's commits. However, if you could manage to avoid conflicts through some out-of-band mechanism, then you could still start to use Icechunk to distribute data. This wouldn't affect read-only users.
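To illustrate the risk described here, the following toy model shows two sessions racing to move the same branch pointer, with and without a conditional update. It is only a sketch under stated assumptions: `ToyStore` and `race` are hypothetical names, the store is an in-memory stand-in, and nothing here reflects how Icechunk actually stores its refs.

```python
import itertools


class ToyStore:
    """In-memory stand-in for an object store holding one branch pointer."""

    def __init__(self):
        self._etags = itertools.count(1)
        self.value, self.etag = "snapshot-0", 0

    def read(self):
        return self.value, self.etag

    def write(self, value, if_match=None):
        # With if_match set, refuse the write when someone else got there first.
        if if_match is not None and if_match != self.etag:
            raise RuntimeError("precondition failed: branch moved")
        self.value, self.etag = value, next(self._etags)


def race(conditional):
    """Let sessions A and B commit on top of the same branch tip."""
    store = ToyStore()
    # Both sessions read the same tip before either one commits.
    _, etag_a = store.read()
    _, etag_b = store.read()
    store.write("commit-from-A", if_match=etag_a if conditional else None)
    try:
        store.write("commit-from-B", if_match=etag_b if conditional else None)
        outcome = "B overwrote A"
    except RuntimeError:
        outcome = "B rejected"
    return store.value, outcome


print(race(conditional=True))   # ('commit-from-A', 'B rejected')
print(race(conditional=False))  # ('commit-from-B', 'B overwrote A')
```

In the unconditional case nothing fails, which is exactly the problem: session A's commit is silently lost. This is why an out-of-band mechanism, such as a single writer, is needed in "unsafe" mode.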
That sounds fantastic as a temporary solution for us to get started distributing our data - I'll be managing the workflow for now, so can be extra vigilant to avoid overwriting commits. Great to know that our end users wouldn't be impacted!
@oj-tooth we have released Icechunk 0.2.1 and we believe it may work for you. Could you please give this code a try?

import icechunk

# same thing you were doing
storage = ....

repo_config1 = icechunk.RepositoryConfig(
    storage=icechunk.StorageSettings(unsafe_use_conditional_update=False)
)
repo_config2 = icechunk.RepositoryConfig(
    storage=icechunk.StorageSettings(
        unsafe_use_conditional_update=False,
        unsafe_use_conditional_create=False,
    )
)
repo_config3 = icechunk.RepositoryConfig(
    storage=icechunk.StorageSettings(
        unsafe_use_conditional_update=False,
        unsafe_use_conditional_create=False,
        unsafe_use_metadata=False,
    )
)
repo = icechunk.Repository.create(storage, config=repo_config1)

In there we have 3 different configurations, from best to worst. So try first creating the repo with As we mentioned before, these configurations may help with your particular object store, but it comes at the price of losing some of Icechunk's consistency guarantees. In particular, this means that if two users are calling Please let us know how things go.
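Trying the configurations in order, from best to worst, can be automated with a small helper. This is a hedged sketch rather than anything Icechunk provides: `first_working_config` is a hypothetical name, and it takes the creation step as a callable so that it makes no assumptions about the Icechunk API beyond what the snippet above already uses.

```python
def first_working_config(named_configs, create):
    """Try each (name, config) pair in order and return the first that works.

    `create` is any callable that builds a repository from a config and raises
    on failure. Returns (name, repo, failures), where `failures` lists the
    (name, error message) pairs of the attempts that did not work.
    """
    failures = []
    for name, config in named_configs:
        try:
            repo = create(config)
        except Exception as exc:  # with Icechunk, the IcechunkError seen in tracebacks
            failures.append((name, str(exc)))
            continue
        return name, repo, failures
    raise RuntimeError(f"no configuration worked: {failures}")


# Intended use with the configurations defined above (not executed here):
# name, repo, failures = first_working_config(
#     [
#         ("repo_config1", repo_config1),
#         ("repo_config2", repo_config2),
#         ("repo_config3", repo_config3),
#     ],
#     lambda cfg: icechunk.Repository.create(storage, config=cfg),
# )
```

Note that a failed attempt may leave objects behind in the bucket, as the original report in this issue describes, so it may be necessary to clean the prefix or use a fresh one between attempts.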
@paraseba thanks for following up on this - I really appreciate it. I ran the following test with Icechunk v0.2.1, but unfortunately I still get an error irrespective of which repo configuration I use.

import json

import icechunk

# -- Create Icechunk repository -- #
# credentials_filepath is defined earlier in the notebook (not shown).
with open(credentials_filepath) as f:
    jasmin_store_credentials = json.load(f)

storage = icechunk.s3_storage(
    bucket="icechunk",
    prefix="test",
    region='',
    access_key_id=jasmin_store_credentials['token'],
    secret_access_key=jasmin_store_credentials['secret'],
    endpoint_url=jasmin_store_credentials['endpoint_url'],
    allow_http=True,
)

# Most permissive settings: all three storage features switched off.
repo_config = icechunk.RepositoryConfig(
    storage=icechunk.StorageSettings(
        unsafe_use_conditional_update=False,
        unsafe_use_conditional_create=False,
        unsafe_use_metadata=False,
    )
)
repo = icechunk.Repository.create(storage=storage, config=repo_config)

This produces the following error:

---------------------------------------------------------------------------
IcechunkError Traceback (most recent call last)
Cell In[1], line 29
11 storage = icechunk.s3_storage(
12 bucket="icechunk",
13 prefix="test",
(...)
18 allow_http=True,
19 )
21 repo_config = icechunk.RepositoryConfig(
22 storage = icechunk.StorageSettings(
23 unsafe_use_conditional_update=False,
(...)
26 )
27 )
---> 29 repo = icechunk.Repository.create(storage=storage, config=repo_config)
File /dssgfs01/working/otooth/conda_envs/env_npd_intake/lib/python3.12/site-packages/icechunk/repository.py:51, in Repository.create(cls, storage, config, virtual_chunk_credentials)
25 @classmethod
26 def create(
27 cls,
(...)
30 virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
31 ) -> Self:
32 """
33 Create a new Icechunk repository.
34 If one already exists at the given store location, an error will be raised.
(...)
48 An instance of the Repository class.
49 """
50 return cls(
---> 51 PyRepository.create(
52 storage,
53 config=config,
54 virtual_chunk_credentials=virtual_chunk_credentials,
55 )
56 )
IcechunkError: x error writing object to object store service error
|
| context:
| 0: icechunk::storage::s3::update_config
| with previous_version=VersionInfo { etag: None, generation: None }
| at icechunk/src/storage/s3.rs:301
| 1: icechunk::repository::store_config
| with previous_version=VersionInfo { etag: None, generation: None }
| at icechunk/src/repository.rs:355
| 2: icechunk::repository::create
| at icechunk/src/repository.rs:136
|
|-> error writing object to object store service error
|-> service error
|-> unhandled error (NotImplemented)
`-> Error { code: "NotImplemented", message: "A header you provided implies functionality that is not implemented.", aws_request_id: "B081A60A91E6A61A" }

Looking at the S3 bucket, we now have
Ah, this is good: it means we are very, very close! It would be much easier if you had an answer from the JASMIN people, but I suspect they don't support setting the content-type on the configuration file we are trying to write to the repo. We can work on this and make it optional too. I'll be out next week, but I can take this on when I'm back, and maybe by then we will have more info from JASMIN. Thank you for giving it a try.
Hi @oj-tooth we just issued a release that sets the content-type on if
Hi @dcherian, I've just tested with v0.2.3, but afraid this produces the same
Hi, I'm an ocean modeller at the National Oceanography Centre, UK.
I've encountered an error when attempting to get Icechunk set up with the JASMIN object store. I have included the Python code used to create an initial Icechunk repository, along with the error, below. When I run this, a new object test (including snapshots) is created in the icechunk bucket on our JASMIN OS tenancy, but a NotImplemented error related to ref_key="branch.main/ZZZZZZZZ.json" is raised.
Any help would be hugely appreciated - it would be fantastic to get started version controlling our datasets!
Python Code:
(Python 3.12.8, Icechunk v0.1.3)
Traceback