Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Schema::open with simple quickstart_dense test #5

Merged
merged 3 commits into from
Mar 19, 2024

Conversation

rroelke
Copy link
Collaborator

@rroelke rroelke commented Mar 18, 2024

This pull request adds an equivalent of the tiledb_array_schema_load function which opens an array schema from a URI.

Prior work with array schema creation put the Domain in as a field in the ArraySchema, so most of the implementation here is actually about that. ArraySchema::open calls Domain::load which fetches into Rust the information needed from the C API.

"Prior work" was the focus on the quickstart_dense example which creates, rather than reads, an array. In that example we had to make sure that the Domain lifetime was at least as long as the Schema lifetime, and then Domain has a vector of Dimension for the same reason.

Since the tests for this naturally want to ask about the dimensions, there are also changes to the Domain::dimension function, specifically to return a reference to an existing one rather than create a new one. This uses interior mutability to lazily manifest the dimensions from the C API.

@rroelke rroelke requested a review from davisp March 18, 2024 16:50
Copy link
Collaborator

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a number of comments, the important bits are around lifetime stuff. I know its fairly confusing to intuit some of our APIs given that you've done Domain stuff where the returned pointers are "borrows" that require you to copy stuff out. In these cases where you're holding pointers to internal bits, I'd argue that we should avoid that at all costs as those internal handle classes are designed specifically for allowing wrappers to avoid a lot of complicated lifetime issues.

Or in other words, for almost all of the tiledb_$thing_t types, they're meant to be used in an RAII style class. I.e., the pointer gets set in a "constructor" and then freed in the "destructor" where I'm using quotes because there are different names for those in various languages.

An example where we're doing it almost correctly is here in the filter list:

https://github.com/TileDB-Inc/tiledb-rs/blob/main/src/filter_list.rs#L69-L90

Almost because that's till using the Borrowed pattern which is a memory leak in these cases. But that approach to just returning an instance of the thing we're getting should be the preference unless there's a very compelling reason why that could be wrong (only thing that comes to mind would be caching expensive calls, but even then, I'd probably suggest that cache be upstream in user code rather than something we do internally. Also, expensive is relative there, as I'd expect performance of these calls to be in the 10-100 kHz range).

src/array/dimension.rs Outdated Show resolved Hide resolved
src/array/domain.rs Outdated Show resolved Hide resolved
src/array/domain.rs Outdated Show resolved Hide resolved
src/array/domain.rs Outdated Show resolved Hide resolved
src/array/schema.rs Outdated Show resolved Hide resolved
src/array/schema.rs Outdated Show resolved Hide resolved
src/array/schema.rs Outdated Show resolved Hide resolved
Comment on lines 319 to 317
let rows = domain.dimension(0).expect("Error reading rows dimension");
assert_eq!(Datatype::Int32, rows.datatype());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to illustrate the lifetimes, if you were to put this in a loop so that you called this N times to create the dimension and then drop it, internally it'd be just fine. Shoving the returned dimensions into a vector or similar would also be fine. You'd have N distinct allocated handle objects that all proxy through to the underlying internal shared_ptr<tiledb::sm::Dimension>.

@rroelke
Copy link
Collaborator Author

rroelke commented Mar 19, 2024

I'm pleased to report that removing the interior secondary state is perfectly fine and I have now done that. Changes are addressed.

@rroelke rroelke force-pushed the rr/array-schema-load branch 2 times, most recently from 907a760 to e56a3cd Compare March 19, 2024 18:10
Copy link
Collaborator

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rroelke rroelke force-pushed the rr/array-schema-load branch from e56a3cd to 7548274 Compare March 19, 2024 18:23
@rroelke rroelke merged commit b646047 into main Mar 19, 2024
1 check passed
@rroelke rroelke deleted the rr/array-schema-load branch March 19, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants