Overhaul RDF reading #240

Merged
merged 33 commits into from
Jan 11, 2024

@mirzov mirzov commented Jan 10, 2024

An "epic" refactoring changing how RDF is read on the back end. The current approach has two fundamental deficiencies:

  • a new RDF4J repository connection is opened for every read operation, no matter how small; reading a single DataObject instance therefore takes roughly 100 connection openings and closures. Additionally, many read operations produce a ClosableIterator that keeps the connection open and closes it only when fully consumed or explicitly closed. As a result, there is a high risk of writing code prone to connection leaks, and this has caused a serious malfunction in production.
  • validation of property cardinalities is done in an unforgiving manner, by throwing an exception. This has the benefit of discovering metadata problems quickly, but makes the services unnecessarily fragile.
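
The leak risk described in the first point can be illustrated with a minimal sketch. All names here (`FakeConnection`, `ConnectionBoundIterator`, and the two helper functions) are hypothetical stand-ins invented for illustration; they are not the RDF4J API or this project's classes.

```scala
// Hypothetical stand-in for a repository connection; not the RDF4J API.
final class FakeConnection extends AutoCloseable:
  private var open = true
  def isOpen: Boolean = open
  def statements: Iterator[String] = Iterator("s1", "s2", "s3")
  override def close(): Unit =
    open = false

// An iterator that owns its connection and closes it only when it is
// exhausted or explicitly closed.
final class ConnectionBoundIterator(conn: FakeConnection)
    extends Iterator[String] with AutoCloseable:
  private val inner = conn.statements
  override def hasNext: Boolean =
    val more = inner.hasNext
    if !more then close() // closed automatically only on full consumption
    more
  override def next(): String = inner.next()
  override def close(): Unit = conn.close()

// Leak-prone: only the first element is taken and close() is never called,
// so the connection stays open.
def firstStatementLeaky(conn: FakeConnection): String =
  new ConnectionBoundIterator(conn).next()

// Safe, but only because the caller remembers to close explicitly.
def firstStatementSafe(conn: FakeConnection): String =
  val iter = new ConnectionBoundIterator(conn)
  try iter.next()
  finally iter.close()
```

Under these assumptions, the leaky variant returns the right value while leaving the connection open, which is why this kind of bug is easy to miss in review.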

This PR makes several types of improvements:

  • introduces TriplestoreConnection, which will handle most read operations instead of InstanceServer; the latter is now used to provide the connection for safe "open connection -> complex read -> close connection" operations.
  • introduces the concept of RdfLens, which makes it possible to focus a TriplestoreConnection on different RDF graphs without creating a different RDF4J connection or using a different InstanceServer.
  • uses Scala 3 opaque types to add type safety with various flavours of TriplestoreConnection (depending on which RDF graphs they can "see")
  • uses the Validated datatype to wrap all RDF property cardinality validations instead of throwing exceptions. This allows retaining error information while still producing usable output. The errors are then displayed as warnings on the landing pages (to begin with).
  • updates all the RDF reading code in line with the changes listed above
  • improves "plumbing" of various components, reducing needless object creations; uses "vanilla" (without SPARQL "magic") triplestore when "magic" is not needed.
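
The ideas listed above can be sketched together in a few lines of Scala 3. This is an illustration under stated assumptions, not the code from this PR: `Conn`, `Server`, `Flavours`, `DocConn`, `ItemConn`, `exactlyOne` and `readTitle` are hypothetical names, and the `Validated` shown here is a simplified stand-in for the project's datatype.

```scala
// All names below are hypothetical stand-ins, not the project's actual API.
final case class Triple(subj: String, pred: String, obj: String, graph: String)

// Simplified Validated: a possibly missing result plus accumulated errors.
final case class Validated[+T](result: Option[T], errors: Seq[String] = Nil):
  def map[U](f: T => U): Validated[U] = Validated(result.map(f), errors)

// A connection that "sees" only the graphs it is focused on.
final class Conn(store: Seq[Triple], graphs: Set[String]):
  // Refocusing reuses the same underlying store; nothing new is opened.
  def focusedOn(gs: Set[String]): Conn = Conn(store, gs)
  def objects(subj: String, pred: String): Seq[String] =
    store.collect { case Triple(`subj`, `pred`, o, g) if graphs.contains(g) => o }

// Opaque types give each "flavour" of connection a distinct static type.
object Flavours:
  opaque type DocConn <: Conn = Conn
  opaque type ItemConn <: Conn = Conn
  def docLens(c: Conn): DocConn = c.focusedOn(Set("docs"))
  def itemLens(c: Conn): ItemConn = c.focusedOn(Set("items"))

import Flavours.*

final class Server(store: Seq[Triple]):
  var opened = 0
  var closed = 0
  // Loan pattern: open once, run the whole read, always close.
  def access[T](read: Conn => T): T =
    opened += 1
    val conn = Conn(store, store.map(_.graph).toSet)
    try read(conn)
    finally closed += 1 // a real connection would be closed here

// Cardinality check that records a problem instead of throwing.
def exactlyOne(values: Seq[String], what: String): Validated[String] = values match
  case Seq(single) => Validated(Some(single))
  case Seq()       => Validated(None, Seq(s"Missing $what"))
  case many        => Validated(Some(many.head), Seq(s"Expected one $what, got ${many.size}"))

// Requires an ItemConn in scope; a DocConn would not satisfy the using clause.
def readTitle(item: String)(using conn: ItemConn): Validated[String] =
  exactlyOne(conn.objects(item, "title"), s"title of $item")

def demo(): Validated[String] =
  val server = Server(Seq(
    Triple("obj1", "title", "Air temperature", "items"),
    Triple("obj1", "title", "Air temp.", "items"),
    Triple("obj1", "title", "Draft title", "docs")
  ))
  server.access { conn =>
    given ItemConn = itemLens(conn)
    readTitle("obj1")
  }
```

In this sketch `demo()` still yields a usable title even though the cardinality is wrong, and the problem travels along as an error message that a caller could render as a warning. The "docs" triple is ignored because the lens restricts the connection to the "items" graph.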

mirzov and others added 30 commits October 26, 2023 18:48
Also simplified ScopedValidator and fixed broken compilation in the tests
StaticObjectFetcher and CollectionFetcher (which will be removed)
are now broken (to fix the compilation errors)
- Remove TSC2 and TSC2V type constructors and use '(using)' clauses instead, which
forced finalizing the 'flavoured-connection' refactoring

- Change API for hasStatement and getStatements to be closer to RDF4J (with nulls)
to prevent needless creation of Option instances
(breakage occurred due to legit functionality changes)
@mirzov mirzov self-assigned this Jan 10, 2024
@mirzov mirzov merged commit 2700832 into master Jan 11, 2024
1 check passed

3 participants