-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DCJ-32] Load tags are locked and unlocked at the dataset level #1785
[DCJ-32] Load tags are locked and unlocked at the dataset level #1785
Conversation
Let flight1 have an active lock on loadTagA while ingesting files to datasetX. Previously, if flight2 attempted to lock loadTagA while ingesting files to datasetY (a different dataset), the attempt would fail. Now, such an attempt would succeed. Unchanged: if flight2 attempted to lock loadTagA while ingesting files to datasetX (the same dataset), the attempt will fail.
We can use the inputParameters class field instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many comments, but mostly minor things
- I would like to see the DAO test cleaned up a bit since I think teardown and some prep are not needed with the embedded database
- If
Load.locked
is unused, it would be nice to remove it
src/test/java/bio/terra/service/load/flight/LoadLockStepTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you 👍🏽
Notably, a shift in my understanding of "who's locking what": Flights lock a "load lock key" comprised of a load tag and a dataset ID. In other words, concurrent load operations (e.g. file ingests) to a dataset are not allowed to use the same load tag. In an effort to further reduce confusion, the `load` table is now named `load_lock` and has had its `locked` column remove. The presence of a row in the table denotes that a lock is active.
Not just for file ingests.
|
Jira ticket: https://broadworkbench.atlassian.net/browse/DCJ-32
Addresses
When files are being ingested to a dataset (via standalone file ingest or as part of a combined metadata-and-file ingest), TDR attempts to take out a lock on a load tag, either provided by the user or generated with the form
load-at-<timestamp>
. (This is in addition to the dataset-level lock taken out during the duration of the ingest.)A user may elect to provide and reuse their own load tag to logically link file loads together, but TDR will not allow a load tag to be used by multiple running file ingests: it confuses the algorithm for re-running a load with a load tag and skipping already-loaded files.
Summary of changes
Let
flight1
have an active lock onloadTag
while ingesting files to a dataset.flight2
attempts to lockloadTag
…Note that this PR is necessary for us to expand our unlockDataset endpoint to also clear any stuck locks on load tags. We recently had to modify the database directly to clear such stuck locks: https://broadworkbench.atlassian.net/browse/DCJ-625
Testing Strategy
Expanded unit tests (with drive-bys for JUnit 5 conversion, downgrading a Spring Boot test with embedded database to a pure unit test with mocking).