Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DC-1120]Update Snapshot Builder Settings to include Dataset Table Name and Relationships #1732

Merged
merged 12 commits into from
Jul 19, 2024

Conversation

rjohanek
Copy link
Contributor

@rjohanek rjohanek commented Jul 9, 2024

Jira ticket: https://broadworkbench.atlassian.net/browse/1120

Summary of Changes

  1. Update snapshot builder settings
  • Remove "featureValueGroups"
  • Rework datasetConceptSets
  • Remove "featureValueGroupName"
  • Add “table” object that follows the model of the “SnapshotBuilderTable” that includes datasetTableName, columns and relationships
  • We’ll leave columns empty for now so that all columns are included in the asset
  1. Rework the Snapshot Builder Access Request
  • Remove "conceptSets"
  • Rename "valueSets" to "outputTables" and the “values” list to instead be “columns”
  • Return an empty list for “columns” for now.
  1. Update how the asset is generated on snapshot create by request id
  • Remove the manual mapping between the concept name and the table object
  • Instead, pull the table object from snapshot builder settings based on the name of the outputTable/ConceptSet

There will be a follow up UI change to handle changes to settings/access request - Not yet complete

Testing Strategy

  1. Updated existing snapshotServiceTests
  2. Integration and connected tests as they are should test this already since this is not adding functionality but replacing hardcoded functionality

@rjohanek rjohanek requested review from a team as code owners July 9, 2024 19:15
@rjohanek rjohanek requested review from fboulnois and aarohinadkarni and removed request for a team July 9, 2024 19:15
public final class Observation {
private Observation() {}

public static final String TABLE_NAME = "observation";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it really matters because it is test data but I noticed there is no observation occurrence table, just observation table in OMOP so I renamed this class

.table(new SnapshotBuilderTable().datasetTableName(Observation.TABLE_NAME)),
new SnapshotBuilderDatasetConceptSet()
.name("Measurement")
.table(new SnapshotBuilderTable().datasetTableName(Measurement.TABLE_NAME)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: I could make a constant like I did for Measurement and use the table name like shown here, or just use the string table names like shown below (because they are only used once), or I could just define constants in this class

@@ -688,7 +689,7 @@ public SnapshotAccessRequestResponse getSnapshotAccessRequestById(UUID accessReq
}

public AssetSpecification buildAssetFromSnapshotAccessRequest(
Dataset dataset, SnapshotAccessRequestResponse snapshotRequestModel) {
Dataset dataset, SnapshotAccessRequestResponse snapshotAccessRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be too much to rename the type as well, in the openapi spec? My thinking here is that the response from the API happens to be a SnapshotAccessRequest object, but not all SnapshotAccessRequest objects are actually a response from the API. Since we use the "response" object in non-response situations, it doesn't make sense for response to be part of the type name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, maybe there is a better name. We have a SnapshotAccessRequest object that represents the request submitted from the user, not something retrieved from the database, rather the input to the database. This is a subset of the data in the SnapshotAccessRequestResponse object which is always the object retrieved from the database, so in a way it kind of is a response. If we want to keep these types distinct, we could leave it as is or some up with a more precise name. If we don't want these types to be distinct, we could make the fields that don't exist on the SnapshotAccessRequest optional on the SnapshotAccessRequestResponse object and use that to represent both. But, I think I might prefer leaving them distinct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep the types separate. I do think we may want to separate the database type from the response type at some point; there may be some data that isn't in the database that we'd want to return, or vice versa. It also makes things confusing in that "request" sometimes means an API request and sometimes means an access request (which itself is part of an API request).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could swap it to SnapshotAccessRequestRequest and SnapshotAccessRequest? Because technically it is making a request for a SnapshotAccessRequest. But I also think it is okay to leave it as is, at least for now.

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK to me.

@@ -688,7 +689,7 @@ public SnapshotAccessRequestResponse getSnapshotAccessRequestById(UUID accessReq
}

public AssetSpecification buildAssetFromSnapshotAccessRequest(
Dataset dataset, SnapshotAccessRequestResponse snapshotRequestModel) {
Dataset dataset, SnapshotAccessRequestResponse snapshotAccessRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep the types separate. I do think we may want to separate the database type from the response type at some point; there may be some data that isn't in the database that we'd want to return, or vice versa. It also makes things confusing in that "request" sometimes means an API request and sometimes means an access request (which itself is part of an API request).

Comment on lines +179 to +181
"primaryTableRelationship": "fpk_person_drug",
"secondaryTableRelationships": [
"fpk_drug_type_concept",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should be leveraging TDR's relationship objects here instead of storing fpk names. (So instead of storing the primary table fpk, we'd store the field of the table that contains the value which has a fpk relationship to the primary table.) But that change can be done later, since this PR is about changing where this data is stored, not how it's represented.

# Conflicts:
#	src/test/java/bio/terra/service/snapshot/SnapshotServiceTest.java
@fboulnois fboulnois removed their request for review July 15, 2024 13:59
tables.forEach(
table -> {
assetModel.addTablesItem(
new AssetTableModel().name(table.getDatasetTableName()).columns(table.getColumns()));
// First add all person <-> occurrence relationships
assetModel.addFollowItem(table.getPrimaryTableRelationship());
if (table.getPrimaryTableRelationship() != null) {
assetModel.addFollowItem(table.getPrimaryTableRelationship());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the person table, table.getPrimaryTableRelationship() is null. This was causing an issue when demographics was selected because there was a null item in the follow list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're now adding the root table separately, do we still need this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I think I could make the primary table relationship required on SnapshotBuilderTable now. But that means I can't use the allOf inheritance in the yaml file so it's a little less pretty but worth it because it will more accurately capture the types we want. Unless there's some way to say allOf ..except? Or I could create a base type and have them both inherit all of that, but since that base type wouldn't be used anywhere we probably don't want it in the yaml. So I'll just redefine SnapshotBuilderTable and SnapshotBuilderRootTable separately

table -> table.getSecondaryTableRelationships().forEach(assetModel::addFollowItem));
table -> {
if (table.getSecondaryTableRelationships() != null) {
table.getSecondaryTableRelationships().forEach(assetModel::addFollowItem);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the null check here too because when we do include the sample table, it won't have secondary relationships which will cause the same issue I saw above for the person table.

Copy link

Copy link
Contributor

@s-rubenstein s-rubenstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a few comments.

@@ -688,7 +689,7 @@ public SnapshotAccessRequestResponse getSnapshotAccessRequestById(UUID accessReq
}

public AssetSpecification buildAssetFromSnapshotAccessRequest(
Dataset dataset, SnapshotAccessRequestResponse snapshotRequestModel) {
Dataset dataset, SnapshotAccessRequestResponse snapshotAccessRequest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could swap it to SnapshotAccessRequestRequest and SnapshotAccessRequest? Because technically it is making a request for a SnapshotAccessRequest. But I also think it is okay to leave it as is, at least for now.

tables.forEach(
table -> {
assetModel.addTablesItem(
new AssetTableModel().name(table.getDatasetTableName()).columns(table.getColumns()));
// First add all person <-> occurrence relationships
assetModel.addFollowItem(table.getPrimaryTableRelationship());
if (table.getPrimaryTableRelationship() != null) {
assetModel.addFollowItem(table.getPrimaryTableRelationship());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're now adding the root table separately, do we still need this check?

Set<String> missing = new HashSet<>(valueSetNames);
missing.removeAll(tableMap.keySet());
Set<String> missing = new HashSet<>(includedTableNames);
allTables.stream().map(SnapshotBuilderDatasetConceptSet::getName).forEach(missing::remove);
if (!missing.isEmpty()) {
throw new IllegalArgumentException("Unknown value set names: " + missing);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What do you think about changing this error message now that we don't have value sets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, maybe unknown concept set name?

.name("Procedure")
.table(
new SnapshotBuilderTable()
.datasetTableName(ProcedureOccurrence.TABLE_NAME)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow SnapshotBuilderTable objects to have no primaryTableRelationships or secondaryTableRelationships?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't populate it here because it isn't used for tests, but based on your other comment, I think we could make primary table relationship required, so then I'd need to populate that here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for other non-omop datasets that we wouldn't want to make this required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - its tricky to represent the ideal state so we're just leaving them optional for now.

@rjohanek rjohanek merged commit 6ef160b into develop Jul 19, 2024
13 checks passed
@rjohanek rjohanek deleted the rj/dc-1120-update-settings branch July 19, 2024 14:32
Shakespeared pushed a commit that referenced this pull request Jul 23, 2024
…me and Relationships (#1732)

Co-authored-by: rjohanek <rjohanek@broadinstitute.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants