Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DC-1120]Update Snapshot Builder Settings to include Dataset Table Name and Relationships #1732

Merged
merged 12 commits into from
Jul 19, 2024
Merged
152 changes: 38 additions & 114 deletions src/main/java/bio/terra/service/snapshot/SnapshotService.java
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,9 @@
import bio.terra.model.SamPolicyModel;
import bio.terra.model.SnapshotAccessRequestResponse;
import bio.terra.model.SnapshotAccessRequestStatus;
import bio.terra.model.SnapshotBuilderFeatureValueGroup;
import bio.terra.model.SnapshotBuilderDatasetConceptSet;
import bio.terra.model.SnapshotBuilderOutputTable;
import bio.terra.model.SnapshotBuilderRootTable;
import bio.terra.model.SnapshotBuilderSettings;
import bio.terra.model.SnapshotBuilderTable;
import bio.terra.model.SnapshotIdsAndRolesModel;
Expand Down Expand Up @@ -694,30 +696,43 @@ public SnapshotAccessRequestResponse getSnapshotAccessRequestById(UUID accessReq
}

public AssetSpecification buildAssetFromSnapshotAccessRequest(
Dataset dataset, SnapshotAccessRequestResponse snapshotRequestModel) {
Dataset dataset, SnapshotAccessRequestResponse snapshotAccessRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be too much to rename the type as well, in the openapi spec? My thinking here is that the response from the API happens to be a SnapshotAccessRequest object, but not all SnapshotAccessRequest objects are actually a response from the API. Since we use the "response" object in non-response situations, it doesn't make sense for response to be part of the type name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, maybe there is a better name. We have a SnapshotAccessRequest object that represents the request submitted from the user, not something retrieved from the database, rather the input to the database. This is a subset of the data in the SnapshotAccessRequestResponse object which is always the object retrieved from the database, so in a way it kind of is a response. If we want to keep these types distinct, we could leave it as is or some up with a more precise name. If we don't want these types to be distinct, we could make the fields that don't exist on the SnapshotAccessRequest optional on the SnapshotAccessRequestResponse object and use that to represent both. But, I think I might prefer leaving them distinct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep the types separate. I do think we may want to separate the database type from the response type at some point; there may be some data that isn't in the database that we'd want to return, or vice versa. It also makes things confusing in that "request" sometimes means an API request and sometimes means an access request (which itself is part of an API request).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could swap it to SnapshotAccessRequestRequest and SnapshotAccessRequest? Because technically it is making a request for a SnapshotAccessRequest. But I also think it is okay to leave it as is, at least for now.

SnapshotBuilderSettings settings =
snapshotBuilderSettingsDao.getBySnapshotId(snapshotAccessRequest.getSourceSnapshotId());
// build asset model from snapshot request
SnapshotBuilderRootTable rootTable = settings.getRootTable();
AssetModel assetModel =
new AssetModel()
.name("snapshot-by-request-asset")
.rootTable("person")
.rootColumn("person_id");
// Manually add dictionary tables, leave columns empty to return all columns
assetModel.addTablesItem(new AssetTableModel().name("person"));
.rootTable(rootTable.getDatasetTableName())
.rootColumn(rootTable.getRootColumn());

// Add root table, leave columns empty to return all columns
assetModel.addTablesItem(new AssetTableModel().name(rootTable.getDatasetTableName()));

// Build asset tables and columns based on the concept sets included in the snapshot request
List<SnapshotBuilderTable> tables = pullTables(snapshotRequestModel);
List<SnapshotBuilderTable> tables = pullTables(snapshotAccessRequest, settings);
tables.forEach(
table -> {
assetModel.addTablesItem(
new AssetTableModel().name(table.getDatasetTableName()).columns(table.getColumns()));
// First add all person <-> occurrence relationships
assetModel.addFollowItem(table.getPrimaryTableRelationship());
if (table.getPrimaryTableRelationship() != null) {
assetModel.addFollowItem(table.getPrimaryTableRelationship());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the person table, table.getPrimaryTableRelationship() is null. This was causing an issue when demographics was selected because there was a null item in the follow list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're now adding the root table separately, do we still need this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I think I could make the primary table relationship required on SnapshotBuilderTable now. But that means I can't use the allOf inheritance in the yaml file so it's a little less pretty but worth it because it will more accurately capture the types we want. Unless there's some way to say allOf ..except? Or I could create a base type and have them both inherit all of that, but since that base type wouldn't be used anywhere we probably don't want it in the yaml. So I'll just redefine SnapshotBuilderTable and SnapshotBuilderRootTable separately

}
});

// Second, add all occurrence <-> concept relationships
tables.forEach(
table -> table.getSecondaryTableRelationships().forEach(assetModel::addFollowItem));
table -> {
if (table.getSecondaryTableRelationships() != null) {
table.getSecondaryTableRelationships().forEach(assetModel::addFollowItem);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the null check here too because when we do include the sample table, it won't have secondary relationships which will cause the same issue I saw above for the person table.

}
});

assetModel.addTablesItem(new AssetTableModel().name("concept"));
// Add dictionary table, leave columns empty to return all columns
assetModel.addTablesItem(
new AssetTableModel().name(settings.getDictionaryTable().getDatasetTableName()));

// Make sure we just built a valid asset model
dataset.validateDatasetAssetSpecification(assetModel);
Expand All @@ -727,116 +742,25 @@ public AssetSpecification buildAssetFromSnapshotAccessRequest(
}

@VisibleForTesting
List<SnapshotBuilderTable> pullTables(SnapshotAccessRequestResponse snapshotRequestModel) {
var valueSets = snapshotRequestModel.getSnapshotSpecification().getValueSets();
var valueSetNames = valueSets.stream().map(SnapshotBuilderFeatureValueGroup::getName).toList();
List<SnapshotBuilderTable> pullTables(
SnapshotAccessRequestResponse snapshotAccessRequest, SnapshotBuilderSettings settings) {
List<String> includedTableNames =
snapshotAccessRequest.getSnapshotSpecification().getOutputTables().stream()
.map(SnapshotBuilderOutputTable::getName)
.toList();

Map<String, SnapshotBuilderTable> tableMap = populateManualTableMap();
List<SnapshotBuilderDatasetConceptSet> allTables = settings.getDatasetConceptSets();

Set<String> missing = new HashSet<>(valueSetNames);
missing.removeAll(tableMap.keySet());
Set<String> missing = new HashSet<>(includedTableNames);
allTables.stream().map(SnapshotBuilderDatasetConceptSet::getName).forEach(missing::remove);
if (!missing.isEmpty()) {
throw new IllegalArgumentException("Unknown value set names: " + missing);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What do you think about changing this error message now that we don't have value sets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, maybe unknown concept set name?

}

return valueSetNames.stream().map(tableMap::get).toList();
}

private Map<String, SnapshotBuilderTable> populateManualTableMap() {
// manual definition of domain names -> dataset table
Map<String, SnapshotBuilderTable> tableMap = new HashMap<>();
tableMap.put(
"Demographics",
new SnapshotBuilderTable()
.datasetTableName("person")
.secondaryTableRelationships(
List.of(
"fpk_person_gender_concept",
"fpk_person_race_concept",
"fpk_person_ethnicity_concept",
"fpk_person_gender_concept_s",
"fpk_person_race_concept_s",
"fpk_person_ethnicity_concept_s")));
tableMap.put(
"Drug",
new SnapshotBuilderTable()
.datasetTableName("drug_exposure")
.primaryTableRelationship("fpk_person_drug")
.secondaryTableRelationships(
List.of(
"fpk_drug_concept",
"fpk_drug_type_concept",
"fpk_drug_route_concept",
"fpk_drug_concept_s")));
tableMap.put(
"Measurement",
new SnapshotBuilderTable()
.datasetTableName("measurement")
.primaryTableRelationship("fpk_person_measurement")
.secondaryTableRelationships(
List.of(
"fpk_measurement_concept",
"fpk_measurement_unit",
"fpk_measurement_concept_s",
"fpk_measurement_value",
"fpk_measurement_type_concept",
"fpk_measurement_operator")));
tableMap.put(
"Visit",
new SnapshotBuilderTable()
.datasetTableName("visit_occurrence")
.primaryTableRelationship("fpk_person_visit")
.secondaryTableRelationships(
List.of(
"fpk_visit_preceding",
"fpk_visit_concept_s",
"fpk_visit_type_concept",
"fpk_visit_concept",
"fpk_visit_discharge")));
tableMap.put(
"Device",
new SnapshotBuilderTable()
.datasetTableName("device_exposure")
.primaryTableRelationship("fpk_person_device")
.secondaryTableRelationships(
List.of("fpk_device_concept", "fpk_device_concept_s", "fpk_device_type_concept")));
tableMap.put(
"Condition",
new SnapshotBuilderTable()
.datasetTableName("condition_occurrence")
.primaryTableRelationship("fpk_person_condition")
.secondaryTableRelationships(
List.of(
"fpk_condition_concept",
"fpk_condition_type_concept",
"fpk_condition_status_concept",
"fpk_condition_concept_s")));
tableMap.put(
"Procedure",
new SnapshotBuilderTable()
.datasetTableName("procedure_occurrence")
.primaryTableRelationship("fpk_person_procedure")
.secondaryTableRelationships(
List.of(
"fpk_procedure_concept",
"fpk_procedure_concept_s",
"fpk_procedure_type_concept",
"fpk_procedure_modifier")));
tableMap.put(
"Observation",
new SnapshotBuilderTable()
.datasetTableName("observation")
.primaryTableRelationship("fpk_person_observation")
.secondaryTableRelationships(
List.of(
"fpk_observation_concept",
"fpk_observation_concept_s",
"fpk_observation_unit",
"fpk_observation_qualifier",
"fpk_observation_type_concept",
"fpk_observation_value")));

return tableMap;
return allTables.stream()
.filter((table) -> includedTableNames.contains(table.getName()))
.map(SnapshotBuilderDatasetConceptSet::getTable)
.toList();
}

public static AssetSpecification getAssetByNameFromDataset(Dataset dataset, String assetName) {
Expand Down
42 changes: 24 additions & 18 deletions src/main/resources/api/data-repository-openapi.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7207,6 +7207,7 @@ components:
- programDataOptions
- featureValueGroups
- datasetConceptSets
- rootTable
properties:
domainOptions:
type: array
Expand All @@ -7216,15 +7217,15 @@ components:
type: array
items:
$ref: '#/components/schemas/SnapshotBuilderProgramDataOption'
featureValueGroups:
type: array
items:
$ref: '#/components/schemas/SnapshotBuilderFeatureValueGroup'
datasetConceptSets:
type: array
items:
$ref:
'#/components/schemas/SnapshotBuilderDatasetConceptSet'
rootTable:
$ref: '#/components/schemas/SnapshotBuilderRootTable'
dictionaryTable:
$ref: '#/components/schemas/SnapshotBuilderTable'

SnapshotBuilderOption:
type: object
Expand Down Expand Up @@ -7308,14 +7309,11 @@ components:
⚠️ A concept set provided for the dataset, usually used to access tables that are not searchable as domains
required:
- name
- featureValueGroupName
properties:
name:
type: string
featureValueGroupName:
type: string
concept:
$ref: '#/components/schemas/SnapshotBuilderConcept'
table:
$ref: '#/components/schemas/SnapshotBuilderTable'

SnapshotBuilderDomainOption:
allOf:
Expand Down Expand Up @@ -7391,17 +7389,17 @@ components:
items:
$ref: '#/components/schemas/SnapshotBuilderParentConcept'

SnapshotBuilderFeatureValueGroup:
SnapshotBuilderOutputTable:
type: object
description: >
⚠️ Describes a set of columns for a table with given name
required:
- name
- values
- columns
properties:
name:
type: string
values:
columns:
type: array
items:
type: string
Expand All @@ -7427,6 +7425,18 @@ components:
items:
type: string

SnapshotBuilderRootTable:
allOf:
- $ref: '#/components/schemas/SnapshotBuilderTable'
- type: object
required:
- rootColumn
properties:
rootColumn:
type: string
description: >
The primary column in the root table that is used to join with other tables

EnumerateSnapshotAccessRequest:
type: object
description: The model for enumerated SnapshotAccessRequests.
Expand Down Expand Up @@ -7510,14 +7520,10 @@ components:
type: array
items:
$ref: '#/components/schemas/SnapshotBuilderCohort'
conceptSets:
type: array
items:
$ref: '#/components/schemas/SnapshotBuilderDatasetConceptSet'
valueSets:
outputTables:
type: array
items:
$ref: '#/components/schemas/SnapshotBuilderFeatureValueGroup'
$ref: '#/components/schemas/SnapshotBuilderOutputTable'
description: Specification of cohort and participant data for a requested snapshot.

SnapshotBuilderCohort:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,6 @@
import bio.terra.model.SnapshotBuilderCountResponseResult;
import bio.terra.model.SnapshotBuilderGetConceptHierarchyResponse;
import bio.terra.model.SnapshotBuilderParentConcept;
import bio.terra.model.SnapshotBuilderSettings;
import bio.terra.model.SnapshotModel;
import bio.terra.model.SnapshotPreviewModel;
import bio.terra.model.SnapshotRequestModel;
Expand Down Expand Up @@ -497,7 +496,7 @@ void getSnapshotSnapshotBuilderSettingsNotFound() throws Exception {
@Test
void updateSnapshotSnapshotBuilderSettings() throws Exception {
mockValidators();
var snapshotBuilderSettings = new SnapshotBuilderSettings();
var snapshotBuilderSettings = SnapshotBuilderTestData.SETTINGS;
mvc.perform(
put(SNAPSHOT_BUILDER_SETTINGS_ENDPOINT, SNAPSHOT_ID)
.contentType(MediaType.APPLICATION_JSON)
Expand All @@ -516,7 +515,7 @@ void updateSnapshotSnapshotBuilderSettingsForbidden() throws Exception {
mvc.perform(
put(SNAPSHOT_BUILDER_SETTINGS_ENDPOINT, SNAPSHOT_ID)
.contentType(MediaType.APPLICATION_JSON)
.content("{}"))
.content(TestUtils.mapToJson(SnapshotBuilderTestData.SETTINGS)))
.andExpect(status().isForbidden());

verifyAuthorizationCall(iamAction);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@
import bio.terra.model.SamPolicyModel;
import bio.terra.model.SnapshotAccessRequestResponse;
import bio.terra.model.SnapshotAccessRequestStatus;
import bio.terra.model.SnapshotBuilderFeatureValueGroup;
import bio.terra.model.SnapshotBuilderOutputTable;
import bio.terra.model.SnapshotBuilderRequest;
import bio.terra.model.SnapshotIdsAndRolesModel;
import bio.terra.model.SnapshotLinkDuosDatasetResponse;
Expand Down Expand Up @@ -160,6 +160,7 @@ class SnapshotServiceTest {
@Mock private EcmService ecmService;
@Mock private RawlsService rawlsService;
@Mock private DuosClient duosClient;
@Mock private SnapshotBuilderSettingsDao settingsDao;
private final UUID snapshotId = UUID.randomUUID();
private final UUID datasetId = UUID.randomUUID();
private final UUID snapshotTableId = UUID.randomUUID();
Expand Down Expand Up @@ -187,7 +188,7 @@ void beforeEach() {
azureSynapsePdao,
rawlsService,
duosClient,
mock(SnapshotBuilderSettingsDao.class));
settingsDao);
}

@Test
Expand Down Expand Up @@ -1113,7 +1114,8 @@ void testCreateSnapshotWithByRequestId() {
new SnapshotAccessRequestResponse()
.status(SnapshotAccessRequestStatus.APPROVED)
.createdBy("email@a.com")
.id(snapshotAccessRequestId);
.id(snapshotAccessRequestId)
.sourceSnapshotId(UUID.randomUUID());
when(snapshotRequestDao.getById(snapshotAccessRequestId))
.thenReturn(snapshotAccessRequestResponse);
when(snapshotDao.retrieveSnapshot(snapshotAccessRequestResponse.getSourceSnapshotId()))
Expand Down Expand Up @@ -1352,14 +1354,17 @@ void makeSnapshotFromSnapshotRequestByRequestId() {

Dataset dataset = SnapshotBuilderTestData.DATASET;
dataset.name("datasetName");
UUID sourceSnapshotId = UUID.randomUUID();
when(snapshotRequestDao.getById(snapshotAccessRequestId))
.thenReturn(
new SnapshotAccessRequestResponse()
.sourceSnapshotId(sourceSnapshotId)
.snapshotSpecification(
new SnapshotBuilderRequest()
.addValueSetsItem(new SnapshotBuilderFeatureValueGroup().name("Drug"))
.addValueSetsItem(
new SnapshotBuilderFeatureValueGroup().name("Condition"))));
.addOutputTablesItem(new SnapshotBuilderOutputTable().name("Drug"))
.addOutputTablesItem(new SnapshotBuilderOutputTable().name("Condition"))));
when(settingsDao.getBySnapshotId(sourceSnapshotId))
.thenReturn(SnapshotBuilderTestData.SETTINGS);

Snapshot actual = service.makeSnapshotFromSnapshotRequest(snapshotRequestModel, dataset);
SnapshotSource snapshotSource = new SnapshotSource().dataset(dataset);
Expand Down Expand Up @@ -1473,7 +1478,8 @@ void pullTables() {
var accessRequestResponse =
SnapshotBuilderTestData.createSnapshotAccessRequestResponse(sourceSnapshotId);
accessRequestResponse.id(snapshotAccessRequestId);
var firstTable = service.pullTables(accessRequestResponse).get(0);
var firstTable =
service.pullTables(accessRequestResponse, SnapshotBuilderTestData.SETTINGS).get(0);
assertThat(firstTable.getDatasetTableName(), is("drug_exposure"));
// Must preserve relationship order
assertThat(firstTable.getPrimaryTableRelationship(), equalTo("fpk_person_drug"));
Expand All @@ -1497,6 +1503,7 @@ void buildAssetFromSnapshotAccessRequest() {
var accessRequestResponse =
SnapshotBuilderTestData.createSnapshotAccessRequestResponse(snapshotId);
accessRequestResponse.id(snapshotAccessRequestId);
when(settingsDao.getBySnapshotId(snapshotId)).thenReturn(SnapshotBuilderTestData.SETTINGS);

var actualAssetSpec =
service.buildAssetFromSnapshotAccessRequest(sourceDataset, accessRequestResponse);
Expand Down
Loading
Loading