Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DatasetIssues and Issue classes #2067

Merged
merged 23 commits into from
Aug 12, 2024

Conversation

rwblair
Copy link
Member

@rwblair rwblair commented Aug 6, 2024

Towards #2043

This PR flattens the structure of the DatasetIssues and Issue types. It adds a new subCode and rule field for further differentiation of issues. The issues of DatasetIssues are an array instead of as a map on . The nested files maps were removed in favor of creating a separate issue for each infraction file, code and all. With that there was no need to do any look ups of existing issues during validation.

addNonSchemaIssue was removed, its functionality combined with the regular add. Both took arrays of files as an argument, but in practice only one file was ever passed to it.

The old outputs would have these file objects with path, filenam, and relative path. It wasn't clear to me that those were really necessary. I tried to get away with just using the path to the file as a string for simplicity. But I may be missing something and having a proper File object in there would be better, when we have one.

I renamed reason to codeMessage since the text was always being drawn from where the code is defined be it schema or nonschema.
I renamed evidence to issueMessage since these were always generated at issue creation time during validation.
I don't know that either of these is actually clearer.

filter and groupBy have new DatasetIssues objects as part of their returns,
while get returns just an array of issues. should get return a new DatasetIssues?

You can chain calls to filter, is that useful? Chaining calls to groupBy isn't so easy, should it be easy? I had a version of groupBy that accepted an array of Issue keys to nest that would nest the groups of issues in a object for all the unique values of the keys in the order they were given, it was a mess so I removed it.

I had played with array of Issue keys being passed into the DatasetIssues constructor and used to set up a bunch of independent maps of values to be indexed on stored in this.groups (Map<keyof Issue, Map<Issue[keyof Issue]>, Issue[]>) but then I couldn't imagine it ever being used so I gave up on it.

Should DatasetIssues extend array? Push issues to this instead of this.issues. At its heart it is an array with some helper functions.

Currently issues store codeMessages which leads to quite a bit of duplication, they are consistently the longest field in an issue object and make this issues object 10x larger than old one. ThecodeMessages map in DatasetIssues could be used to keep a single copy of each message so they could be dropped from the Issue object, but then the issue object needs some assemblly when accessed.

If only we could store a reference to fields in codeMessages in issues.

@rwblair rwblair changed the title Refactor datasetIssues class Refactor DatasetIssues and Issue classes Aug 6, 2024
Copy link

codecov bot commented Aug 6, 2024

Codecov Report

Attention: Patch coverage is 70.35714% with 83 lines in your changes missing coverage. Please review.

Project coverage is 87.23%. Comparing base (941f36d) to head (ec4e1d9).

Files Patch % Lines
bids-validator/src/schema/applyRules.ts 50.48% 51 Missing ⚠️
bids-validator/src/validators/filenameValidate.ts 66.66% 14 Missing ⚠️
bids-validator/src/issues/datasetIssues.ts 93.81% 6 Missing ⚠️
bids-validator/src/schema/context.ts 0.00% 4 Missing ⚠️
...ds-validator/src/validators/internal/unusedFile.ts 0.00% 4 Missing ⚠️
bids-validator/src/tests/local/common.ts 33.33% 2 Missing ⚠️
bids-validator/src/schema/associations.ts 0.00% 1 Missing ⚠️
bids-validator/src/validators/hed.ts 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2067      +/-   ##
==========================================
+ Coverage   85.69%   87.23%   +1.53%     
==========================================
  Files          91      139      +48     
  Lines        3782     6806    +3024     
  Branches     1220     1604     +384     
==========================================
+ Hits         3241     5937    +2696     
- Misses        455      779     +324     
- Partials       86       90       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@effigies effigies added the schema label Aug 7, 2024
@effigies
Copy link
Collaborator

@rwblair Just a note that I've opened https://github.com/rwblair/bids-validator/pull/10 against this. Issues discovered while attempting to address #2007, but the conflict with this PR was too high to be worth addressing separately.

@rwblair rwblair marked this pull request as ready for review August 12, 2024 15:45
…r message so we don't have to hunt for console logs.
Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped making codeMessage suggestions. Grabbing food, so submitting now.

Comment on lines 372 to 377
context.dataset.issues.add({
code: 'TSV_ADDITIONAL_COLUMNS_NOT_ALLOWED',
location: context.path,
issueMessage: `Disallowed columns found ${extraCols}`,
rule: schemaPath,
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs more than I can do in a suggestion, due to diff, but what about:

for (const col of extraCols) {
  context.dataset.issues.add({
    code: 'TSV_ADDITIONAL_COLUMNS_NOT_ALLOWED',
    subCode: col,
    location: context.path,
    rule: schemaPath,
  })
}

Comment on lines 404 to 409
context.dataset.issues.add({
code: 'TSV_COLUMN_MISSING',
location: context.path,
issueMessage: `Columns cited as index columns not in file: ${missing}.`,
rule: schemaPath,
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think a foreach using subcodes would be good here.

rwblair and others added 12 commits August 12, 2024 12:18
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
@effigies effigies merged commit ec73f3a into bids-standard:master Aug 12, 2024
14 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants