-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce inconsistent fit closure test data #2180
Conversation
36371a3
to
809dd6d
Compare
I would need to see chi2 / distances / etc. By running in parallel the seeding might happen differently at some stage but the initialization (if everything goes well) should be equivalent. By eye they look too different to me, but if all metrics are very close it could well be a random fluctuation. |
Looking at the plots I share @scarlehoff's impression, but indeed a full compare report including chi2s and distances would be needed to say it more conclusively. From Stefano's comments during the PC I have the impression that the paper is about to be finished, so I think this PR deserves some priority. It would be very good to have this merged before the paper is on arxiv, because otherwise I'm afraid it may never happen. |
@scarlehoff , @RoyStegeman , and @giovannidecrescenzo I have tested this module by running a multiclosure test for the DIS bulk inconsistency scenario with an inconsistency parameter lambda = 0. Because of this I think that this branch is ready to be merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I've left a few comments, mostly related to reducing the reach of the changes inside validphys (I think the more funcionality is shared the better it will be for future maintainability).
Other than that it looks good I think.
validphys2/src/validphys/filters.py
Outdated
inconsistent_data_settings, | ||
): | ||
""" | ||
Same as _filter_closure_data, but for inconsistent closure tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add when / what makes the difference with respect to _filter_closure_data.
I think that, since this an internal function (where all the funcionality is) and not an action, it would be greatly benefitial to have the functionality inside _filter_closure_data
with a switch (it is probably enough to check whether inconsistent_data_settings
is None.
Something like adding:
if inconsistent_data_settings is not None:
# Modify the closure data adding inconsistencies.
closure_data = [
InconsistentCommonData(
setname=cd.setname,
ndata=cd.ndata,
commondataproc=cd.commondataproc,
nkin=cd.nkin,
nsys=cd.nsys,
commondata_table=cd.commondata_table,
systype_table=cd.systype_table,
)
for cd in closure_data
]
directly to _filter_closure_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't completely agree on this. the function is internal and used as an explicit node by the config module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the one used as a node is filter_inconsistent_closure_data_by_experiment
, I mean _filter_closure_data
and _filter_inconsistent_closure_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sorry meant that one.
What I mean is that I don't understand why you want
filter_inconsistent_closure_data_by_experiment
to be calling the same internal function as
filter_closure_data_by_experiment
they are two separate things and this only adds one extra function (_filter_inconsistent_closure_data
)
to filters.
The advantage, in my opinion, is that _filter_closure_data
does not become to large as a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if anything changes in how _filter_closure_data
behaves now you have to change two functions instead of one.
Or worse, someone down the line will change something in _filter_closure_data
and will forget that the inconsistent one has to be changed in the same way and both functions will drift away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to say I agree with this. _filtere_inconsistent_closure_data is an extension of _filter_closure_data that still seems to make use of all of the original functionality so making it two separate functions is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think there are at least two ways of doing this
I can either leave two separate nodes filter_inconsistent_closure_data_by_experiment
and filter_closure_data_by_experiment
and let the explicit node deal with them, or reduce even those.
I am actually favourable to having the two separate nodes and a rule in config that depends on whether inconsistent_fakedata
has been set True.
This would avoid running inconsistent closure tests in the case in which one has forgotten inconsistent_data_settings in the runcard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also had quick comment: as of now we are disregarding the option to generate lvl1 data "inconsistently" in the sense that we do not have anymore the option to vary the generation of lvl 1 data, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is what you refer here what we used to call type 2 inconsistency?
If so, we never used this and it is not in the paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah exactly, just to be sure
validphys2/src/validphys/closuretest/inconsistent_closuretest/inconsistent_ct.py
Outdated
Show resolved
Hide resolved
…s it is done for the central values
809dd6d
to
2072d85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I don't have much to add to JCM's comments.
Please add documentation
validphys2/src/validphys/filters.py
Outdated
inconsistent_data_settings, | ||
): | ||
""" | ||
Same as _filter_closure_data, but for inconsistent closure tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to say I agree with this. _filtere_inconsistent_closure_data is an extension of _filter_closure_data that still seems to make use of all of the original functionality so making it two separate functions is not ideal.
@scarlehoff, @RoyStegeman If you are happy with the current state of this PR, I would suggest merging it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It might be nice to refer to the paper once it is out in the docstr for context.
Just a last nitpic, did you use pre-commit? I think isort
would've picked up some of the issues (you can just touch the files, then git add them and do precommit, otherwise just run isort on all of them -also the test-).
Thanks for all the comments, they should all have been addressed |
Reimplements #1682
This pull request introduces a new class,
InconsistentCommonData
, to handle inconsistencies within closure tests and includes several related changes across multiple files. The most important changes include the addition of the new class, updates to the configuration parsing, and new filtering methods for handling inconsistent closure data.New Class and Methods:
validphys2/src/validphys/closuretest/inconsistent_closuretest/inconsistent_ct.py
: AddedInconsistentCommonData
class with methods to introduce inconsistencies in closure tests, includingsystematic_errors
property,select_systype_table_indices
,rescale_systematics
,process_commondata
, andexport_uncertainties
.Configuration Updates:
validphys2/src/validphys/config.py
: Addedparse_inconsistent_data_settings
method to parse inconsistent data settings from the YAML file.validphys2/src/validphys/config.py
: Updatedproduce_filter_data
to includeinconsistent_fakedata
parameter for filtering inconsistent closure data. [1] [2]Filtering Methods:
validphys2/src/validphys/filters.py
: Addedfilter_inconsistent_closure_data_by_experiment
and_filter_inconsistent_closure_data
methods to handle filtering of inconsistent closure data. [1] [2]Testing:
validphys2/src/validphys/tests/test_inconsistent_ct.py
: Added unit tests forInconsistentCommonData
class methods, including tests for the getter and setter ofsystematic_errors
,select_systype_table_indices
,rescale_systematics
,process_commondata
, andexport_uncertainties
.Benchmarking of Code
Test 0.0
Test that an inconsistent ct fit (using the inconsistent closure test filter) with lambda = 1.0 gives the same results as a consistent closure test:
https://vp.nnpdf.science/VIdN_z0ETx6Ph0szzKLY0g==
Multiclosure Test
DIS multiclosure test with maximal lambda inconsistency parameter (lambda=0)
https://vp.nnpdf.science/idXftjIBTp6r2ufbFIhMaA==
Also note that for a single fit with the same L1 data
This is the type of agreement found between old and new branch for a DIS only inconsistent closure test with lambda = 0
https://vp.nnpdf.science/sH-C_csbTGCPCdEGFXj_1w==
Example of inconsistent closure test runcard: