Review how missing data is handled in inversions #97
Comments
Quick update on this: if a month of fp data from the middle of the dataset is missing from the object store (e.g. you have April 2016 and June 2016, but you are missing May 2016), you get the confusing pymc error. If a month of fp data is missing at the end of the dataset (e.g. MHD obs run to November 2023, but the MHD footprints only run to October 2023), you get a different error.
In this case there are no NaNs in fp_all, but the timeseries is truncated to the end of the fp data, which causes these subsequent issues. I think the first step would be to raise a more informative error when either of these cases occurs, e.g. "Some observations could not be matched to footprints - check that all footprints are loaded in object store". Even better if it can tell you which footprints are missing!
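A minimal sketch of the kind of check suggested above, assuming footprints are stored per month and that the obs and footprint timestamps are available as NumPy `datetime64` arrays. The function names `months_missing_footprints` and `check_footprint_coverage` are hypothetical and not part of the existing code; the error text is the one proposed in this comment.

```python
import numpy as np


def months_missing_footprints(obs_times, fp_times):
    """Return months ('YYYY-MM') that contain obs but no footprints."""
    # Truncate every timestamp to month precision, then compare the sets.
    obs_months = np.unique(np.asarray(obs_times).astype("datetime64[M]"))
    fp_months = np.unique(np.asarray(fp_times).astype("datetime64[M]"))
    missing = np.setdiff1d(obs_months, fp_months)
    return [str(m) for m in missing]


def check_footprint_coverage(obs_times, fp_times, site="unknown site"):
    """Raise an informative error if any obs month has no footprints."""
    missing = months_missing_footprints(obs_times, fp_times)
    if missing:
        raise ValueError(
            "Some observations could not be matched to footprints - "
            "check that all footprints are loaded in object store. "
            f"Site: {site}. Months without footprints: {', '.join(missing)}"
        )


# Case 1: a month missing from the middle (April and June present, May absent).
obs_mid = np.arange(
    np.datetime64("2016-04-01"), np.datetime64("2016-07-01"), np.timedelta64(1, "D")
)
fp_mid = obs_mid[obs_mid.astype("datetime64[M]") != np.datetime64("2016-05")]

# Case 2: footprints stop one month before the obs do.
obs_end = np.arange(
    np.datetime64("2023-10-01"), np.datetime64("2023-12-01"), np.timedelta64(1, "D")
)
fp_end = obs_end[obs_end < np.datetime64("2023-11-01")]

missing_mid = months_missing_footprints(obs_mid, fp_mid)
missing_end = months_missing_footprints(obs_end, fp_end)
```

Comparing at month precision catches both the mid-dataset gap and the truncated end with the same check. A gap shorter than a month would need a finer comparison.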
If the error happens during "add averaging error", then that might be because the obs data is loaded again from the object store for this part, so probably obs were dropped when the
Yes, I think that is what is happening. So maybe it would run if averaging error was turned off, but by default I think we probably want it to fail, or at the very least give us a strong warning that some of the obs were removed due to missing fps.
For now, you might just want to turn off "addaveragingerror": when it loads the obs, it specifies an averaging period, so the averaging error of that data (over the same period) is zero, and I don't think it does anything. I haven't got around to fixing that, but there is an issue open: #42
OK cool, thanks! Although in my case I found the missing fps, so I'm running fine now.
I'm having this issue as well, @brendan-m-murphy: it looks like when I have times that have footprints/no obs, there is no problem, but if there's obs/no footprints I get the same issue as joe. Looks related to #96 with the logp -inf issue as well.
It should be possible to fix this using `dropna("time")` on the model scenarios before they're converted to numpy. I'll have a look Monday.
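A rough sketch of what that could look like, assuming the model scenario is an xarray `Dataset` with a `time` dimension. The variable names `mf` and `fp_x_flux` and the values are made up for illustration; the real scenario variables may differ.

```python
import numpy as np
import xarray as xr

# Toy scenario: five daily time steps, two of them with no footprint data.
times = np.arange("2016-04-01", "2016-04-06", dtype="datetime64[D]")
scenario = xr.Dataset(
    {
        "mf": ("time", [1900.0, 1905.0, 1910.0, 1902.0, 1908.0]),
        "fp_x_flux": ("time", [0.5, 0.7, np.nan, np.nan, 0.6]),
    },
    coords={"time": times},
)

# Drop every time step where any variable is NaN (how="any" is the default).
clean = scenario.dropna("time")

# Report how many obs were removed, so the loss is not silent.
n_dropped = scenario.sizes["time"] - clean.sizes["time"]
if n_dropped:
    print(f"Dropped {n_dropped} time steps with missing footprint or obs data")

# Only now convert to numpy for the sampler.
H = clean["fp_x_flux"].values
y = clean["mf"].values
```

Dropping on the `Dataset` rather than on each array separately keeps obs and footprint-derived values aligned on the same time steps. Whether this should warn or fail by default is the open question discussed above.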
If no footprint or obs data is found for a site, that site is dropped. This works well for monthly inversions, but less so for annual inversions.
@joe-pitt had some runs where NaNs due to missing footprint data were propagated to `H` and caused a confusing pymc error. Possible improvements could include:
Also, we should investigate what happens if some obs data is missing. Typically footprint data won't be missing, but we often have missing obs, and the runs work despite this.