-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce TFM_INS-AT and TFM_INS-TS to TFM_INS #122
Conversation
I looked at some of the files, and there I saw the following assumption being used:
This is not sufficient for data columns, because any time series attribute can also be defined for the '0' Year index (which is not a real year, but the IE option placeholder, which must be supported as a Year data column header). In addition, I wonder where the 1001 comes from; the year 1000 is supported by VEDA and TIMES as well. 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Antti-L . That code you saw was my first attempt at handling TFM_INS-TS
, but I discovered that the code is already handling it (see my comments), so I removed that code. Looks like the existing code assumes all unknown columns are years -- is that sufficient?
times_reader/transforms.py
Outdated
if table.tag == datatypes.Tag.tfm_ins_ts: | ||
# ~TFM_INS-TS: Regions should be specified in a column with header=Region and columns in data area are YEARS | ||
# ~TFM_INS-TS: columns in data area are YEARS | ||
data_columns = [ | ||
colname | ||
for colname in df.columns | ||
if colname not in known_columns | {"region", "ts_filter"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here assumes all columns that are not known_colums
or {"region", "ts_filter"}
are years
times_reader/transforms.py
Outdated
if table.tag == datatypes.Tag.tfm_ins_ts: | ||
# ~TFM_INS-TS: Regions should be specified in a column with header=Region and columns in data area are YEARS | ||
# ~TFM_INS-TS: columns in data area are YEARS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we already have code to handle TFM_INS-TS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do. :-) The idea for #120 is to handle the variants before proceeding with other transforms. I.e. this way we would reduce the number of elif
statetements and improve readability of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I was coming to the same conclusion myself! Let me see if I can refactor this code to make it simpler.
Ok fine with that basically, but I am not sure the known_columns include all known columns. For example, |
times_reader/transforms.py
Outdated
@@ -1768,8 +1772,7 @@ def has_no_wildcards(list): | |||
and set(df.columns) & query_columns == {"cset_cn"} | |||
and has_no_wildcards(df["cset_cn"]) | |||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olejandro do you know if this if
condition and the elif
directly below can be replaced by something more general? Perhaps this is something we also do to other TFM_INS
variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, @siddharth-krishna I believe this part can just be dropped if we do reduction earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, on generalising the code: we'll need to do this kind of reduction also for tfm_upd
and tfm_dins
Thanks for the review. I'll refactor this a bit so don't merge it yet. |
No problem. I thought I'd leave that decision up to you. :-) |
I fixed the regressions! This PR now reduces TFM_INS-TS and TFM_INS-AT tables to TFM_INS tables. The little shortcut hack that converts some TFM_INS-TS tables to FI_T is still there because removing it leads to more regressions, so I've left removing it to a future PR. The new code assumes all columns whose names are integers >= 0 are years in TFM_INS-TS tables. Let me know if that assumption is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good.
Part 1 of #120