Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce TFM_INS-AT and TFM_INS-TS to TFM_INS #122

Merged
merged 8 commits into from
Oct 6, 2023
Merged

Conversation

siddharth-krishna
Copy link
Collaborator

Part 1 of #120

@Antti-L
Copy link

Antti-L commented Sep 29, 2023

I looked at some of the files, and there I saw the following assumption being used:

A column name is a year if it is an int between 1001 and 4000

This is not sufficient for data columns, because any time series attribute can also be defined for the '0' Year index (which is not a real year, but the IE option placeholder, which must be supported as a Year data column header).

In addition, I wonder where the 1001 comes from; the year 1000 is supported by VEDA and TIMES as well. 😁

Copy link
Collaborator Author

@siddharth-krishna siddharth-krishna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Antti-L . That code you saw was my first attempt at handling TFM_INS-TS, but I discovered that the code is already handling it (see my comments), so I removed that code. Looks like the existing code assumes all unknown columns are years -- is that sufficient?

if table.tag == datatypes.Tag.tfm_ins_ts:
# ~TFM_INS-TS: Regions should be specified in a column with header=Region and columns in data area are YEARS
# ~TFM_INS-TS: columns in data area are YEARS
data_columns = [
colname
for colname in df.columns
if colname not in known_columns | {"region", "ts_filter"}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here assumes all columns that are not known_colums or {"region", "ts_filter"} are years

if table.tag == datatypes.Tag.tfm_ins_ts:
# ~TFM_INS-TS: Regions should be specified in a column with header=Region and columns in data area are YEARS
# ~TFM_INS-TS: columns in data area are YEARS
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we already have code to handle TFM_INS-TS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do. :-) The idea for #120 is to handle the variants before proceeding with other transforms. I.e. this way we would reduce the number of elif statetements and improve readability of the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I was coming to the same conclusion myself! Let me see if I can refactor this code to make it simpler.

@Antti-L
Copy link

Antti-L commented Sep 29, 2023

The code here assumes all columns that are not known_colums or {"region", "ts_filter"} are years

Ok fine with that basically, but I am not sure the known_columns include all known columns. For example, Top_check can be used to enforce a check that the (process,commodity) pair is indeed defined in the process topology (which is not done by default by VEDA, and so it probably should not be done here either). Is Top_check one of your known colmns for TFM_INS / TFM_INS-AT /TFM_INS-TS ? I did notice it under the UC_T transforms, but could not see it not under TFM_INS... Note that under UC_T, topology check IS done by default.

@olejandro
Copy link
Member

Thanks @Antti-L. Probably not yet. This file includes an overview of columns names that will be supported.

@@ -1768,8 +1772,7 @@ def has_no_wildcards(list):
and set(df.columns) & query_columns == {"cset_cn"}
and has_no_wildcards(df["cset_cn"])
):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olejandro do you know if this if condition and the elif directly below can be replaced by something more general? Perhaps this is something we also do to other TFM_INS variants?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, @siddharth-krishna I believe this part can just be dropped if we do reduction earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, on generalising the code: we'll need to do this kind of reduction also for tfm_upd and tfm_dins

@siddharth-krishna
Copy link
Collaborator Author

Thanks for the review. I'll refactor this a bit so don't merge it yet.

@olejandro
Copy link
Member

No problem. I thought I'd leave that decision up to you. :-)

@siddharth-krishna
Copy link
Collaborator Author

I fixed the regressions! This PR now reduces TFM_INS-TS and TFM_INS-AT tables to TFM_INS tables.

The little shortcut hack that converts some TFM_INS-TS tables to FI_T is still there because removing it leads to more regressions, so I've left removing it to a future PR.

The new code assumes all columns whose names are integers >= 0 are years in TFM_INS-TS tables. Let me know if that assumption is false.

@siddharth-krishna siddharth-krishna changed the title Handle TFM_INS-AT Reduce TFM_INS-AT and TFM_INS-TS to TFM_INS Oct 6, 2023
Copy link
Member

@olejandro olejandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good.

@olejandro olejandro merged commit e54e518 into main Oct 6, 2023
@olejandro olejandro deleted the sidk/tfm-ins-at branch October 6, 2023 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants