Various fixes #274
Conversation
This ensures that a process is removed even if it has other input/output commodities
@siddharth-krishna any chance you could take a look at why the unit test is failing? It seems to be working fine locally.
@@ -220,14 +220,14 @@ def compare(
        f" {sum(df.shape[0] for _, df in ground_truth.items())} rows"
    )

-    missing = set(ground_truth.keys()) - set(data.keys())
+    missing = set(ground_truth.keys()).difference(data.keys())
Curious as to the reason for this change?
Was just to have the same approach to this everywhere. :-) I guess there is no difference with respect to speed?
Also, on a big model, somehow I got an impression that include.difference(exclude) was noticeably faster than include - exclude. Could that be?
Interesting! It might be because you actually replace set(include) - set(exclude) with set(include).difference(exclude), and the latter simply iterates through exclude while the former creates a set out of it, perhaps redundantly? Happy with your changes, was just curious :)
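A minimal sketch of the two spellings being compared (the list values here are made up for illustration): the `-` operator requires both operands to be sets, whereas `.difference()` accepts any iterable, so the method form skips building the second set.

```python
# Hypothetical data, just to contrast the two spellings of set difference.
include = ["P1", "P2", "P3"]
exclude = ["P2"]

# Operator form: both operands must be sets, so exclude is materialised
# into a second set before the difference is taken.
ops = set(include) - set(exclude)

# Method form: .difference() accepts any iterable and simply walks it,
# skipping the intermediate set.
meth = set(include).difference(exclude)

assert ops == meth == {"P1", "P3"}
```

Both produce the same result; any speed difference would come from avoiding the intermediate `set(exclude)` allocation.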
xl2times/transforms.py
Outdated
        sets = dict()
        for action, regex_maker in map.items():
            sets[action] = set(
                df.filter(regex=regex_maker(pattern), axis="index").iloc[:, 0].to_list()
Suggested change:
-    df.filter(regex=regex_maker(pattern), axis="index").iloc[:, 0].to_list()
+    df.filter(regex=regex_maker(pattern), axis="index").iloc[:, 0]
    if acc is None:
        return df
    return acc.merge(df)

    return sets["include"].difference(sets["exclude"])
I know it's a nice trick to use a dict and a for-loop to replace repeated code, but I'm not sure it improves readability when the loop only executes twice? If there's a chance of adding more iterations to the loop in the future, the new version makes sense to me, otherwise I wonder if something like this is more understandable:
include = set(df.filter(regex=utils.create_regexp(pattern), axis="index").iloc[:, 0])
exclude = set(df.filter(regex=utils.create_negative_regexp(pattern), axis="index").iloc[:, 0])
return include - exclude
I actually had it exactly that way, before switching to the current version. :-)
Yes, we may need to refactor it a few more times in the future, because the tables actually may include info on how to combine the filters. But that's for the future, since we don't have test cases for it anyway!
xl2times/transforms.py
Outdated
    return df.drop(exclude)


def filter_by_pattern(df: DataFrame, pattern: str) -> set[str]:
    """Filter dataframe index by a regex pattern. Return a set of corresponding items."""
I think this code also assumes that the items are in the first column?
Yes, there should be only one column... Will update the description!
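As a sketch of the single-column assumption under discussion (frame contents and column name are made up, and the real function first converts the VEDA pattern to a regex via a utils helper rather than taking a raw regex):

```python
import pandas as pd
from pandas import DataFrame


def filter_by_pattern(df: DataFrame, pattern: str) -> set[str]:
    """Filter dataframe index by a regex pattern; return the items found in
    the first (assumed only) column of the matching rows."""
    return set(df.filter(regex=pattern, axis="index").iloc[:, 0])


# Toy single-column frame indexed by item name, for illustration only.
df = pd.DataFrame(
    {"process": ["COAL1", "COAL2", "GAS1"]},
    index=["COAL1", "COAL2", "GAS1"],
)

assert filter_by_pattern(df, "^COAL") == {"COAL1", "COAL2"}
```

With more than one column, `iloc[:, 0]` would silently pick the first and ignore the rest, which is why the docstring update matters.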
xl2times/utils.py
Outdated
    # Substitute VEDA wildcards with regex patterns; escape metacharacters.
    # ("_", ".") and ("[.]", "_") are meant to apply one after another to handle
    # the usage of "_" equivalent to "?" and "[_]" as literal "_".
    for substitution in ((".", "\\."), ("_", "."), ("[.]", "_"), ("*", ".*"), ("?", ".")):
Sorry, I had #275 on my list but I didn't get around to it yet. Thanks for the fix, it's a smart one!
Btw, perhaps this is a good function to start writing unit tests for, as it's gotten reasonably complex/subtle?
Also, you can simplify this with for old, new in ... :)
Also, I would use a list of tuples over a tuple of tuples when it gets beyond 3 items, because tuples have some strange hardcoded implementation in Python if I remember correctly. :)
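A unit-test-style sketch of the substitution chain in the diff above (the function name is hypothetical; the real code lives in xl2times/utils.py). It shows why the order matters: "_" -> "." must run before "[.]" -> "_" so that "[_]" collapses back to a literal underscore while a bare "_" acts like "?".

```python
import re


def veda_pattern_to_regex(pattern: str) -> str:
    # Hypothetical stand-in for the substitution loop under review;
    # applies the replacements in the same order as the diff above.
    for old, new in [(".", "\\."), ("_", "."), ("[.]", "_"), ("*", ".*"), ("?", ".")]:
        pattern = pattern.replace(old, new)
    return pattern


# A bare "_" behaves like "?" (any single char), "*" like ".*"
assert veda_pattern_to_regex("A_B*") == "A.B.*"
# "[_]" survives as a literal underscore
assert veda_pattern_to_regex("[_]") == "_"
# The resulting regex matches as expected
assert re.fullmatch(veda_pattern_to_regex("A_B*"), "AXB123")
```

Pinning down cases like these would make a good start on the unit tests suggested above.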
Ah I see we have #222 already, I'll put it on my list :)
Thanks! I'll change it to a list before merging.
This PR makes various fixes, including: