-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH support types with a safe __reduce__ #467
base: main
Are you sure you want to change the base?
Conversation
WDYT @BenjaminBossan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this. I have a few comments, but up to you whether you think they're worth addressing.
Also a more general observation: It feels like each new PR with code changes requires updating pixi stuff. Is this just coincidence, a transition thing that'll go away, or the "new normal"?
skops/io/_general.py
Outdated
# note that we do "=="" to compare types instead of "is", since we only accept | ||
# exact matches here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. Checking reduce_output[0] == type(obj)
is fine but when would using is
not be fine too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow thought is
applies to subclasses, but it doesn't. So changing to is
skops/io/_general.py
Outdated
# exact matches here. | ||
if len(reduce_output) == 2 and reduce_output[0] == type(obj): | ||
return { | ||
"__class__": obj.__class__.__name__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, should we use reduce_output[0].__name__
or type(obj).__name__
here (i.e. one of the two objects used above)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using type(obj).__name__
now
super().__init__(state, load_context, trusted) | ||
self.children = { | ||
"content": get_tree(state["content"], load_context, trusted=trusted) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh I'm a bit rusty on this, do we need to set self.trusted
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we pass trusted here if the user passes them, and sometimes we'd add to the trusted values if we have some default trusted types.
obj = slice(1, 2, 3) | ||
loaded_obj = loads(dumps(obj)) | ||
assert obj == loaded_obj | ||
assert type(obj) is slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we construct an example object that uses this __reduce__
pattern but is not a trusted type, so that we can ensure that loading this object fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the datetime
example above fails since it's not trusted, but I can also add another test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pixi lockfile shouldn't change much under normal circumstances. Here it changes a lot since I managed to add an environment and support an older version of sklearn.
skops/io/_general.py
Outdated
# note that we do "=="" to compare types instead of "is", since we only accept | ||
# exact matches here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow thought is
applies to subclasses, but it doesn't. So changing to is
skops/io/_general.py
Outdated
# exact matches here. | ||
if len(reduce_output) == 2 and reduce_output[0] == type(obj): | ||
return { | ||
"__class__": obj.__class__.__name__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using type(obj).__name__
now
super().__init__(state, load_context, trusted) | ||
self.children = { | ||
"content": get_tree(state["content"], load_context, trusted=trusted) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we pass trusted here if the user passes them, and sometimes we'd add to the trusted values if we have some default trusted types.
obj = slice(1, 2, 3) | ||
loaded_obj = loads(dumps(obj)) | ||
assert obj == loaded_obj | ||
assert type(obj) is slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the datetime
example above fails since it's not trusted, but I can also add another test.
Fixes #464
Sometimes the output of
__reduce__
is of the form(type, (constructor_args,)
wheretype
is the same as thetype(obj)
. In such cases, we can consider loading them safe. We still fail if the given type is not trusted, but there's no danger of loading those objects since we want to load them anyway. We would be calling their__new__
and__setstate__
otherwise.This adds supports to many other types such as
datetime
andslice
.