Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: decorators should wrap functions with vanilla wrapper #1282

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Feb 26, 2025

This is a prototype sketch.

  1. each decorator wraps the function -- and returns that pointer.
  2. we only wrap a function 1 level deep.
  3. we validate on the original function.

This should then enable:

  1. reusing base functions easily.
  2. not mutating the original function.
  3. also work in SERDE contexts -- but something to properly test.

Changes

  • all decorators now modify a wrapper
  • we special case mutate (somewhat hacky)
  • adds tests
  • fixes parameterize to work with async
  • fixes data validators to work with async

How I tested this

  • unit tests
  • test with parallel & collect for distributed stuff (seems to work)
  • run main examples
  • Test export
  • Validate "originating function" is as intended
  • Ask hamilton library people to test this change with their stuff

Notes

  • needs some more tests
  • regression found with ray + hamilton sdk when using parallel & collect.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

This is a prototype sketch.

1. each decorator wraps the function -- and returns that pointer.
2. we only wrap a function 1 level deep.
3. we validate on the original function.

This should then enable:
1. reusing base functions easily.
2. not mutating the original function.
3. also work in SERDE contexts -- but something to properly test.
@elijahbenizzy
Copy link
Collaborator

elijahbenizzy commented Feb 26, 2025

Copying the function might be cleaner -- if this works then great but I think there are a few assumptions that you'll have to check for (async, signature stuff), which functools.wraps doesn't cover.

Might be easier to use this utility function:

def _copy_func(f):

@skrawcz
Copy link
Collaborator Author

skrawcz commented Feb 26, 2025

Copying the function might be cleaner -- if this works then great but I think there are a few assumptions that you'll have to check for (async, signature stuff), which functools.wraps doesn't cover.

Might be easier to use this utility function:

def _copy_func(f):

yep some testing to be done for sure.

We need to audit all decorators and see if they work with async or not...
@skrawcz skrawcz force-pushed the prototype_decorator_change branch from 2b655fb to d2052b1 Compare February 27, 2025 00:36
@skrawcz
Copy link
Collaborator Author

skrawcz commented Feb 27, 2025

@jernejfrank any thoughts on what assumption I broke with @mutate?

TODO: properly fix this that doesn't leak stuff
@skrawcz
Copy link
Collaborator Author

skrawcz commented Feb 27, 2025

@jernejfrank any thoughts on what assumption I broke with @mutate?

figured it out. I have a hack that works I think. will need to refactor a few things in this PR for a proper change

@jernejfrank
Copy link
Contributor

jernejfrank commented Feb 27, 2025

@jernejfrank any thoughts on what assumption I broke with @mutate?

figured it out. I have a hack that works I think. will need to refactor a few things in this PR for a proper change

Great, just to write down why I think it failed for easier lookup:

mutate looks at the functions you want to mutate and figures out if there's already a pipe_output decorator on it. If so, it appends the mutating function (the one decorated with mutate) as the next step in the already existing pipe_output sequence of steps. Otherwise, it will create a pipe_output decorator on the target function with the mutating function as the step.

I guess with the new wrapper for the decorators, this check if pipe_output exists doesn't work, because it looked at the wrong level?

Re: refactoring -- the current implementation is also not ideal when you use jupyter notebooks, because it could potentially stack the same mutating function twice if you don't restart the kernel. We were looking with @elijahbenizzy at how to change this at the notebook level, but it is a weird edge case rabbit hole that got labeled as TODO once somebody actually runs into the problem.

With 3.0 they've changed a few things. This pins that until someone complains /
wants to fix / update the code for 3.0+.
@skrawcz
Copy link
Collaborator Author

skrawcz commented Feb 27, 2025

Okay there is one bug with parallel & collect on ray using the tracker -- the parallelized nodes aren't being reflected in the Hamilton UI -- I suspect it could be something could be messing with the "source code" pointers the SDK captures or something.
NVM this bug exists on main - running python 3.11, running ray 2.42.0. -- It works without ray just fine on main though.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works then great, but wow there's a lot of complexity here. I think this is a decent approach, but curious if there's a way to simplify accounting for the wrappers?

self.validate(fn)
# # stop unwrapping if not a hamilton function
# # should only be one level of "hamilton wrapping" - and that's what we attach things to.
self.validate(unwrap(fn, stop=lambda f: not hasattr(f, "__hamilton__")))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only do one level deep? Aren't there some weird cases with multiple imports, imports of imports, etc...?

Copy link
Collaborator Author

@skrawcz skrawcz Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we attached to the function itself. What I'm trying to do here is replicate that by using one wrapper layer around a function.

E.g.

@tag(...)
@extract_columns(...)
@parameterize(...)
def some_func(...) -> ... :

Should only have one "wrapper" around some_func - which is what is being modified by the decorators.

This then enables someone to do this:

def a(..) -> ...:
  # base func

q = @parameterize(x=...)(a) 
a_validated = @check_output(range=...)(a)
a_validated.__name__ = "a_validated"  # need to rename to enable this to be stand alone...

This above would enable a, x, and a_validated to be in the DAG. Before it wasn't possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I got that. Why just one layer was the question?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why just one layer

  1. Doesn't seem like we need more?
  2. This mirrors the prior behavior of directly attaching to the function - there were no layers before.

@@ -805,6 +829,9 @@ def resolve_nodes(fn: Callable, config: Dict[str, Any]) -> Collection[node.Node]
which configuration they need.
:return: A list of nodes into which this function transforms.
"""
# check for mutate...
fn = handle_mutate_hack(fn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take away the name "hack"

:param fn: Function to check
:return: Function or wrapped function to use if applicable.
"""
if hasattr(fn, "__hamilton_wrappers__"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this worth the complexity it adds? Is there a simpler way to approach it? Need to grok better.

Copy link
Collaborator Author

@skrawcz skrawcz Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that with mutate the current design was to attach to the function and it would be resolved properly via that way.

I think the better approach would be to register mutates via a registry that is applied to functions that are found ... and that way we wouldn't need to attach it to the function directly...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, if we use a registry for mutate, would it be sensible to just have one global registry for all decorators instead of the additional 1-layer wrapper?

@@ -231,6 +231,65 @@ def replacement_function(
# now the error will be clear enough
return node_.callable(*args, **new_kwargs)

async def async_replacement_function(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note -- merge with the above when you get to polishing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants