Post-function modification (opposite/equivalent to pipe) #701

Open
elijahbenizzy opened this issue Feb 15, 2024 · 4 comments
Labels
decorators enhancement New feature or request

Comments

@elijahbenizzy
Collaborator

elijahbenizzy commented Feb 15, 2024

Is your feature request related to a problem? Please describe.
People often want data quality checks to modify a node's output, but that is not what checks are for.
@pipe applies functions, but it runs them before the decorated function, on its input. So if you just want to change the output of a node, you have to do it in two steps.
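For illustration, the two-step workaround might look roughly like the sketch below. It uses plain Python functions in the Hamilton style (each function is a node, and parameter names declare dependencies), with lists standing in for pd.Series so it runs without extra dependencies. The names `data_raw` and `_keep_range` and the sample values are assumptions made for this example, not taken from an existing codebase.

```python
from typing import List


# Hypothetical helper; the name and signature are illustrative only.
def _keep_range(values: List[float], low: float, high: float) -> List[float]:
    """Keep only the values inside the closed interval [low, high]."""
    return [v for v in values if low <= v <= high]


def data_raw() -> List[float]:
    """Step 1: produce the unmodified output (sample values are made up)."""
    return [-5.0, 10.0, 250.0, 42.0]


def data(data_raw: List[float]) -> List[float]:
    """Step 2: a second node whose only job is to post-process step 1."""
    return _keep_range(data_raw, low=0, high=100)
```

The second function exists only to post-process the first, which is the boilerplate a post-function decorator would remove.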

Describe the solution you'd like
New decorator that functions like pipe, but runs afterwards:

@mutate(
    step(_keep_range, range=(0, 100)),
    step(_dropna),
)
def data() -> pd.Series:
    return ...

This would form the DAG: data.raw -> data.with_drop_between -> data
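To make the intended semantics concrete, here is a minimal plain-Python sketch of a decorator that runs the function first and then applies each step to its result. This is not Hamilton's implementation: a real version would presumably add nodes to the DAG rather than wrap the function. The `mutate` and `step` below are hypothetical stand-ins written for this example, and lists with `None` for missing values replace pd.Series to keep it dependency-free.

```python
from functools import wraps
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is a function plus the keyword arguments to bind to it.
Step = Tuple[Callable[..., Any], Dict[str, Any]]


def step(fn: Callable[..., Any], **kwargs: Any) -> Step:
    """Hypothetical stand-in for `step`: pair a function with its kwargs."""
    return fn, kwargs


def mutate(*steps: Step) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for the proposed decorator: run the function,
    then apply each step, in order, to the function's output."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)  # also exposes the undecorated function as __wrapped__
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            for step_fn, step_kwargs in steps:
                result = step_fn(result, **step_kwargs)
            return result

        return wrapper

    return decorator


def _keep_range(values: List[Optional[float]], range: Tuple[float, float]) -> List[Optional[float]]:
    """Keep values inside [low, high]; missing values (None) pass through."""
    low, high = range
    return [v for v in values if v is None or low <= v <= high]


def _dropna(values: List[Optional[float]]) -> List[float]:
    """Drop missing values (None)."""
    return [v for v in values if v is not None]


@mutate(
    step(_keep_range, range=(0, 100)),
    step(_dropna),
)
def data() -> List[Optional[float]]:
    return [-5.0, None, 42.0, 250.0, 10.0]
```

In this sketch, `data.__wrapped__()` plays the role of `data.raw`, and calling `data()` returns the value after both steps have run.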

Describe alternatives you've considered
Making it more central to data quality or just integrating with the node.

Additional context
Related to an OS question.

elijahbenizzy added the enhancement and decorators labels on Feb 15, 2024
@zilto
Collaborator

zilto commented Feb 16, 2024

I think there's a place for this alongside @pipe. Also, the data.raw -> data structure fits the @check_output semantics.

That said, I'd propose naming it @pipe_out, @pipe_output, or @pipe_post. Maybe we'd rename @pipe to @pipe_in (for input, or pre) too?

Although these decorators are not much different from writing additional nodes, I think they can greatly ease migration by letting you plug an existing codebase into new or smaller Hamilton initiatives.

@elijahbenizzy
Collaborator Author

Feels like @modify_input and @modify_output might be better than pipe?

@zilto
Collaborator

zilto commented Feb 16, 2024

I like your @pipe documentation and the fact that it echoes the pandas .pipe() operation. Apparently, Polars also has a pipe operator.

To me, "pipe" communicates better than "modify" the idea that you can chain many sequential transforms.

@elijahbenizzy
Collaborator Author

From @skrawcz on #749 (duplicate):

Is your feature request related to a problem? Please describe.
@pipe is nice, but it can be a little counter-intuitive to read: the function parameter declares the dependency, which is then modified by @pipe and only then passed into the function.

Describe the solution you'd like
@post_pipe would be the opposite of @pipe. The function would declare the dependency, the body of the function would run, and then the @post_pipe steps would run after the function.

@post_pipe(  # these steps run after the function has run
    step(_transform_1, v=1),
    step(_transform_2, v=2),
)
def A_processed(A: pd.DataFrame) -> pd.DataFrame:
    return A  # original A; modifications could also be made here

Describe alternatives you've considered
N/A

Additional context
