Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow materializer targets to specify inputs #713

Open
elijahbenizzy opened this issue Feb 24, 2024 · 2 comments
Open

Allow materializer targets to specify inputs #713

elijahbenizzy opened this issue Feb 24, 2024 · 2 comments
Labels

Comments

@elijahbenizzy
Copy link
Collaborator

elijahbenizzy commented Feb 24, 2024

Is your feature request related to a problem? Please describe.
I want to be able to use materializers to create simple ETLs of just moving data from one place to another.

Describe the solution you'd like

A little odd but a good use-case. Specifically, I want something like this:

dr = driver.Builder().build() # empty driver
dr.materialize(
    to.csv(
        id="save_some_input_to_csv",
        dependencies=["some_input"]),
    inputs={
        "some_input" : SOME_DATAFRAME
    }
),

This should form a two-node DAG.

Describe alternatives you've considered
Just writing a function to do this, or doing it outside of Hamilton.

@skrawcz
Copy link
Collaborator

skrawcz commented Feb 24, 2024

Yeah it's because it's empty it doesn't work. It needs a node with that input defined right now to work.
Screen Shot 2024-02-23 at 5 57 28 PM

@elijahbenizzy
Copy link
Collaborator Author

Yeah it's because it's empty it doesn't work. It needs a node with that input defined right now to work.

Screen Shot 2024-02-23 at 5 54 29 PM

Yep that's why I opened up this issue, cause I think it should work! It's a nice way to use materializers and track them. To avoid pains/encourage early failures we can add a validate_inputs param and default it to true.

skrawcz pushed a commit that referenced this issue May 21, 2024
Following #877 

Added the ability to add materializer nodes to the `FunctionGraph` at `Driver` build time. 

Use cases:
- Easier to view materializers as part of `.display_all_functions()`
- Allows to call materializers by name with `.execute()`. Would allow some users to completely ignore `.materialize()` 
- Possible to combine "static" materializers at build time and "dynamic" materializers at execution time
- Catch invalid dataflows earlier than `dr.materialize()`

## Changes
- Builder now accepts `.with_materializers(*materializers)`
- `Builder.build()` adds materializer nodes after creating the `Driver` and returns the updated Driver. The logic is derived from `Driver.materialize()`
- `Builder.copy()` copies the materializers

## How I tested this
- Added tests to check materializer nodes (savers and loaders) are properly added
- test that "static" and "dynamic" materializers can be used together

## Notes
- each time materializers are added `post_graph_construct` hooks are triggered
- Should be include materializers in the `version` hashing of the dataflow? should we treat "static" and "dynamic" differently. IMO, we might want two create two versions: one for "the dataflow transform, ignoring materializers" and one for "all nodes" since they answer different equality / diffing questions 
- there's duplicate logic between `Builder.build()` and `Driver.materialize()`. A better approach could be to have `Driver.add_materializers()` and `Driver.execute_materializers()`. Both `Builder.build()` and `Driver.materialize()` could call `Driver.add_materializers()`  
- Related to #713,  users need to be aware that materializers need a valid `target` or `dependencies` (type and name) to connect to.

Squashed commits of:

* added Builder.with_materializers()

* moved logic to Driver __init__

* added type annotation

* made materializers an internal attribute

* updated docs; pre-commit hook; typing bugfix

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants