Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Builder.with_materializers() #877

Closed
zilto opened this issue May 2, 2024 · 0 comments
Closed

Add Builder.with_materializers() #877

zilto opened this issue May 2, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@zilto
Copy link
Collaborator

zilto commented May 2, 2024

Following the discussion from #816, there would be benefits to allow materializer nodes to be defined statically at the Driver level (both DataLoader and DataSaver).

  • The nodes can be called directly via .execute()
  • Materializers appear in HamiltonGraph and visualizations even if they aren't executed.
  • Validate the DAG, including the materializers before execution.

Solution 1

dr = (
  driver.Builder()
  .with_modules(...)
  .with_materializers(
    to.dlt(
      id="features_duckdb",
      dependencies=["features_df"]m
      destination=duckdb_dest(...),
    )
  )
  .build()
)

Solution 2

An alternative, would be to allow materializers to be imported and added via .with_modules(). For example, production_materializers.py contains

# production_materializers.py
from hamilton.io.materialization import to

to.dlt(
  id="features__duckdb",
  dependencies=["features_df"],
  destination=duckdb_dest(...),
)
from hamilton import driver
import dataflow
import production_materializers

dr = driver.Builder().with_modules(dataflow, production_materializers).build()

For basic to.parquet() usage, it might be more efficient to store simple Python functions using pd.to_parquet() in a module to enable this patterns. More powerful materializers (e.g., dlt) would benefit from this approach though

@zilto zilto added the enhancement New feature or request label May 2, 2024
skrawcz pushed a commit that referenced this issue May 21, 2024
Following #877 

Added the ability to add materializer nodes to the `FunctionGraph` at `Driver` build time. 

Use cases:
- Easier to view materializers as part of `.display_all_functions()`
- Allows to call materializers by name with `.execute()`. Would allow some users to completely ignore `.materialize()` 
- Possible to combine "static" materializers at build time and "dynamic" materializers at execution time
- Catch invalid dataflows earlier than `dr.materialize()`

## Changes
- Builder now accepts `.with_materializers(*materializers)`
- `Builder.build()` adds materializer nodes after creating the `Driver` and returns the updated Driver. The logic is derived from `Driver.materialize()`
- `Builder.copy()` copies the materializers

## How I tested this
- Added tests to check materializer nodes (savers and loaders) are properly added
- test that "static" and "dynamic" materializers can be used together

## Notes
- each time materializers are added `post_graph_construct` hooks are triggered
- Should be include materializers in the `version` hashing of the dataflow? should we treat "static" and "dynamic" differently. IMO, we might want two create two versions: one for "the dataflow transform, ignoring materializers" and one for "all nodes" since they answer different equality / diffing questions 
- there's duplicate logic between `Builder.build()` and `Driver.materialize()`. A better approach could be to have `Driver.add_materializers()` and `Driver.execute_materializers()`. Both `Builder.build()` and `Driver.materialize()` could call `Driver.add_materializers()`  
- Related to #713,  users need to be aware that materializers need a valid `target` or `dependencies` (type and name) to connect to.

Squashed commits of:

* added Builder.with_materializers()

* moved logic to Driver __init__

* added type annotation

* made materializers an internal attribute

* updated docs; pre-commit hook; typing bugfix

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
@zilto zilto closed this as completed May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant