With "Extract, Transform, Load" (ETL) as a frame of reference, dlt does "EL" and Hamilton does "T".
What is Hamilton
In short, Hamilton is a library to define a DAG of data transformations in Python. It is similar in scope to dbt, but it supports all Python types, not just tables/dataframes/SQL constructs. Users can write transformations with Python primitives, pandas, polars, Spark, Ibis, etc. Many users adopt Hamilton for feature engineering (see the jaffle shop example). It also allows users to define machine learning and LLM dataflows.
It uses a declarative API, which essentially consists of:

- define your DAG in a Python module
- pass the DAG to the `Driver`, which is responsible for execution
- request nodes from the DAG to be executed (e.g., features, tables, models to train)
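To make the declarative pattern concrete, here is a minimal stdlib-only sketch of the idea (this is not Hamilton itself): each function is a DAG node, and its parameter names declare which upstream nodes it depends on. The function names and data are made up for illustration.

```python
import inspect

# Each function below is a "node"; its parameter names name its upstream nodes.
def raw_orders() -> list:
    return [{"amount": 10}, {"amount": 25}]

def total_amount(raw_orders: list) -> int:
    return sum(o["amount"] for o in raw_orders)

def order_count(raw_orders: list) -> int:
    return len(raw_orders)

NODES = {f.__name__: f for f in (raw_orders, total_amount, order_count)}

def execute(name: str, cache: dict = None):
    """Resolve a requested node by recursively executing its dependencies."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = NODES[name]
        kwargs = {p: execute(p, cache) for p in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
    return cache[name]

print(execute("total_amount"))  # 35
```

Hamilton's actual `Driver` does the equivalent resolution over the functions found in the modules you register, so you only ever request the end nodes you need.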
Integration ideas
dlt plugin for Hamilton
We already added a dlt plugin in Hamilton that allows users to load a `dlt.Resource` as input and save outputs to a `dlt.Destination`. This is useful for Hamilton users who want to start using dlt and run both as a unified pipeline. Also, some Hamilton DAG nodes might be "incompatible with dlt" (e.g., an XGBoost model).
Hamilton help for dlt
It appears to make sense to have a "Hamilton helper" in dlt, similar to the dbt runner. It would help dlt users package their Hamilton code and bundle it with their dlt pipeline for execution. A typical pattern would look like this (full ref):
```python
import dlt
from hamilton import driver

import slack  # NOTE: this is dlt code, not an official Slack library
import transform  # module containing the dataflow definition

# EXTRACT & LOAD
pipeline = dlt.pipeline(
    pipeline_name="slack",
    destination="duckdb",
    dataset_name="slack_community_backup",
)
source = slack.slack_source(
    selected_channels=["general"], replies=True
)
load_info = pipeline.run(source)

# TRANSFORM
dr = driver.Builder().with_modules(transform).build()
results = dr.execute(
    ["insert_threads"],  # query the `threads` node
    inputs=dict(pipeline=pipeline),  # pass the dlt pipeline
)
```
Action
get a sense of what dlt users are looking for, their needs regarding Python transforms and usage patterns
define an API and work towards a Hamilton helper in dlt
maybe we only need to publish docs and guides
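As a starting point for the API discussion, a helper could mirror the shape of dlt's existing dbt runner. The sketch below is purely hypothetical: `dlt.hamilton.package` and its methods do not exist in dlt today and are only one possible design.

```python
# HYPOTHETICAL API sketch -- dlt.hamilton does not exist yet; this mirrors
# the ergonomics of dlt's dbt helper (package location + run against a pipeline).
import dlt

pipeline = dlt.pipeline(pipeline_name="slack", destination="duckdb")
load_info = pipeline.run(source)

# bundle a Hamilton module with the pipeline, analogous to the dbt runner
hamilton_pkg = dlt.hamilton.package(pipeline, "path/to/transform_module")
results = hamilton_pkg.execute(["insert_threads"], inputs={"pipeline": pipeline})
```

Whether the helper should accept a module path (like the dbt runner accepts a package location) or an imported module object is one of the API questions to settle with users.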