UX Hamilton Project #893

Open
zilto opened this issue May 6, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@zilto
Collaborator

zilto commented May 6, 2024

Current Limitations

When browsing a Python project that uses Hamilton, there is no way to tell which files are "Hamilton modules".

This has several implications:

  • Users don't know which modules can be imported and passed to a Driver
  • Users might unknowingly add functions to a module, rendering it invalid for Hamilton
  • Project and IDE tooling for Hamilton has no standardized / centralized way to identify Hamilton modules
  • Users and tools can't know which combinations of modules can be passed together to a Driver

I touched on a similar topic in Issue #747 in the context of the CLI.

Benefits

I propose the notion of a Project (mapping to the Hamilton UI "project"; "workspace" may be a better name) to let users declare which files are "Hamilton modules".

Features it could unlock:

LSP: multi-module features

  • code navigation. You're currently editing hello.py, but the LSP builds the dataflow with both hello.py and world.py and knows about their nodes.
  • visualization. View multiple modules in the VSCode extension instead of only the current file

CLI / pre-commit / CI: apply to all

  • validate all modules. A pre-commit hook can attempt to build all "single" and "composed" dataflows
  • generate all visualizations. Use the CLI to generate visualizations of all modules on command or commit
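
A rough sketch of what "validate all modules" could look like. This is not an existing Hamilton command: `load_module`, `build_with_hamilton`, and `validate_dataflows` are hypothetical helpers, and the input is assumed to be a list of dicts with `modules`, optional `name`, and optional `config` keys. Only `driver.Builder().with_modules(...).with_config(...).build()` comes from Hamilton itself, and it is imported lazily so the rest can run without it.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Any, Callable


def load_module(path: str) -> ModuleType:
    """Import a Python file by path without requiring it to be on sys.path."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def build_with_hamilton(modules: list, config: dict) -> Any:
    """Build a Driver; Hamilton is imported here so the sketch loads without it."""
    from hamilton import driver

    return driver.Builder().with_modules(*modules).with_config(config).build()


def validate_dataflows(dataflows: list, build: Callable = build_with_hamilton) -> dict:
    """Try to build every dataflow; map its name to None (ok) or an error message."""
    results = {}
    for i, flow in enumerate(dataflows):
        # Assumption: unnamed entries are reported by their position in the list.
        name = flow.get("name", f"dataflow[{i}]")
        try:
            modules = [load_module(p) for p in flow["modules"]]
            build(modules, flow.get("config", {}))
            results[name] = None
        except Exception as exc:  # report every failure instead of stopping at the first
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

A pre-commit hook or CI job could call `validate_dataflows(...)` and fail when any value in the result is not `None`. Passing a different `build` callable makes it possible to test the loop without Hamilton installed.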

Hamilton UI

  • sync catalog without execution. The UI could better separate "historical dataflows" that were executed from "available dataflows" representing the state of the current code

API design

Hamilton is designed around two layers: dataflow definition and dataflow execution. This API relates to dataflow definition, which requires knowing:

  • required: Python modules (file paths; one or more)
  • optional: Driver config (dict)
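
To make the two inputs above concrete, a dataflow definition could be modeled as a small record. This is only a sketch: `DataflowDefinition` is a hypothetical class, not a Hamilton API, and the rules it enforces (the name defaults to the file stem of a single module and is required when several modules are composed) are assumptions drawn from the proposal.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class DataflowDefinition:
    """Hypothetical record for one dataflow definition (not a Hamilton API)."""

    modules: list                                          # required: one or more file paths
    config: dict = field(default_factory=dict)             # optional: Driver config
    name: Optional[str] = None                             # optional for a single module

    def __post_init__(self) -> None:
        if not self.modules:
            raise ValueError("a dataflow needs at least one module")
        if self.name is None:
            if len(self.modules) != 1:
                raise ValueError("`name` is required when several modules are composed")
            # Assumption: the inferred name is the file stem, e.g. "hello.py" -> "hello"
            self.name = Path(self.modules[0]).stem


single = DataflowDefinition(modules=["hello.py"])
composed = DataflowDefinition(
    name="greetings",
    modules=["hello.py", "world.py"],
    config={"env": "dev"},
)
```

Execution settings are deliberately left out of the record, since they belong to the second layer.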

Given Hamilton is Python-centric, it should adopt pyproject.toml as a standard. The TOML format is also well-supported by other languages for parsing (e.g., TypeScript in VSCode extension, future Rust dev tools). The format supports the relevant types to specify the Python modules and config.

Example TOML. The two notations below provide flexibility for specifying dataflow definitions:

# shortform notation
[tool.hamilton]
dataflows = [
  { name = "greetings", modules = ["world.py"] },
  { modules = ["hello.py"] },  # `name` is inferred when `len(modules) == 1`
]

# longform notation
# mutually exclusive with shortform because they both use `tool.hamilton.dataflows`

[[tool.hamilton.dataflows]]  # this adds to the list `hamilton.dataflows`
modules = ["single.py"]  # `name` is inferred when `len(modules) == 1`

[[tool.hamilton.dataflows]]
name = "composed"
modules = ["a.py", "b.py"]  # list `hamilton.dataflows[i].modules[...]`

[[tool.hamilton.dataflows]]
name = "inline_config"
modules = ["a.py"]
config = { env = "dev", owner = "me" }  # mapping `hamilton.dataflows[i].config{...}`

[[tool.hamilton.dataflows]]
name = "multiline_config"
modules = ["a.py"]
config.env = "dev"  # key-value pair `hamilton.dataflows[i].config{env: "dev"}`
config.owner = "me"
config.key1 = true
config.key2 = false
config.key3 = 12345
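
As a quick check that tooling could treat both notations the same way, the sketch below parses a shortform and a longform document with Python's `tomllib` and compares the results. The helper `read_dataflows` and the sample entries are illustrative assumptions, not an existing Hamilton API.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: assumes the `tomli` backport is installed
    import tomli as tomllib

SHORTFORM = """
[tool.hamilton]
dataflows = [
  { name = "composed", modules = ["a.py", "b.py"] },
  { name = "with_config", modules = ["a.py"], config = { env = "dev", key1 = true } },
]
"""

LONGFORM = """
[[tool.hamilton.dataflows]]
name = "composed"
modules = ["a.py", "b.py"]

[[tool.hamilton.dataflows]]
name = "with_config"
modules = ["a.py"]
config.env = "dev"
config.key1 = true
"""


def read_dataflows(toml_text: str) -> list:
    """Return the raw `tool.hamilton.dataflows` entries, or [] if the table is absent."""
    data = tomllib.loads(toml_text)
    return data.get("tool", {}).get("hamilton", {}).get("dataflows", [])


shortform_entries = read_dataflows(SHORTFORM)
longform_entries = read_dataflows(LONGFORM)
```

Both calls yield a list of plain dicts, so a consumer only needs to handle one shape regardless of which notation the user picked. Inline `config = { ... }` and dotted `config.key = ...` also parse to the same nested mapping.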

API extensibility

Currently, we only define tool.hamilton.dataflows, but more keys can be added under tool.hamilton later.

zilto added the enhancement (New feature or request) label May 6, 2024