-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve API for .execute()
all nodes
#836
Comments
Hmm, config nodes shouldn't be in |
Looking at git blame and going off memory, The current def list_available_variables(self, *, tag_filter=None) -> List[Variable]:
all_nodes = self.graph.get_nodes()
if tag_filter:
valid_filter_values = all(
map(
lambda x: isinstance(x, str)
or (isinstance(x, list) and len(x) != 0)
or x is None,
tag_filter.values(),
)
)
if not valid_filter_values:
raise ValueError("All tag query values must be a string or list of strings")
results = []
for n in all_nodes:
if node.matches_query(n.tags, tag_filter):
results.append(Variable.from_node(n))
else:
results = [Variable.from_node(n) for n in all_nodes]
return results should become def list_available_nodes(self, *, tag_filter=None) -> List[HamiltonNode]:
g = graph_types.HamiltonGraph.from_graph(self.graph)
all_nodes = g.nodes
# reimplement node filter
filtered_nodes = ...
return filtered_nodes |
Sure, happy with the new name. What I'm saying is that having config nodes even show up at all is weird, we should filter by default (and figure out why they're included at all...). |
Config becomes nodes in the graph -- and you have always been able to request config/inputs as outputs. So not weird to me. They should have the appropriate metadata to distinguish them (which I think they do). |
IMO that doesn't make sense. And IIRC this was a hack. The way this should work is:
Config is used at a different stage -- compile time. Why would it be a node? It's a property of the graph, not of execution. It should be a field in the |
config is a place to put invariant inputs as well; i.e. "config", the dictionary passed in for it, can be a runtime input. It's always been that way. |
Yes, but that doesn't mean we should represent it as a node, and definitely shouldn't include jt with list_available_vars |
It is and has always been a node in the graph. It's clearly marked that it doesn't come from a function. You can request it as an output. |
Blech I think this was a hack from the days when everything was passed in statically. See 813e098#diff-bfd5ce89c947f4ba81d03b804ba27d0fdc8686974b21bbbf9e37401080a67817R94. There's a clear demarcation between (a) nodes that are explicitly inputs in the graph and (b) nodes that just exist because they happen to have been shoved into config. Don't see the value of (b) existing in the graph when we have (a) specified as function inputs, but we're adding them all because of a commit 4 years ago? Best to add hamilton/hamilton/graph_types.py Line 189 in 3e7626f
|
I agree with you. I was pointing out that it wasn't a "hack" but rather how it always behaved (i.e., need to manually filter out with
I know config can be used as "input values that are constant for the lifetime of a Driver", but I think it makes more confusing. It's much easier to explain Driver/Builder defines the DAG, execute/materialize define execution values I'm always surprised when reading it in examples or documentation, such as the numpy one dr = driver.Driver(
{
"input_file_name": "air-quality-data.csv",
"after_lock_date": "2020-03-24T00",
"before_lock_date": "2020-03-21T00",
},
analysis_flow,
adapter=adapter,
)
output = ["t_value", "p_value", "before_sample", "after_sample"]
# dr.visualize_execution(output, './my_file.dot', {})
result = dr.execute(output) For file paths and dates, I can intuitively make sense of them being "used as input", but for an int or string, I would just have to guess if it triggers a
Nonetheless, this is related to the recent visualization PR I merged. Config nodes were only displayed in With this viz PR, the above numpy example now shows |
Problem
Currently, the primary way to use
Driver.execute()
is to pass a list of node names (str
). It's useful to get all existing nodes for various reasons:But managing a list of strings is likely to be incomplete, is typo-provide, and can't provide IDE support
Current solution
Use
Driver.list_available_variables()
to get node names programmatically, filter them to exclude config nodes, and pass them toDriver.execute()
.Given "getting all dataflow nodes" is likely the most common usage, it's odd to receive a list and systematically have to do list comprehension over it. I always forget the API to filter out config nodes.
Issues:
Potential solution
.available_nodes()
and issue deprecation warning for direct.list_available_variables()
calls.config()
The text was updated successfully, but these errors were encountered: