-
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so, no need to wait for approval.
-
I think you need to be much more detailed in explaining what you want. I have difficulty connecting the first part of your request (ALL_SUCCESS) with the other part (one-by-one); those two have (IMHO) nothing to do with each other. I converted it into a discussion, but really, if you want help with something, it should start with a proper explanation (examples, images, DAGs) of what you have a problem with, longer than one paragraph. It's very difficult to read someone's mind, and there is not enough context here to figure out what this is about.
-
@potiuk sure, I can give more details about my use case. I want to create a cluster and run a few jobs on it, something like this:

```python
chain(create_cluster, job1, job2, job3, teardown_cluster.as_teardown(setups=[create_cluster]))
```

With the current behavior, if the first job fails, the remaining jobs are never started. But I want them to start even if a previous job failed. At the same time, I don't want them all submitted at once, so I can't do this either:

```python
chain(create_cluster, [job1, job2, job3], teardown_cluster.as_teardown(setups=[create_cluster]))
```

Theoretically, what I really need is the ability to configure each edge of the graph with a different rule. In my particular case, each job should be submitted only when the cluster was created successfully and the previous job has finished, regardless of its result:

```
create_cluster.on_success(job1)
create_cluster.on_success(job2)
create_cluster.on_success(job3)
job1.on_complete(job2)
job2.on_complete(job3)
teardown_cluster.as_teardown(setups=[create_cluster])
```

As far as I know, Airflow doesn't provide this kind of functionality, and I assume it would be a major feature. So I'm trying to find a solution that at least partially emulates this behavior. I ended up with this (simplified code):

```python
job1 = operator(trigger_rule=TriggerRule.ALL_DONE)
job2 = operator(trigger_rule=TriggerRule.ALL_DONE)
job3 = operator(trigger_rule=TriggerRule.ALL_DONE)
chain(create_cluster, job1, job2, job3, teardown_cluster.as_teardown(setups=[create_cluster]))
```

With this code I expected that even if the previous job failed, the next job would still be submitted. Instead, the DAG fails validation, because tasks downstream of a setup may only use the ALL_SUCCESS trigger rule. One possible workaround would be to drop the setup/teardown feature altogether, but I don't want to go down that path, because it would add a lot of complexity to my DAGs. BTW, I saw this comment and assumed that this use case is kind of expected, but currently not implemented.
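For anyone who wants to reproduce this, the snippets above can be assembled into a minimal, self-contained sketch. EmptyOperator stands in for the real EMR operators, and the `dag_id` and dates are illustrative assumptions, not from the original post; validating a DAG shaped like this is what trips the restriction under discussion.

```python
# Minimal sketch of the workaround DAG discussed above. EmptyOperator is a
# stand-in for the real create-cluster / job / teardown operators (assumption).
# On current Airflow, validating this DAG raises:
#   ValueError: Setup tasks must be followed with trigger rule ALL_SUCCESS.
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import chain
from airflow.operators.empty import EmptyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(dag_id="emr_jobs_one_by_one", start_date=datetime(2024, 1, 1), schedule=None):
    create_cluster = EmptyOperator(task_id="create_cluster")
    teardown_cluster = EmptyOperator(task_id="teardown_cluster")

    # ALL_DONE so each job runs even if the previous one failed ...
    job1 = EmptyOperator(task_id="job1", trigger_rule=TriggerRule.ALL_DONE)
    job2 = EmptyOperator(task_id="job2", trigger_rule=TriggerRule.ALL_DONE)
    job3 = EmptyOperator(task_id="job3", trigger_rule=TriggerRule.ALL_DONE)

    # ... but chained sequentially so only one job is submitted at a time.
    chain(create_cluster, job1, job2, job3, teardown_cluster.as_teardown(setups=[create_cluster]))
```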
-
Any updates here? As a temporary solution I locally patched the setup/teardown validation by subclassing DAG:

```python
# Imports added for completeness (not part of the original snippet);
# FailStopDagInvalidTriggerRule lives in airflow.exceptions in recent Airflow versions.
from airflow import DAG
from airflow.exceptions import FailStopDagInvalidTriggerRule
from airflow.utils.trigger_rule import TriggerRule


class FixedDAG(DAG):
    def validate_setup_teardown(self):
        for task in self.tasks:
            if task.is_setup:
                for down_task in task.downstream_list:
                    if not down_task.is_teardown and down_task.trigger_rule not in (
                        TriggerRule.ALL_SUCCESS,
                        TriggerRule.ALL_DONE,
                    ):
                        raise ValueError(
                            "Setup tasks must be followed with trigger rule ALL_SUCCESS or ALL_DONE."
                        )
            FailStopDagInvalidTriggerRule.check(dag=self, trigger_rule=task.trigger_rule)
```

And it seems to be working fine.
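Assuming the subclass above is importable from your own module (the import path below is hypothetical), usage would look like this sketch: instantiate FixedDAG in place of airflow.DAG and keep the rest of the DAG unchanged, so the relaxed validation runs whenever the DAG is parsed.

```python
# Hypothetical usage of the FixedDAG subclass above; the only change to an
# existing DAG file is which class is instantiated.
from datetime import datetime

from my_dags.fixed_dag import FixedDAG  # illustrative import path, adjust to your project

with FixedDAG(dag_id="emr_jobs_one_by_one", start_date=datetime(2024, 1, 1), schedule=None):
    ...  # same create_cluster >> job1 >> job2 >> job3 >> teardown layout as before
```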
-
Hi y'all. The reason we added that constraint was, IIRC, that without it you could get into odd scenarios when clearing tasks. Let me try to remember.

Suppose you had s1 >> w1 >> w2 >> w3 >> t1.as_teardown(setups=s1), and suppose that w1's trigger rule is all_failed. Say s1 fails: w1 then runs (its all_failed rule is satisfied) and succeeds, w2 and w3 run (both all_success), and t1 does not run (since the setup was not successful).

Now suppose you clear w2 (downstream). This also clears s1 to rerun, since it's a setup for w2. Suppose s1 succeeds this time. Then, according to the trigger rules, w1 should not run, since its trigger rule is all_failed. But it hasn't been cleared, so its trigger rule isn't considered -- it's not up for a run and just stays in the success state. And since w1 stays in success, w2 will run -- even though ordinarily it would not run if the setup was successful. So this results in inconsistent / contradictory dependency-constraint behavior, and we avoid it by requiring that the trigger rule of anything following a setup is all_success.

What do you think? I tried looking at your explanation but didn't understand. I think it might be easier if you explain the real-world use case more conversationally than with pseudocode -- could you try? Why do you want something following a setup to run if the setup fails? The idea of a setup is that... you're setting something up for other things to use...

P.S. I agree with you about being able to configure edges -- that approach makes more sense to me too, but that's not how Airflow works.
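To make the shape in that walkthrough concrete, here is a minimal sketch of the DAG being described, with EmptyOperator standing in for real tasks (operator choice, `dag_id`, and dates are illustrative assumptions): w1 carries the all_failed trigger rule that the current validation rejects, while w2 and w3 keep the default all_success.

```python
# Sketch of the problematic shape described above. EmptyOperator is a stand-in;
# the point is only the wiring and the all_failed trigger rule on w1.
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import chain
from airflow.operators.empty import EmptyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(dag_id="setup_clearing_example", start_date=datetime(2024, 1, 1), schedule=None):
    s1 = EmptyOperator(task_id="s1")
    w1 = EmptyOperator(task_id="w1", trigger_rule=TriggerRule.ALL_FAILED)
    w2 = EmptyOperator(task_id="w2")  # default trigger rule: all_success
    w3 = EmptyOperator(task_id="w3")
    t1 = EmptyOperator(task_id="t1")

    chain(s1, w1, w2, w3, t1.as_teardown(setups=[s1]))
```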
-
@dstandish I don't think you fully got my scenario.
I never asked about that scenario. In your terms, I was talking about the following scenario: s1 >> w1 >> w2 >> w3 >> t1.as_teardown(setups=s1), where w1, w2, and w3 all use the all_done trigger rule.
And with this pipeline I want to achieve the following: if the previous job fails, the next job should still be submitted, one at a time.
-
Description
I found that for tasks within a setup/teardown block, you can use only the ALL_SUCCESS trigger rule. It feels very limiting. For example, I have a use case where I submit a bunch of jobs to an AWS EMR cluster. The jobs are independent, but I want to submit them one by one so as not to overload the cluster. So currently I can't implement this kind of logic together with setup/teardown.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct