hookimpl on_dag_run_failed goes to infinite loop #39146
Labels
affected_version:2.6
Issues Reported for 2.6
area:core
area:Listeners
kind:bug
This is a clearly a bug
needs-triage
label for new issues that we didn't triage yet
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.6.3
What happened?
We have our own small Airflow plugin to notify a slack channel when a DAG run fails. We use the hookimpl Listeners.
We had twice a strange issue when the plugin has gone to infinite loop and spammed the channel with lots of messages. We have to remove the plugin and restart Airflow instances to stop it. We didn't see any anomalies in logs except that scheduler detected a zombie job. So we think that something may be wrong with the way how we wrote the plugin, or with Airflow.
What you think should happen instead?
No response
How to reproduce
Unfortunately we don't know how to reproduce it, but this issue happened two times.
Operating System
Ubuntu 20.04.6 LTS
Versions of Apache Airflow Providers
No response
Deployment
Google Cloud Composer
Deployment details
Composer version:
composer-2.5.1-airflow-2.6.3
Number of schedulers: 2
Anything else?
This is the log row that appeared near the time when the spam started, for this exact dag id.
Detected zombie job: {'full_filepath': '/home/airflow/gcs/dags/myfolder/dag_player_value_model.py', 'processor_subdir': '/home/airflow/gcs/dags', 'msg': "{'DAG Id': 'player_value_model', 'Task Id': 'main', 'Run Id': 'scheduled__2024-04-19T08:00:00+00:00', 'Hostname': 'airflow-worker-48ssg', 'External Executor Id': '6c81b467-3ba0-430f-9fe1-fe48f3aca8a1'}", 'simple_task_instance': <airflow.models.taskinstance.SimpleTaskInstance object at 0x7f4050d6e760>, 'is_failure_callback': True}
This is our plugin code
PLUGIN CODE
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: