
How to upgrade the Airflow cluster while keeping my Airflow jobs running with no downtime #839

Open
2 of 4 tasks
zeddit opened this issue Mar 22, 2024 · 2 comments
Labels
kind/enhancement kind - new features or changes

Comments


zeddit commented Mar 22, 2024

Checks

Motivation

Upgrading Airflow is a common need, e.g. installing a new third-party provider package or upgrading the Airflow major version. All of these involve upgrading the Airflow image running on Kubernetes, which requires shutting down the related pods and starting new ones with the new version.

These shutdowns stop the jobs running on top of Airflow, and they may not recover correctly: a job can be interrupted partway through its DAG yet still be marked as success.

How can an Airflow Helm upgrade be made to not affect running jobs, so that job execution results stay correct?

Implementation

No response

Are you willing & able to help?

  • I am able to submit a PR!
  • I can help test the feature!
zeddit added the kind/enhancement label on Mar 22, 2024
thesuperzapper (Member) commented:

@zeddit Because changing the Airflow image will always require a restart of all worker pods, your only options are something like:

  1. Design your DAGs so they can recover safely if they are interrupted (which is a good idea anyway, because servers crash and other failures happen even when you are not upgrading).
  2. If you want to prevent Kubernetes from restarting a worker that has running tasks, you can use the workers.celery.gracefullTermination values, but this only affects restarts caused by a StatefulSet rollout. (Also, some Kubernetes providers put a limit on how long gracefullTerminationPeriod can be; e.g. GKE limits it to 10 minutes.)
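For option 2, the setting lives under the workers block of the chart's values. A minimal sketch of such a values file, assuming the user-community Airflow Helm chart layout (check your chart's values.yaml for the exact keys and defaults, including the period in seconds):

```yaml
# values.yaml fragment (sketch): let Celery workers finish in-flight
# tasks before a StatefulSet rollout terminates them.
workers:
  celery:
    gracefullTermination: true        # wait for running tasks to complete
    gracefullTerminationPeriod: 600   # max wait in seconds (GKE caps ~10 min)
```

Note this only delays termination during a rollout; it does not help with node failures or evictions, which is why option 1 (recoverable DAGs) is still recommended.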


zeddit commented May 14, 2024

@thesuperzapper I really appreciate your reply, and I learned a lot.

Do I understand correctly that, with the KubernetesExecutor, only worker pods will be restarted, and they may be terminated at any point in time, while other pods like the database, scheduler, and UI will not restart if I update the Airflow image such as apache-airflow:python3.8? Is that the regular update path?

What if I need to upgrade the scheduler and other components — will that break the correctness of jobs if I follow the DAG design principle stated above? Many thanks.
