-
Hello, I am currently trying to use Airflow on an HPC cluster that uses Slurm to distribute jobs across different nodes. We don't wish to replace Slurm, but to use Airflow to launch jobs on it. I have tried to integrate both systems over the last two weeks but haven't found a proper way to do it, and I haven't found information about it anywhere either.

Approach 1: create a custom Executor. In this case, the custom executor generates the Slurm command. I found some problems:

Approach 2: create a custom Operator (based on the BashOperator).
a) Using srun. However:
b) Using sbatch. However:

Does anyone have experience with this issue? I would appreciate your input! Thanks :)
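For reference, the simplest form of approach 2 is just wrapping Slurm's CLI in a BashOperator. A minimal sketch, assuming the Airflow worker runs on a node where srun/sbatch are available; the DAG id, script path, and Slurm options are hypothetical placeholders:

```python
# Minimal sketch of approach 2: wrapping Slurm's CLI in BashOperator.
# Assumes the worker can reach srun/sbatch; all names/paths are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="slurm_bash_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # Airflow 2.4+; use schedule_interval=None on older versions
) as dag:
    # a) srun blocks until the Slurm job finishes, so the task's exit code
    #    mirrors the job's exit code -- but the Airflow worker slot stays
    #    occupied for the job's entire runtime.
    run_blocking = BashOperator(
        task_id="srun_job",
        bash_command="srun --ntasks=1 --time=00:10:00 /path/to/job.sh",
    )

    # b) sbatch returns right after queueing, so the task succeeds as soon
    #    as submission succeeds -- Airflow then knows nothing about the
    #    job's actual outcome unless something else polls it.
    submit_and_forget = BashOperator(
        task_id="sbatch_job",
        bash_command="sbatch --ntasks=1 --time=00:10:00 /path/to/job.sh",
    )
```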
-
No idea about SLURM, but:

Approach 1: You need to implement your executor so that it also monitors and reports the task status back. Since Airflow is a distributed system, a simple "task error code" is not enough to keep the state. A task might fail for various reasons, it can then be retried when, for example, a Celery worker is restarted, and there are various edge cases. There are two ways state might be updated: tasks can modify their own state in the database when they succeed, or the monitoring side (the executor) determines that a task failed or did not have time to update its state, and updates it on the task's behalf. This is due to the distributed nature of Airflow, potential failover scenarios and the like, so a simple "exit code" is not a good indicator and you should not base task state on it. Writing your own executor means implementing a lot of failover and failure edge cases, and it's quite a complex task.

Approach 2:
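To give a feeling for the surface involved, here is a minimal sketch of what a custom Slurm executor would have to cover, assuming the Airflow 2.x BaseExecutor API. The SlurmExecutor name, the sbatch/sacct parsing, and the state mapping are hypothetical, and all the failover edge cases described above are omitted:

```python
# Hypothetical sketch only: a custom executor must submit AND keep polling,
# reporting state back to Airflow itself rather than trusting an exit code.
import subprocess

from airflow.executors.base_executor import BaseExecutor


class SlurmExecutor(BaseExecutor):
    def __init__(self):
        super().__init__()
        self.jobs = {}  # TaskInstanceKey -> Slurm job id

    def execute_async(self, key, command, queue=None, executor_config=None):
        # 'command' is the ["airflow", "tasks", "run", ...] invocation that
        # must run inside the Slurm allocation.
        out = subprocess.check_output(
            ["sbatch", "--parsable", "--wrap", " ".join(command)], text=True
        )
        self.jobs[key] = out.strip().split(";")[0]  # --parsable prints the job id

    def sync(self):
        # Called periodically by the scheduler: the executor, not the task's
        # exit code alone, is responsible for reporting state to the DB.
        for key, job_id in list(self.jobs.items()):
            state = subprocess.check_output(
                ["sacct", "-j", job_id, "--format=State", "--noheader", "-X", "-P"],
                text=True,
            ).strip()
            if state.startswith("COMPLETED"):
                self.success(key)  # updates the metadata DB via change_state
                del self.jobs[key]
            elif state.startswith(("FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL")):
                self.fail(key)
                del self.jobs[key]

    def end(self):
        self.sync()
```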
-
Hi @potiuk, thank you for your answer!

We've ended up developing a deferrable operator and a trigger. The operator first submits the job to Slurm and then defers itself until the trigger detects a state change or new output in the Slurm job's log file. Depending on the Slurm state, we defer the operator again, finish OK, or raise an AirflowException.

Since triggers are able to run in a highly available fashion, we will be able to restart Airflow for any reason without losing track of the already submitted Slurm jobs. As for idempotency, in our case rerunning the task overwrites the data.
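For anyone landing here later, the rough shape of that solution is sketched below, assuming the Airflow 2.2+ deferrable API. The class names, module path, and the sacct polling are hypothetical stand-ins; the real trigger watches the job's log file as described above:

```python
# Hypothetical sketch of a deferrable operator + trigger for Slurm jobs.
import asyncio
import subprocess

from airflow.exceptions import AirflowException
from airflow.models.baseoperator import BaseOperator
from airflow.triggers.base import BaseTrigger, TriggerEvent


class SlurmJobTrigger(BaseTrigger):
    """Waits in the triggerer process until the Slurm job reaches a final state."""

    def __init__(self, job_id: str, poll_interval: float = 30.0):
        super().__init__()
        self.job_id = job_id
        self.poll_interval = poll_interval

    def serialize(self):
        # Serialization lets any triggerer pick the trigger up again, which
        # is what makes this survive Airflow restarts.
        return (
            "my_plugin.triggers.SlurmJobTrigger",  # hypothetical module path
            {"job_id": self.job_id, "poll_interval": self.poll_interval},
        )

    async def run(self):
        while True:
            # Poll sacct without blocking the triggerer's event loop.
            proc = await asyncio.create_subprocess_exec(
                "sacct", "-j", self.job_id,
                "--format=State", "--noheader", "-X", "-P",
                stdout=asyncio.subprocess.PIPE,
            )
            stdout, _ = await proc.communicate()
            state = stdout.decode().strip()
            if state.startswith(("COMPLETED", "FAILED", "CANCELLED", "TIMEOUT")):
                yield TriggerEvent({"job_id": self.job_id, "state": state})
                return
            await asyncio.sleep(self.poll_interval)


class SlurmSubmitOperator(BaseOperator):
    def __init__(self, script: str, **kwargs):
        super().__init__(**kwargs)
        self.script = script

    def execute(self, context):
        out = subprocess.check_output(["sbatch", "--parsable", self.script], text=True)
        job_id = out.strip().split(";")[0]  # --parsable prints "jobid[;cluster]"
        # Release the worker slot; the triggerer watches the job from here on.
        self.defer(trigger=SlurmJobTrigger(job_id=job_id), method_name="execute_complete")

    def execute_complete(self, context, event):
        if not event["state"].startswith("COMPLETED"):
            raise AirflowException(f"Slurm job {event['job_id']} ended in {event['state']}")
        return event["job_id"]
```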
-
I found a DaskExecutor. Dask can use Slurm to launch a cluster. I have never tried this idea myself, though.
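If someone wants to try it, the idea would roughly be: start a Dask cluster on Slurm via dask-jobqueue, then point Airflow's DaskExecutor at its scheduler. A minimal, untested sketch, assuming dask-jobqueue is installed; the queue and resource values are site-specific placeholders:

```python
# Sketch of the DaskExecutor idea: dask-jobqueue starts Dask workers as
# Slurm jobs, and Airflow's DaskExecutor sends tasks to that scheduler.
# Untested with Airflow (as noted above); values are placeholders.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="normal",        # Slurm partition
    cores=4,               # cores per worker job
    memory="8GB",          # memory per worker job
    walltime="01:00:00",
)
cluster.scale(jobs=2)      # submits 2 sbatch jobs, each running a Dask worker

# Point Airflow at this scheduler, e.g. in airflow.cfg:
#   [core]
#   executor = DaskExecutor
#   [dask]
#   cluster_address = <value of cluster.scheduler_address>
print(cluster.scheduler_address)
```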