Do not use the same runner when the script failed #1514

Our runners are job agnostic, meaning they can run any job with a matching label. This lets us create a spare runner with the same label if the initial one doesn't function properly. The ability to spin up a spare runner is also useful for on-call engineers.

Currently, when the script fails, we assume it failed because of an initialization error and try to register the same runner again. This is not always true: the script might also fail while running the workflow. Instead, we should create a new spare runner and destroy the failed one. I could implement additional checks, such as verifying whether the runner has completed the job and skipping the spare runner if it has. However, runner script failures are uncommon, and even when they occur, they exit with a zero exit code, not a non-zero one. Therefore, I believe these extra checks are unnecessary for now. If script failures increase, we can reconsider adding them.
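
The replacement policy can be sketched as below. This is a minimal, hypothetical illustration, not the actual implementation: the `Runner` class, its fields, the in-memory `pool`, and `handle_script_failure` are all assumptions made for the example. It only shows the idea that, because runners are job agnostic, a failed runner is retired and a fresh runner with the same label takes its place, rather than the failed runner being re-registered.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical counter used only to give each example runner a unique name.
_ids = count(1)


@dataclass
class Runner:
    """Hypothetical runner model: only the fields this sketch needs."""
    label: str
    name: str = field(default_factory=lambda: f"runner-{next(_ids)}")
    destroyed: bool = False


def handle_script_failure(failed: Runner, pool: list[Runner]) -> Runner:
    """Replace a runner whose script failed instead of re-registering it."""
    # Runners are job agnostic: any runner with the matching label can pick
    # up the queued job, so a fresh spare is a safe substitute.
    spare = Runner(label=failed.label)
    pool.append(spare)
    # The failed runner may have been in the middle of a job, so it is
    # never reused; mark it destroyed and drop it from the pool.
    failed.destroyed = True
    pool.remove(failed)
    return spare
```

For example, with `pool = [Runner(label="x64")]`, calling `handle_script_failure(pool[0], pool)` leaves the pool holding a single new `x64` runner while the original is marked as destroyed. The label `x64` is an arbitrary placeholder.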

Commits on Apr 30, 2024