Do not use the same runner when the script failed #1514

enescakir · 2024-04-26T19:59:07Z

Add a helper to create spare runner

Our runners are job-agnostic: any runner can run any job with a matching label. This lets us create a spare runner with the same label if the initial one doesn't function properly.

This helper is also useful for on-call engineers.
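Because runners are interchangeable within a label, a spare only needs to copy the identifying fields of the original. The following is a minimal Python sketch of that idea. The helper name provision_spare_runner comes from this PR's discussion. The Runner record, its label and repository fields, and the can_run check are hypothetical stand-ins, and this is not Ubicloud's actual implementation.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical ID generator standing in for the real database-assigned IDs.
_next_id = itertools.count(1)


@dataclass
class Runner:
    """Hypothetical runner record: only the fields this sketch needs."""
    label: str
    repository: str
    id: int = field(default_factory=lambda: next(_next_id))
    destroyed: bool = False


def can_run(runner: Runner, job_labels: set[str]) -> bool:
    # Runners are job-agnostic: any live runner whose label matches
    # one of the job's labels can pick the job up.
    return not runner.destroyed and runner.label in job_labels


def provision_spare_runner(runner: Runner) -> Runner:
    # The spare copies the label and repository, so it can take over any
    # job the original runner would have picked up. It gets a fresh ID.
    return Runner(label=runner.label, repository=runner.repository)


original = Runner(label="example-label", repository="example-org/example-repo")
spare = provision_spare_runner(original)
```

Copying only the label and repository is what makes the spare a drop-in replacement: nothing ties a job to a specific runner instance.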

Do not use the same runner when the script failed

Currently, when the script fails, we assume it failed because of an initialization error and try to register the same runner again. This assumption doesn't always hold: the script might also fail while running the workflow. Instead, we should create a new spare runner and destroy the failed one.

I could add further checks, such as verifying whether the runner has already completed its job and skipping the spare runner if it has. However, runner script failures are uncommon, and even when errors occur, the script generally exits with a zero exit code rather than a non-zero one. I therefore believe additional checks are unnecessary for now. If script failures increase, we can reconsider them.
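The failure-handling rule described in this commit can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions: ScriptStatus, Runner, handle_script_status, and the in-memory fleet list are hypothetical names invented for the example, not the project's real state machine or API. It shows only the policy change: a failed script no longer leads to re-registering the same runner. The runner is replaced by a spare with the same label and then destroyed.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical ID generator standing in for real runner IDs.
_next_id = itertools.count(1)


class ScriptStatus(Enum):
    """Hypothetical outcome of the runner script, as seen by the control plane."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Runner:
    """Hypothetical runner record: only the fields this sketch needs."""
    label: str
    id: int = field(default_factory=lambda: next(_next_id))
    destroyed: bool = False


def handle_script_status(runner: Runner, status: ScriptStatus, fleet: list[Runner]) -> Runner:
    """Return the runner that should serve this label after observing `status`."""
    if status is not ScriptStatus.FAILED:
        # Still running or finished normally: keep the current runner.
        return runner
    # The failure may come from initialization or from the middle of a
    # workflow, so re-registering the same runner is unsafe. Provision a
    # spare with the same label, then mark the failed runner as destroyed.
    spare = Runner(label=runner.label)
    fleet.append(spare)
    runner.destroyed = True
    return spare


fleet = [Runner(label="example-label")]
failed = fleet[0]
replacement = handle_script_status(failed, ScriptStatus.FAILED, fleet)
```

As the description notes, the sketch deliberately skips a "did the job already finish?" check. That check is a possible refinement if script failures become more frequent.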

bsatzger

Thanks! It would be great if you could update the wiki on how to properly use provision_spare_runner when on call.


enescakir requested review from fdr, byucesoy, furkansahin, velioglu and bsatzger April 26, 2024 19:59

enescakir self-assigned this Apr 26, 2024

bsatzger approved these changes Apr 27, 2024


enescakir force-pushed the runner-script branch from c9523ba to 4d62bb5 April 30, 2024 07:15

Base automatically changed from runner-script to main April 30, 2024 07:18

enescakir added 2 commits April 30, 2024 10:19

Add a helper to create spare runner

aefdea6


enescakir force-pushed the failed-runner branch from 68c5e9e to 2b5d118 April 30, 2024 07:19

enescakir merged commit 18b4836 into main Apr 30, 2024
6 checks passed

enescakir deleted the failed-runner branch April 30, 2024 07:31

github-actions bot locked and limited conversation to collaborators Apr 30, 2024
