Do not use the same runner when the script failed #1514
Merged
Add a helper to create spare runner
Our runners are job-agnostic: any runner can run any job whose label matches its own. This lets us create a spare runner with the same label if the initial one doesn't function properly.
This helper is also useful for on-call engineers.
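As a rough illustration of the idea (not the code in this PR), a spare-runner helper could look like the sketch below. The `Runner` type, its fields, and `create_spare_runner` are hypothetical names chosen for the example. The only point is that the spare copies the label of the original, so it can pick up the same jobs.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory model; a real runner would be backed by a VM.
_next_id = count(1)


@dataclass
class Runner:
    label: str
    id: int = field(default_factory=lambda: next(_next_id))


def create_spare_runner(original: Runner) -> Runner:
    """Create a new runner that carries the same label as the original.

    Because runners are job-agnostic, the spare can pick up any job
    that the original runner could have picked up.
    """
    return Runner(label=original.label)
```

An on-call engineer could call the same helper by hand for a runner that looks stuck, for example `create_spare_runner(stuck_runner)`.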
Do not use the same runner when the script failed
Currently, when the script fails, we assume it failed because of an initialization error and try to register the same runner again. This assumption does not always hold: the script might also fail while running the workflow. Instead, we should create a new spare runner and destroy the failed one.
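The intended behavior can be sketched roughly as follows. This is a minimal model under assumptions, not the actual change: `RunnerPool`, `Runner`, and `handle_script_exit` are hypothetical names, and the pool is a plain in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    name: str
    label: str
    destroyed: bool = False


class RunnerPool:
    """Hypothetical in-memory stand-in for whatever provisions runners."""

    def __init__(self) -> None:
        self.runners: dict[str, Runner] = {}
        self._seq = 0

    def create(self, label: str) -> Runner:
        self._seq += 1
        runner = Runner(name=f"runner-{self._seq}", label=label)
        self.runners[runner.name] = runner
        return runner

    def destroy(self, runner: Runner) -> None:
        runner.destroyed = True
        self.runners.pop(runner.name, None)


def handle_script_exit(pool: RunnerPool, runner: Runner, exit_code: int) -> Runner:
    """Return the runner that should serve the label from now on."""
    if exit_code == 0:
        # The script did not fail; keep the current runner.
        return runner
    # The script failed. Do not re-register the same runner: the failure
    # may have happened mid-workflow, not only during initialization.
    spare = pool.create(runner.label)
    pool.destroy(runner)
    return spare
```

The sketch deliberately skips any check for whether the failed runner had already completed its job. It replaces the runner on every non-zero exit code.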
I could implement additional checks, such as verifying whether the runner has completed the job and skipping the spare runner if it has. However, runner script failures are uncommon, and even when errors occur, the script generally exits with a zero exit code rather than a non-zero one. I therefore believe adding more checks is unnecessary for now. If script failures increase, we can reconsider adding them.