Job randomly stuck with "Job is waiting for a runner from XXX to come online" #3501
Closed
4 tasks done
Labels
bug
Something isn't working
gha-runner-scale-set
Related to the gha-runner-scale-set mode
needs triage
Requires review from the maintainers
Checks
Controller Version
0.9.1
Deployment Method
ArgoCD
To Reproduce
Describe the bug
A job randomly gets stuck with the message "Job is waiting for a runner from XXX to come online".
Cancelling the job and rerunning it fixes the problem.
Observations:
1. Compared with jobs that run normally, the listener pod never receives a "job started" message for a stuck job (i.e. it only gets "job available" and "job assigned", and the job then stays stuck until it is cancelled manually).
2. For the stuck job, the EphemeralRunnerSet gets patched to 1 replica and then immediately gets patched to null; see the logs below.
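To make the first observation easier to check against a longer listener log, here is a minimal sketch that flags jobs which were assigned but never started. It assumes the log has already been reduced to `(job_id, event)` pairs; the event names `available`, `assigned`, and `started`, the job IDs, and the `find_stuck_jobs` helper are all hypothetical and are not the listener's actual log format.

```python
from collections import defaultdict


def find_stuck_jobs(events):
    """Return IDs of jobs that were assigned but never reported as started.

    `events` is an iterable of (job_id, event) pairs. The event names are
    hypothetical stand-ins for the listener's "job available",
    "job assigned", and "job started" messages.
    """
    seen = defaultdict(set)
    for job_id, event in events:
        seen[job_id].add(event)
    return sorted(
        job_id
        for job_id, kinds in seen.items()
        if "assigned" in kinds and "started" not in kinds
    )


# Made-up sample data: job-1 runs normally, job-2 matches the stuck pattern.
events = [
    ("job-1", "available"),
    ("job-1", "assigned"),
    ("job-1", "started"),
    ("job-2", "available"),
    ("job-2", "assigned"),
]
print(find_stuck_jobs(events))
```

With the sample data above, only `job-2` is reported, which mirrors the "job available, job assigned, then nothing" sequence seen for the stuck job.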
Describe the expected behavior
The job should not get stuck.
Additional Context
Controller Logs
Runner Pod Logs