How to restart job after failing? #123
Comments
I've got the same happening here...
It happens to me too. It should continue after failing!
Hmmm, I don't believe a job should continue after failing, unless I am misunderstanding the context in which you want it to retry. If a job fails, something must be wrong, and it needs to be fixed before retrying. You could have a programmatic error, in which case the job will keep retrying and never resolve itself.

Now, if I am using a third-party service in my job (sendgrid, twilio, aws, etc.), I will implement a custom jobRetry function that is called when the job fails. This retry has exponential backoff and reschedules the job to run again at a certain point in time. After X attempts it will not reschedule itself and I'll consider it failed.
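A minimal sketch of what such a retry function could look like, not the commenter's actual code. It assumes Agenda's `fail` event passes `(err, job)` to the listener and that your Agenda version records the number of failures in `job.attrs.failCount`. The names `retryDelayMs` and `rescheduleOnFail`, the 1-second base delay, and the limit of 5 attempts are all hypothetical.

```javascript
// Hypothetical helper: delay before the next attempt, or null to give up.
// failCount is 1 after the first failure.
function retryDelayMs(failCount, { baseMs = 1000, maxAttempts = 5 } = {}) {
  if (failCount >= maxAttempts) return null; // attempts exhausted
  return baseMs * 2 ** (failCount - 1);      // 1s, 2s, 4s, 8s, ...
}

// Reschedules the failed job instance with exponential backoff.
// Returns true if the job was rescheduled, false if it is left as failed.
function rescheduleOnFail(err, job, now = Date.now()) {
  const delay = retryDelayMs(job.attrs.failCount || 1);
  if (delay === null) return false;
  job.schedule(new Date(now + delay)); // job.schedule accepts a Date
  job.save();                          // persist the new run time
  return true;
}

// Wiring it up (assumes an existing `agenda` instance):
// agenda.on('fail', (err, job) => rescheduleOnFail(err, job));
```

Because the attempt count is read from the job itself, nothing extra has to be persisted between attempts. If your version does not set `failCount`, a counter kept in `job.attrs.data` would serve the same purpose.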
I agree. If there was a reason for the failure, it shouldn't keep trying something it knows will fail. My problem is that, after I have located and addressed the problem, trying to fire up the same job again doesn't work. The job instances must be removed from the data store before it will work again. My feeling is that the behavior should be to stop processing the job on an error until the job has been "redefined".
What if a job fails due to a temporary reason like database unavailability?
See my earlier comment... For things like this I've created my own retry method that is triggered .on fail. Perhaps "Job Retry" might be a topic worth discussing as a new feature.
For me, the recurring jobs "sometimes" fail when restarting the server (and of course that prevents them from running anymore, which is the problem we are talking about here). It is random, but it happens a lot.

    graceful = ()->
      agenda.cancel repeatInterval: { $exists: true, $ne: null }, (err, numRemoved)->
        agenda.stop ()->
          process.exit 0
I think we're running into two separate problems here. One is when a job fails (i.e. returns an error to the
The other issue seems to be that if the server shuts down while a job is running, the job falls into some sort of intermediate state and never gets called again.
Regarding jobs stopping on failure, I would like to know if that's intended or unintended behavior. If it's intended, then I would like to propose adding a setting to opt out of that behavior. If it's unintended, then I'll fix that problem. See my error handling proposal here #172
@nwkeeley I'm working on adding a custom retry function like the one you are talking about. When you retry the job, are you rescheduling the failed job instance or are you creating a new job with the same data? I have been trying to modify and then reschedule the failed job instance, with some weird, inconsistent results. Any insight into how you achieved this would be helpful.
+1 for supporting retry.
+1
Any updates? Is there a workaround for retrying failed jobs?
Agenda currently does not retry failed jobs automatically, which makes sense. It is the developer's responsibility to decide whether the job has to be restarted, depending on the error that was thrown. (You don't want your server to explode because it is indefinitely restarting a job that failed because of a TypeError in your code...) To restart a job manually, you can hook on the
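A rough sketch of deciding whether to restart a job based on the thrown error. It assumes Agenda's `fail` event passes `(err, job)` to the listener and that `job.schedule` accepts a human-readable interval such as `'in 5 minutes'`. The helper names, the list of error types, and the 5-minute delay are hypothetical, not part of Agenda.

```javascript
// Hypothetical policy: programming errors are never retried, because
// rerunning the same broken code would fail again every time.
const PROGRAMMER_ERRORS = [TypeError, ReferenceError, SyntaxError, RangeError];

function isRetryable(err) {
  return !PROGRAMMER_ERRORS.some((ErrorType) => err instanceof ErrorType);
}

// Listener for the 'fail' event. Returns true if the job was rescheduled.
function handleFail(err, job) {
  if (!isRetryable(err)) return false; // leave the job failed; fix the code first
  job.schedule('in 5 minutes');        // rerun the same job instance later
  job.save();                          // persist the new run time
  return true;
}

// Wiring it up (assumes an existing `agenda` instance):
// agenda.on('fail', handleFail);
```

This retries indefinitely on transient errors, so in practice it would usually be combined with an upper limit on attempts.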
Has anybody come up with a more complete retry solution in the meantime? The example from @loris is already pretty helpful. But to implement a real retry strategy with exponential backoff (as mentioned by @nwkeeley), where/how would I persist the
I personally think that implementing retry strategies really should not be the responsibility of the developer. Other background job libraries have also implemented this as a core feature:
Define a job with your intended maximum number of retries, and agenda will take care of automatically rerunning the job in case of a failure.

`agenda.define('job with retries', { maxRetries: 2 },`

The job is retried with an exponentially increasing delay to avoid putting too much load on your queue. The formula for the backoff is copied from [Sidekiq](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry) and includes a random element. These would be some possible example values for the delay:

| retry # | delay in s |
| --- | --- |
| 1 | 27 |
| 2 | 66 |
| 3 | 118 |
| 4 | 346 |
| 6 | 727 |
| 7 | 1366 |
| 8 | 2460 |
| 9 | 4379 |
| 10 | 6613 |
| 11 | 10288 |
| 12 | 14977 |
| 13 | 20811 |
| 14 | 28636 |
| 15 | 38554 |
| 16 | 50830 |
| 17 | 65803 |
| 18 | 83625 |

Fixes agenda#123
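For illustration, a sketch of a Sidekiq-style backoff in JavaScript. It assumes the formula meant here is the one documented on the linked Sidekiq wiki page at the time, `(count ** 4) + 15 + (rand(30) * (count + 1))` seconds, where `count` is the number of retries already performed. The function name and the injectable `random` parameter are hypothetical and exist only to make the sketch testable.

```javascript
// Delay in seconds before the next retry, Sidekiq style.
// retryCount is zero-based: 0 before the first retry, 1 before the second, ...
// `random` must return a number in [0, 1); it defaults to Math.random.
function sidekiqStyleDelaySeconds(retryCount, random = Math.random) {
  // Random element: an integer in 0..29, scaled by the retry number.
  const jitter = Math.floor(random() * 30) * (retryCount + 1);
  // The polynomial term grows quickly; 15 is a fixed minimum delay.
  return retryCount ** 4 + 15 + jitter;
}

// Example: print one possible delay for each of the first five retries.
for (let count = 0; count < 5; count++) {
  console.log(`retry ${count + 1}: ${sidekiqStyleDelaySeconds(count)}s`);
}
```

The random element spreads out jobs that failed at the same moment, so they do not all hit the queue again at once.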
I am also looking for a solution in which I can automate my business logic if something bad happens. @loris's solution seems good.
If I get an error in my fail handler:
then after `agenda.stop(); agenda.start();` the job `send email` will not work anymore. Is this expected behavior, and how can I start my job again after it has failed?