How to restart job after failing? #123

skotchio · 2014-11-30T12:50:51Z

If I get an error in my fail handler:

agenda.on('fail:send email', function(err, job) {
  console.log("Job failed with error: %s", err.message);
});

then after `agenda.stop(); agenda.start();` the job `send email` will not work anymore. Is this expected behavior, and how can I start my job again after it fails?

felipap · 2014-12-08T04:17:24Z

I've got the same happening here...
Do jobs stop getting fired after they fail once? Not making sense to me. :(

Abdelhady · 2014-12-23T12:08:22Z

It also happens to me; it should continue after failing!

nwkeeley · 2015-01-22T19:57:20Z

Hmmm I don't believe a job should continue after failing unless I am misunderstanding the context in which you want it to retry. If a job fails there must be something wrong and if there is something wrong then it needs to be fixed before retrying again. You could have a programmatic error and it will keep retrying and never resolve itself.

Now if I am using a third party service in my job (sendgrid, twilio, aws etc...) I will implement a custom jobRetry function that is called when the job fails. This retry has exponential back off and will reschedule the job to run again at a certain point in time. After X number of attempts it will not re-schedule itself and I'll consider it failed.
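A minimal sketch of the backoff-and-give-up decision described above, assuming a simple doubling delay. The helper name `nextRetryDate` and its options (`baseMs`, `maxAttempts`, `now`) are hypothetical and not part of Agenda; `attempt` is the number of failures seen so far.

```javascript
// Hypothetical helper: names and defaults are illustrative, not part of Agenda.
// Returns the Date of the next retry, or null once the attempt budget is spent.
function nextRetryDate(attempt, { baseMs = 1000, maxAttempts = 5, now = Date.now() } = {}) {
  if (attempt >= maxAttempts) {
    return null; // give up: treat the job as permanently failed
  }
  // Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ...
  const delayMs = baseMs * 2 ** attempt;
  return new Date(now + delayMs);
}
```

A fail handler could assign the returned date to `job.attrs.nextRunAt` and save the job, or leave the job alone when the result is `null`.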

BlakePetersen · 2015-01-30T22:03:51Z

I agree. If there was a reason for failure, it shouldn't try to keep trying something it knows will fail. My problem is, after I have located the problem and addressed it, trying to fire up the same job again doesn't work. The job instances must be removed from the data store before it will work again.

My feeling is, the behavior should be to stop processing the job on an error until the job has been "redefined".

skotchio · 2015-01-31T07:29:48Z

What if a job fails due to a temporary reason, like database unavailability?

nwkeeley · 2015-01-31T19:33:28Z

See my earlier comment... For things like this I've created my own retry method that is triggered .on fail. Perhaps "Job Retry" might be a topic worth discussing as a new feature.

Abdelhady · 2015-02-01T16:23:13Z

For me, recurring jobs fail "sometimes" when the server restarts (and of course that prevents them from running anymore, which is the problem we are talking about here). It is random, but it happens a lot.
So I made this workaround, which is working fine for me:

graceful = ()->
    agenda.cancel repeatInterval: { $exists: true, $ne: null }, (err, numRemoved)->
        agenda.stop ()->
            process.exit 0

& made an issue describing the whole thing.

Albert-IV · 2015-02-02T19:25:11Z

I think we're running into two separate problems here.

One is when a job fails (ie returns an error to the done() callback or calls job.fail()), which then permanently disables the job.

The other issue seems to be that if the server shuts down while a job is running, the job falls into some sort of intermediate state and never gets called again.
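For the shutdown case, one possible mitigation is to stop Agenda before the process exits. Agenda's README describes `agenda.stop()` as unlocking currently running jobs, so this sketch assumes a version where `stop()` returns a promise. `createGracefulShutdown` and the injectable `exit` parameter are hypothetical names used for illustration, and this does not help when the process is killed outright.

```javascript
// Sketch: stop Agenda before the process exits so running jobs are unlocked.
// `agenda` is any object exposing an async stop(); `exit` is injectable for testing.
function createGracefulShutdown(agenda, exit = (code) => process.exit(code)) {
  let stopping = false;
  return async function graceful() {
    if (stopping) return; // ignore repeated signals
    stopping = true;
    try {
      await agenda.stop();
      exit(0);
    } catch (err) {
      console.error('agenda.stop() failed:', err);
      exit(1);
    }
  };
}

// Usage (assuming an already configured `agenda` instance):
// const graceful = createGracefulShutdown(agenda);
// process.on('SIGTERM', graceful);
// process.on('SIGINT', graceful);
```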

owenallenaz · 2015-05-22T00:05:42Z

Regarding jobs stopping on failure, I would like to know if that's intended or unintended behavior. If it's intended, then I would like to propose adding a setting to opt-out of that behavior. If it's unintended, then I'll fix that problem. See my error handling proposal here #172

jakeorr · 2015-09-29T23:45:15Z

@nwkeeley I'm working on adding a custom retry function like you are talking about. When you retry the job are you rescheduling the failed job instance or are you creating a new job with the same data? I have been trying to modify and then reschedule the failed job instance with some weird, inconsistent results. Any insight into how you achieved this would be helpful.

sukrit007 · 2015-10-04T18:56:55Z

+1 for supporting retry.

omakoleg · 2015-10-04T19:38:14Z

+1

ronenteva · 2016-05-20T02:42:47Z

Any updates? Is there a workaround to supporting retry of failed jobs?

loris · 2016-05-27T13:49:06Z

Agenda currently does not retry failed jobs automatically, which makes sense. It is the developer's responsibility to decide whether the job has to be restarted, depending on the error that was thrown. (You don’t want your server to explode because it is indefinitely restarting a job which failed because of a TypeError in your code...)

To restart a job manually, you can hook on the fail event and set a new nextRunAt, for instance:

agenda.on('fail', (err, job) => {
  if (isErrorTemporary(err)) { // checking that the error is a network error for instance
    job.attrs.nextRunAt = moment().add(10000, 'milliseconds').toDate(); // retry 10 seconds later
    job.save();
  }
});
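The `isErrorTemporary` helper in the snippet above is left undefined. One possible way to write it, assuming the errors of interest are Node.js system errors that carry a string `code`, is sketched below; the list of codes is an assumption and would need adjusting for the services a job actually calls.

```javascript
// Hypothetical implementation of the isErrorTemporary() helper used above.
// The code list is an assumption; adjust it to the services your jobs call.
const TRANSIENT_ERROR_CODES = new Set([
  'ECONNRESET',
  'ECONNREFUSED',
  'ETIMEDOUT',
  'EPIPE',
  'EAI_AGAIN',
]);

function isErrorTemporary(err) {
  // Node.js system errors carry a string `code`; programming errors
  // such as TypeError do not, so they are never retried.
  return Boolean(err) && typeof err.code === 'string' && TRANSIENT_ERROR_CODES.has(err.code);
}
```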

jhilden · 2018-11-06T17:27:01Z

Has anybody come up with a more complete retry solution in the meantime?

The example from @loris is already pretty helpful. But to implement a real retry strategy with exponential backoff (as mentioned by @nwkeeley), where/how would I persist the retryCount that would be necessary for this to work? Can I store additional custom metadata on a job?
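One possible answer, as a sketch: `job.attrs.data` is saved with the job, so a counter can live there. This assumes the job data is a plain object and that `job.save()` returns a promise. `scheduleRetry`, `retryCount`, `maxRetries`, and `baseDelayMs` are illustrative names, not Agenda options. Agenda also appears to record a `failCount` on `job.attrs` when a job fails, which may be an alternative, but check that against the version in use.

```javascript
// Sketch: keep the retry counter in job.attrs.data, which Agenda saves with the job.
// `retryCount`, `maxRetries`, and `baseDelayMs` are illustrative names, not Agenda options.
async function scheduleRetry(job, { maxRetries = 3, baseDelayMs = 10000, now = Date.now() } = {}) {
  const data = job.attrs.data || {};
  const retryCount = data.retryCount || 0;
  if (retryCount >= maxRetries) {
    return false; // retries exhausted; leave the job failed
  }
  job.attrs.data = { ...data, retryCount: retryCount + 1 };
  // Double the delay on every retry: 10 s, 20 s, 40 s, ...
  job.attrs.nextRunAt = new Date(now + baseDelayMs * 2 ** retryCount);
  await job.save();
  return true;
}

// Usage (assuming a configured `agenda` instance):
// agenda.on('fail', (err, job) => { scheduleRetry(job).catch(console.error); });
```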

I personally would think that implementing retry strategies really must not be a responsibility of the developer. Other background job libraries also implemented this as a core feature:
https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry
https://github.com/Automattic/kue#failure-backoff

Define a job with your intended maximum number of retries and agenda will take care of automatically rerunning the job in case of a failure.

`agenda.define('job with retries', { maxRetries: 2 },`

The job is retried with an exponentially increasing delay to avoid too high load on your queue. The formula for the backoff is copied from [Sidekiq](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry) and includes a random element. These would be some possible example values for the delay:

| retry # | delay in s |
| --- | --- |
| 1 | 27 |
| 2 | 66 |
| 3 | 118 |
| 4 | 346 |
| 6 | 727 |
| 7 | 1366 |
| 8 | 2460 |
| 9 | 4379 |
| 10 | 6613 |
| 11 | 10288 |
| 12 | 14977 |
| 13 | 20811 |
| 14 | 28636 |
| 15 | 38554 |
| 16 | 50830 |
| 17 | 65803 |
| 18 | 83625 |

Fixes agenda#123
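For reference, a sketch of the delay formula as the Sidekiq wiki has documented it, `(retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))` seconds, with `retry_count` starting at 0. Whether the PR implements exactly this is an assumption; `sidekiqStyleDelaySeconds` and its injectable `random` parameter are hypothetical names.

```javascript
// Sketch of the delay formula documented in the Sidekiq wiki:
//   (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))   [seconds]
// `random` is injectable so the jitter can be made deterministic in tests.
function sidekiqStyleDelaySeconds(retryCount, random = Math.random) {
  const jitter = Math.floor(random() * 30); // integer in 0..29, like Ruby's rand(30)
  return retryCount ** 4 + 15 + jitter * (retryCount + 1);
}
```

Because of the random term, repeated calls with the same `retryCount` return different delays, which spreads retries out instead of firing them all at once.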

jhilden · 2019-02-28T14:51:35Z

I took a stab at implementing automatic retries in PR #777

@loris is this something you would consider merging? Then I would put more work into the PR (docs, etc.)

Touseef-haider · 2021-11-23T07:55:08Z

I am also looking for a solution in which I can automate my business logic if something bad happens. @loris's solution seems good.

LindoAlien · 2022-04-24T20:50:02Z

// I found a solution that works fine for me.
function createjob(req) {
    // failCount_ tracks the number of attempts; start from 0 when it is not set yet
    req.failCount_ = (req.failCount_ || 0) + 1;
    if (req.failCount_ <= 3) {
        (async () => {
            // wait longer before each new attempt: 4 s, 8 s, 12 s
            await agenda.schedule(`${req.failCount_ * 4} seconds`, "pgto", req);
            console.log("Job successfully saved");
        })().catch(console.error);
    } else {
        console.log("Retry limit reached, cancelling the retry");
    }
}
agenda.define("pgto", async (job, done) => {
    const { wallet, valorpg, texto } = job.attrs.data;
    await PgtoPremio(wallet, texto).then(result => {
        if (result) {
            // the payment went through, so the job is no longer needed
            job.remove(function (err) {
                if (!err) {
                    console.log("Successfully removed job from collection");
                } else {
                    //  console.log(err); //prints null
                }
            });
        }
    });
    done();
});
// listen for the fail event and call createjob() to schedule the next attempt
agenda.on('fail', function (err, job) {
    console.log("Job failed");
    createjob(job.attrs.data);
});
// first attempt
createjob(req.body);

loris closed this as completed May 27, 2016

jhilden mentioned this issue Feb 28, 2019

adds (optional) automatic retries with exponential backoff to jobs #777

Draft
