How to restart job after failing? #123

Closed
skotchio opened this issue Nov 30, 2014 · 18 comments · May be fixed by #777

Comments

@skotchio

If I get an error in my fail handler:

agenda.on('fail:send email', function(err, job) {
  console.log("Job failed with error: %s", err.message);
});

then after `agenda.stop(); agenda.start();` the job `send email` will not work anymore. Is this expected behavior, and how can I start my job again after it fails?

@felipap

felipap commented Dec 8, 2014

I've got the same happening here...
Do jobs stop getting fired after they fail once? Not making sense to me. :(

@Abdelhady

It also happens to me. It should continue after failing!

@nwkeeley
Member

Hmmm I don't believe a job should continue after failing unless I am misunderstanding the context in which you want it to retry. If a job fails there must be something wrong and if there is something wrong then it needs to be fixed before retrying again. You could have a programmatic error and it will keep retrying and never resolve itself.

Now, if I am using a third-party service in my job (SendGrid, Twilio, AWS, etc.), I implement a custom jobRetry function that is called when the job fails. This retry uses exponential backoff and reschedules the job to run again at a later point in time. After X attempts it no longer reschedules itself, and I consider the job failed.
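
The backoff and attempt limit described above might be sketched roughly as follows. This is not the jobRetry function from this comment, which was never posted; `retryDelayMs`, `shouldRetry`, and all three constants are hypothetical names and values chosen for illustration, and none of them are part of Agenda.

```javascript
// Hypothetical helpers (not part of Agenda): capped exponential backoff.
const BASE_DELAY_MS = 1000;           // assumed delay before the first retry
const MAX_DELAY_MS = 60 * 60 * 1000;  // assumed upper bound of one hour
const MAX_ATTEMPTS = 5;               // assumed "X number of attempts"

// Delay doubles with every attempt (1s, 2s, 4s, ...) until it hits the cap.
function retryDelayMs(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

// After MAX_ATTEMPTS the job is considered failed and is not rescheduled.
function shouldRetry(attempt) {
  return attempt <= MAX_ATTEMPTS;
}

for (let attempt = 1; attempt <= MAX_ATTEMPTS + 1; attempt++) {
  const outcome = shouldRetry(attempt) ? `${retryDelayMs(attempt)} ms` : "give up";
  console.log(`attempt ${attempt}: ${outcome}`);
}
```

A fail handler could call `shouldRetry` first and, if it returns true, reschedule the job `retryDelayMs(attempt)` milliseconds in the future.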

@BlakePetersen

I agree. If there was a reason for failure, it shouldn't try to keep trying something it knows will fail. My problem is, after I have located the problem and addressed it, trying to fire up the same job again doesn't work. The job instances must be removed from the data store before it will work again.

My feeling is, the behavior should be to stop processing the job on an error until the job has been "redefined".

@skotchio
Author

What if a job fails for a temporary reason, like database unavailability?

@nwkeeley
Member

See my earlier comment... For things like this I've created my own retry method that is triggered on the fail event. Perhaps "Job Retry" might be a topic worth discussing as a new feature.

@Abdelhady

For me, recurring jobs fail "sometimes" when the server restarts (and of course that prevents them from running again, which is the problem we are discussing here). It is random, but it happens a lot.
So I made this workaround, which is working fine for me:

graceful = ()->
    agenda.cancel repeatInterval: { $exists: true, $ne: null }, (err, numRemoved)->
        agenda.stop ()->
            process.exit 0

I also opened an issue describing the whole thing.

@Albert-IV
Collaborator

I think we're running into two separate problems here.

One is when a job fails (i.e. returns an error to the done() callback or calls job.fail()), which then permanently disables the job.

The other issue seems to be if the server shuts down while a job is running, it causes the job to fall into some sort of intermediary state and never gets called again.

@owenallenaz
Copy link

Regarding jobs stopping on failure, I would like to know if that's intended or unintended behavior. If it's intended, then I would like to propose adding a setting to opt-out of that behavior. If it's unintended, then I'll fix that problem. See my error handling proposal here #172

@jakeorr
Contributor

jakeorr commented Sep 29, 2015

@nwkeeley I'm working on adding a custom retry function like you are talking about. When you retry the job are you rescheduling the failed job instance or are you creating a new job with the same data? I have been trying to modify and then reschedule the failed job instance with some weird, inconsistent results. Any insight into how you achieved this would be helpful.

@sukrit007

+1 for supporting retry.

@omakoleg

omakoleg commented Oct 4, 2015

+1

@ronenteva
Contributor

Any updates? Is there a workaround to supporting retry of failed jobs?

@loris
Member

loris commented May 27, 2016

Agenda currently does not retry failed jobs automatically, which makes sense. It is the developer's responsibility to decide whether a job should be restarted, depending on the error that was thrown. (You don't want your server to explode because it keeps restarting a job indefinitely that failed because of a TypeError in your code...)

To restart a job manually, you can hook on the fail event and set a new nextRunAt, for instance:

agenda.on('fail', (err, job) => {
  if (isErrorTemporary(err)) { // checking that the error is a network error for instance
    job.attrs.nextRunAt = moment().add(10000, 'milliseconds').toDate(); // retry 10 seconds later
    job.save();
  }
});
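
The snippet above leaves `isErrorTemporary` undefined. One possible way to fill it in, as a sketch only, is to check the error's `code` against a few Node.js system error codes that usually indicate a transient network problem. The choice of codes and the name `TEMPORARY_ERROR_CODES` are assumptions; a real job would likely need rules specific to the services it calls.

```javascript
// Hypothetical implementation of the isErrorTemporary helper used above.
// The set of codes is an assumption: these are Node.js system error codes
// that typically signal a transient network or DNS failure.
const TEMPORARY_ERROR_CODES = new Set([
  "ECONNRESET",   // connection reset by peer
  "ECONNREFUSED", // remote side not accepting connections (e.g. restarting)
  "ETIMEDOUT",    // operation timed out
  "EAI_AGAIN",    // temporary DNS lookup failure
]);

function isErrorTemporary(err) {
  // Programming errors such as TypeError carry no such code and return false.
  return Boolean(err) && TEMPORARY_ERROR_CODES.has(err.code);
}

const timeout = Object.assign(new Error("connect ETIMEDOUT"), { code: "ETIMEDOUT" });
console.log(isErrorTemporary(timeout));            // network error: retry
console.log(isErrorTemporary(new TypeError("x"))); // bug in the code: do not retry
```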

@loris closed this as completed May 27, 2016
@jhilden

jhilden commented Nov 6, 2018

Has anybody come up with a more complete retry solution in the meantime?

The example from @loris is already pretty helpful. But to implement a real retry strategy with exponential backoff (as mentioned by @nwkeeley), where and how would I persist the retryCount that this would need? Can I store additional custom metadata on a job?

I personally think that implementing retry strategies should not have to be the developer's responsibility. Other background job libraries implement this as a core feature:
https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry
https://github.com/Automattic/kue#failure-backoff
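
On the question of where to persist the counter: one option, sketched below, is to keep it inside `job.attrs.data`, on the assumption that this object is stored with the job when it is saved. The field name `retryCount`, the helper `scheduleRetry`, and the constants are hypothetical, and the job here is a plain stub object rather than a real Agenda job, so that the logic can be run on its own.

```javascript
// Sketch only: keeps a retry counter in job.attrs.data.
// Assumption: job.attrs.data is persisted together with the job on save.
const MAX_RETRIES = 3; // assumed limit

function scheduleRetry(job, now = new Date()) {
  const data = job.attrs.data || {};
  const retryCount = (data.retryCount || 0) + 1;
  if (retryCount > MAX_RETRIES) {
    return false; // give up, leave the job as failed
  }
  job.attrs.data = { ...data, retryCount };
  // Exponential backoff: 2s, 4s, 8s for retries 1, 2, 3.
  const delayMs = 1000 * 2 ** retryCount;
  job.attrs.nextRunAt = new Date(now.getTime() + delayMs);
  return true; // the caller would persist the job here, e.g. with job.save()
}

// Stub standing in for a failed Agenda job.
const job = { attrs: { data: { to: "user@example.com" }, nextRunAt: null } };
console.log(scheduleRetry(job), job.attrs.data.retryCount, job.attrs.nextRunAt);
```

In a fail handler like the one from @loris, `scheduleRetry(job)` would replace the fixed 10 second delay, followed by `job.save()` when it returns true.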

jhilden added a commit to railslove/agenda that referenced this issue Feb 28, 2019
Define a job with your intended maximum number of retries and agenda will take care of automatically rerunning the job in case of a failure.

`agenda.define('job with retries', { maxRetries: 2 },`

The job is retried with an exponentially increasing delay to avoid putting too much load on your queue.

The formula for the backoff is copied from [Sidekiq](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry) and includes a random element.

These would be some possible example values for the delay:

| Retry # | Delay (s) |
| --- | --- |
| 1 | 27 |
| 2 | 66 |
| 3 | 118 |
| 4 | 346 |
| 5 | 727 |
| 6 | 1366 |
| 7 | 2460 |
| 8 | 4379 |
| 9 | 6613 |
| 10 | 10288 |
| 11 | 14977 |
| 12 | 20811 |
| 13 | 28636 |
| 14 | 38554 |
| 15 | 50830 |
| 16 | 65803 |
| 17 | 83625 |

Fixes agenda#123
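
For reference, the Sidekiq wiki at the time described its retry delay as roughly `(count ** 4) + 15 + (rand(30) * (count + 1))` seconds. A JavaScript sketch of that formula is below. Whether the commit uses exactly this expression is an assumption, as are the function name `backoffSeconds`, the injectable `random` parameter, and the use of a fractional random factor with rounding.

```javascript
// Sketch of a Sidekiq-style backoff; not taken from the commit itself.
// retryCount is assumed to start at 1 for the first retry.
// `random` is injectable so the result can be made deterministic in tests.
function backoffSeconds(retryCount, random = Math.random) {
  const polynomial = retryCount ** 4 + 15;          // steep, deterministic growth
  const jitter = random() * 30 * (retryCount + 1);  // random element, grows with count
  return Math.round(polynomial + jitter);
}

for (let retry = 1; retry <= 5; retry++) {
  console.log(`retry ${retry}: ${backoffSeconds(retry)} s`);
}
```

The random element spreads retries out so that many jobs failing at the same moment do not all come back at the same moment.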
@jhilden

jhilden commented Feb 28, 2019

I took a stab at implementing automatic retries in PR #777

@loris is this something you would consider merging? Then I would put more work into the PR (docs, etc.)

@Touseef-haider

I am also looking for a solution that lets me automate my business logic when something goes wrong. @loris's solution seems good.

@LindoAlien

LindoAlien commented Apr 24, 2022

// I found a solution that works fine for me.
function createjob(req) {
    // failCount_ tracks the number of attempts; it defaults to 0 on the first call
    // (without the default, undefined + 1 is NaN and nothing is ever scheduled)
    req.failCount_ = (req.failCount_ || 0) + 1;
    if (req.failCount_ <= 3) {
        (async () => {
            // wait a little longer on each attempt: 4, 8, then 12 seconds
            await agenda.schedule(`${req.failCount_ * 4} seconds`, "pgto", req);
            console.log("Job successfully saved");
        })().catch(console.error);
    } else {
        console.log("Giving up after 3 attempts - cancelling the retry");
    }
}
agenda.define("pgto", async (job, done) => {
    const { wallet, texto } = job.attrs.data;
    try {
        const result = await PgtoPremio(wallet, texto);
        if (result) {
            // assumes a promise-based Agenda version, where job.remove() returns a promise
            await job.remove();
            console.log("Successfully removed job from collection");
        }
        done();
    } catch (err) {
        // pass the error to done() so the job is marked as failed and 'fail' is emitted
        done(err);
    }
});
// listen for the fail event and schedule the next attempt via createjob()
agenda.on('fail', function (err, job) {
    console.log("Job failed");
    createjob(job.attrs.data);
});
// first attempt, e.g. from inside a request handler
createjob(req.body);
