Aborted Job: Server 'worker1' shut down unexpectedly #747

1992kk · 2024-05-02T10:24:13Z

Summary

Jobs failing with "Aborted Job: Server 'worker1' shut down unexpectedly".

Steps to reproduce the problem

Run a heavy network I/O job (for example, transferring a large file or spawning 10,000 SSH connections to remote hosts) and have a scheduled (cron) job run in parallel.

Your Setup

Operating system and version?

Rocky Linux release 8.9

Node.js version?

v16.20.2

Cronicle software version?

Version 0.9.25

Are you using a multi-server setup, or just a single server?

Single primary with multiple workers

Are you using the filesystem as back-end storage, or S3/Couchbase?

Local FS

Can you reproduce the crash consistently?

No

Log Excerpts

Sharing some failure events.

PID: 399131
Elapsed Time: 3 minutes, 28 seconds
Performance Metrics: (No metrics provided)
Avg. Memory Usage: 71.8 MB (Peak: 71.8 MB)
Avg. CPU Usage: 2.8% (Peak: 2.8%)
Error Code: 1

Error Description:
Aborted Job: Server 'worker1' shut down unexpectedly.

PID: 1561760
Elapsed Time: 4 hours, 4 minutes
Performance Metrics: (No metrics provided)
Avg. Memory Usage: 81.7 MB (Peak: 81.9 MB)
Avg. CPU Usage: 0.03% (Peak: 3.6%)
Error Code: 1

Error Description:
Aborted Job: Server 'worker1' shut down unexpectedly.


jhuckaby · 2024-05-02T16:23:59Z

This is a sign that your worker server is overloaded, and it cannot maintain the websocket connection to the Cronicle master server.

It's possible that the Cronicle process was killed by the kernel's OOM killer to free up memory. I don't know anything about Rocky Linux, but I would check the kernel logs for OOM messages to see if it is killing processes.
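One way to follow this advice is to search the kernel log for OOM-killer messages. The sketch below assumes the log text comes from something like `journalctl -k` or `dmesg -T`, and that the kernel uses the common "Killed process <pid> (<name>)" wording, which varies between kernel versions. `find_oom_kills` is a hypothetical helper, not part of Cronicle.

```python
import re
from typing import List, Tuple

# Heuristic pattern for OOM-killer lines, e.g.
#   "Out of memory: Killed process 12345 (node) total-vm:..."
#   "Out of memory: Kill process 12345 (node) score 900 or sacrifice child"
# Wording differs across kernel versions, so treat matches as hints, not proof.
OOM_PATTERN = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]*)\)")


def find_oom_kills(log_text: str) -> List[Tuple[int, str]]:
    """Return unique (pid, process_name) pairs for OOM kills found in log_text."""
    kills: List[Tuple[int, str]] = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match:
            key = (int(match.group(1)), match.group(2))
            # Older kernels log both "Kill process" and "Killed process"
            # for the same victim, so skip duplicates.
            if key not in kills:
                kills.append(key)
    return kills
```

For example, the output of `journalctl -k --since "2024-05-02"` could be passed in as a string; a hit naming the Cronicle `node` process around the time of the aborted job would support the OOM explanation.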

I'd also recommend monitoring CPU, memory and network connections while your job is running. I made a free app called Performa which does this, but there are many others that do it too.
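As a rough, dependency-free stand-in for a full monitoring tool, the following sketch samples load average, available memory, and established TCP connections from `/proc`. It assumes a Linux `/proc` filesystem (as on Rocky Linux); all function names are hypothetical, and only TCP sockets in the ESTABLISHED state are counted.

```python
import os
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_percent(info: Dict[str, int]) -> float:
    """Percentage of RAM the kernel reports as available (MemAvailable / MemTotal)."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def count_established(proc_net_tcp: str) -> int:
    """Count sockets in state 01 (ESTABLISHED) in /proc/net/tcp-style text."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == "01":
            count += 1
    return count


def sample_once() -> str:
    """Take one snapshot of load, available memory and TCP connections."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    established = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):  # tcp6 is absent when IPv6 is disabled
            with open(path) as f:
                established += count_established(f.read())
    load1, _, _ = os.getloadavg()
    return (f"load1={load1:.2f} "
            f"mem_available={available_percent(info):.1f}% "
            f"tcp_established={established}")


def monitor(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print a timestamped snapshot every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), sample_once(), flush=True)
        time.sleep(interval_s)
```

Running `monitor(interval_s=5, samples=60)` in a separate terminal on the worker while the heavy job runs would show whether available memory drops toward zero or the connection count spikes just before the worker disconnects.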

1992kk · 2024-05-06T05:10:02Z

Thanks @jhuckaby for the response.

If I understand correctly, the "retries" option re-runs a job after any kind of failure. To keep scheduled jobs from being affected by incidents like this, is there a way to re-run a job automatically only for server-related failures?

A "conditional retry" maybe?

jhuckaby · 2024-05-06T16:34:12Z

There's currently no way to do that, but it's a great feature suggestion. I'll add it to the list. Thank you! 🙏🏻
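Since Cronicle 0.9.25 has no such option, the following is only a hypothetical sketch of the decision logic a "conditional retry" might use, whether in a future feature or an external wrapper. It assumes the job's exit code and error description (like the "Aborted Job: Server 'worker1' shut down unexpectedly." text quoted above) are available; `JobResult`, `should_retry` and `SERVER_FAILURE_PATTERNS` are invented names, and how the result is obtained is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical list of error descriptions treated as server-side failures
# rather than failures of the job itself. The pattern below matches the
# message quoted in this issue; extend it as needed.
SERVER_FAILURE_PATTERNS = [
    re.compile(r"Server '[^']+' shut down unexpectedly"),
]


@dataclass
class JobResult:
    code: int          # non-zero means the job failed
    description: str   # error text reported for the job


def should_retry(result: JobResult, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only failed jobs whose error looks server-related.

    `attempt` is the number of attempts already made (1 for the first run).
    """
    if result.code == 0 or attempt >= max_attempts:
        return False
    return any(p.search(result.description) for p in SERVER_FAILURE_PATTERNS)
```

Keeping the patterns in a list makes the policy explicit: ordinary script errors still fail immediately, while only recognized infrastructure errors consume retry attempts.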
