Health check not running for ON_DEMAND task #1964

Open
bmerry opened this issue Jun 21, 2019 · 5 comments

Comments

@bmerry

bmerry commented Jun 21, 2019

I've just started experimenting with Singularity, so apologies if I've just misunderstood how it all works.

I've created a deploy for an ON_DEMAND request with the following health check fields:

"deployHealthTimeoutSeconds": 60,
"healthcheckUri": "/health",
"healthcheckPortIndex": 1,
"healthcheckMaxTotalTimeoutSeconds": 60,

After creating a run I can see the task in the UI, where the health check section says

Beginning when Task enters running, wait a max of 45s for app to start responding, then hit /health with a 5 second timeout every 5 second(s) until: HTTP 200 is recieved

followed by a dashed box with the text "No healthchecks". The HTTP access logs for the task don't show any hits on the /health endpoint. When querying /api/tasks/ids/request/REQUEST_NAME the task shows up in notYetHealthy. After 10 minutes it's killed with the message "OVERDUE_NEW_TASK - Task did not become healthy after 10:00.000".
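To make the `notYetHealthy` check above repeatable, here is a minimal Python sketch that queries the endpoint mentioned in this comment and pulls out that list. It assumes only what is described here: the endpoint path and a top-level `notYetHealthy` key in the JSON response. The function names and the `base_url` parameter are hypothetical, and the shape of each list entry is not assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def not_yet_healthy(task_ids: dict) -> list:
    """Return the entries under "notYetHealthy", or an empty list if the key is absent."""
    return list(task_ids.get("notYetHealthy") or [])


def fetch_task_ids(base_url: str, request_name: str) -> dict:
    """GET <base_url>/api/tasks/ids/request/<request_name> and parse the JSON body.

    base_url is an assumption: wherever the Singularity service is reachable
    in your setup.
    """
    url = f"{base_url.rstrip('/')}/api/tasks/ids/request/{quote(request_name, safe='')}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `not_yet_healthy(fetch_task_ids(base_url, "REQUEST_NAME"))` would then give the tasks still waiting on a health check.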

If I click on the "/health" link in the UI it shows a correct health page, which gives me some confidence that I've got the port mapping set up right.

I'm using a local docker-compose setup for testing, with the following images:

  • hubspot/singularityservice:0.22.0
  • mesosphere/mesos-slave:1.5.0
  • mesosphere/mesos-master:1.5.0
  • netflixoss/exhibitor:1.5.2 (for Zookeeper)

I'm using the Docker containerizer with BRIDGE networking and not using the Singularity executor, in case that makes a difference.

@ssalinas
Member

For ON_DEMAND tasks we don't actually run health checks, as they don't really have any bearing on one-off tasks. I realize the UI is likely confusing here, and that's something we can fix (the backend currently doesn't stop you from specifying those options even if they aren't being used). Health checks are only run for worker/service types, where we need to know whether something is healthy, e.g. to ensure a replacement instance is healthy before shutting an old one down.

@bmerry
Author

bmerry commented Jul 17, 2019

Ok, I can see the argument for not running the health check on ON_DEMAND tasks. I'm using Singularity in a slightly odd way, which is why I was trying to define a health check, but I've got alternative tools I can use to monitor health.

Perhaps the API should prevent the checks being defined in the first place, to stop people like me from shooting themselves in the foot? Or perhaps they should be fully ignored, so that the task doesn't get killed 10 minutes later due to not having become healthy?
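Until something like that exists server-side, a client-side guard can catch the mistake before the deploy is submitted. The sketch below is hypothetical and not part of Singularity: it flags the health check fields from the original report when the request type is not one that gets health-checked. The set of health-checked types (`SERVICE`, `WORKER`) is an assumption based on the "worker/service" remark above, and the field list covers only the four fields quoted in this issue.

```python
# Fields taken from the deploy quoted in this issue; not an exhaustive list.
HEALTHCHECK_FIELDS = (
    "deployHealthTimeoutSeconds",
    "healthcheckUri",
    "healthcheckPortIndex",
    "healthcheckMaxTotalTimeoutSeconds",
)

# Assumption: only these request types have their health checks run.
HEALTHCHECKED_TYPES = {"SERVICE", "WORKER"}


def unused_healthcheck_fields(request_type: str, deploy: dict) -> list:
    """Return the health check fields set on a deploy whose request type ignores them."""
    if request_type.upper() in HEALTHCHECKED_TYPES:
        return []
    return [field for field in HEALTHCHECK_FIELDS if field in deploy]
```

A non-empty result for an ON_DEMAND request means the deploy carries health check settings that, per the discussion above, will not be acted on.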

@ssalinas
Member

Oh, I read past the fact that it got killed after 10 mins. Will have to take a closer look at that.

@ssalinas
Member

For the moment though, I'd recommend what you said about doing health monitoring in a different way. As an aside, what type of use case do you have for an on-demand task with health checks? Seems to me that anything long-running with health checks should be a worker/service instead anyway.
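For anyone who needs that kind of external monitoring, here is one minimal way it could look in Python. It polls a health URL until it returns HTTP 200 or a total time budget runs out, loosely mirroring the schedule the UI describes (5 second request timeout, one attempt every 5 seconds). This is a sketch and not how Singularity implements its checks; the function names and default values are assumptions, and the `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a network.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_probe(url, timeout):
    """Return the HTTP status code for url, or None if no response was obtained."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # Non-2xx responses are raised as HTTPError; report their status code.
        return err.code
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_healthy(url, max_wait=60.0, interval=5.0, request_timeout=5.0,
                       probe=http_probe, sleep=time.sleep, clock=time.monotonic):
    """Poll url until it answers HTTP 200; give up once max_wait would be exceeded."""
    deadline = clock() + max_wait
    while True:
        if probe(url, request_timeout) == 200:
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

With a task whose health port is reachable, `wait_until_healthy("http://HOST:PORT/health")` returns `True` on the first 200 response and `False` if the budget is exhausted; `HOST` and `PORT` are placeholders for your own port mapping.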

@bmerry
Author

bmerry commented Jul 17, 2019

It's part of the software for a large radio telescope. Each observation is managed by one of these jobs, which typically last for a few hours to a day. If one fails, it shouldn't be automatically restarted because higher-level systems have to deal with the failure and rescheduling, which is why I didn't use a worker/service.

In theory it could probably persist state and pick up the pieces if it died and was automatically restarted, but it's not been a priority.
