UI problems #1977

Open
ediril opened this issue Jul 17, 2019 · 20 comments

@ediril

ediril commented Jul 17, 2019

While running shaded jar version 0.22 (I downloaded it from here), I ran into the following problems with the UI:

  1. On the Requests page, the DeployId links don't work because they are missing the request id. They look like this: http://localhost:7099/singularity/request/undefined/deploy/1. Note that the link says undefined instead of the actual request id.

  2. After I navigate to the Deployment page for a specific request id using the corrected link from above, I can see the history of tasks that ran for that deployment. Each task in the history has a link to its logs. However, Singularity can't seem to find or read the logs on the slave. Clicking on any of them shows:

stdout does not exist in this directory.
It may have been moved to stdout
Back to Task Detail Page

I can actually view the logs from the Mesos UI by clicking on the specific Sandbox link. In fact, I can see both stdout and stderr there. For some reason, the Singularity framework is unable to get to the logs.

It would be great if these two UI issues could be fixed.
Thank you

@ssalinas
Member

First one will be addressed by #1978

The second one I'm not able to replicate. When you click the logs link:

  1. Is it this one you are talking about?

Screen Shot 2019-07-17 at 4 43 21 PM

  2. What URL does it put you on? I'm curious if it's a UI issue or possibly something with Singularity's access/config. Singularity tries to fetch the files from the mesos-slave API, so it will hit the Mesos slave on {mesos slave hostname}:5051. If it can't access that (due to security groups or something), that could also be the cause.

@ediril
Author

ediril commented Jul 17, 2019

Regarding the second one, yes, that's the link I'm talking about. It's unable to tail the slave logs for any task. Clicking that link takes me to http://localhost:7099/singularity/task/sleep-ondemand-C-1563394569546-1-mesos_slave-DEFAULT/tail/stdout where I see the message I posted above.

When I look at Singularity logs, I can see that it tried to access the slave but it failed:

ERROR [2019-07-17 21:11:07,600] com.hubspot.singularity.mesos.SingularityMesosExecutorInfoSupport: While fetching directory and container id for task: sleep-ondemand-C-1563394569546-1-mesos_slave-DEFAULT
! java.io.IOException: Remotely Closed
! Causing: com.hubspot.horizon.HttpRuntimeException: java.io.IOException: Remotely Closed
! at com.hubspot.horizon.ning.NingHttpClient.execute(NingHttpClient.java:43)
! at com.hubspot.mesos.client.SingularityMesosClient.getFromMesos(SingularityMesosClient.java:68)
! ... 11 common frames omitted
! Causing: com.hubspot.mesos.client.MesosClient$MesosClientException: Exception fetching http://mesos-slave:5051/slave(1)/state after 00:14.445

The ip:port for the slave is correct, and I'm running the hubspot/singularityexecutorslave:0.21.0 docker image as my slave. The slave is able to execute Run-Once and On-Demand tasks. I just can't tail the logs for some reason from the web UI.

Any ideas?

@ssalinas
Member

Hmm, best guess is that the ip:port isn't accessible from singularity. Are you able to curl that same endpoint from the container that singularity is running in?
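
The same reachability check can be run from Java when curl isn't available where Singularity runs. This is only a minimal sketch, not Singularity's own client code: `AgentStateProbe` and its methods are hypothetical names, and the only details taken from this thread are port 5051 and the `/slave(1)/state` path from the stack trace above. It assumes Java 11 or later for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for checking that the Mesos slave endpoint is reachable.
final class AgentStateProbe {

    // Builds the state URL in the same shape as the one in the stack trace.
    static URI stateUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/slave(1)/state");
    }

    // Issues a GET and summarizes the outcome instead of throwing.
    static String probe(URI uri) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return "HTTP " + response.statusCode()
                    + ", body length " + response.body().length();
        } catch (IOException e) {
            // Connection refused, reset, timeout, unknown host, and so on.
            return "unreachable: " + e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "mesos-slave";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5051;
        URI uri = stateUri(host, port);
        System.out.println(uri + " -> " + probe(uri));
    }
}
```

Run it on the same host or container as SingularityService. An "unreachable" result there points at networking rather than the UI.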

@ediril
Author

ediril commented Jul 17, 2019

So you are saying when I navigate to http://mesos-slave:5051/slave(1)/state in the browser or via curl, I should get something back? This doesn't work for me, I get: mesos-slave didn’t send any data

What does it do for you? What data do you get back?

(For reference, my mesos-master is the mesosphere/mesos-master:1.7.1 docker image)

@ediril
Author

ediril commented Jul 17, 2019

As another data point, if I switch to using the mesosphere/mesos-slave:1.7.1 image, then the http://mesos-slave:5051/slave(1)/state endpoint does return JSON data (and the exception goes away). However, the Logs link still doesn't work.

Does this functionality require using your hubspot/singularityexecutorslave:0.21.0 slave image?

@ssalinas
Member

Ah, it's missing #1949, which we added as part of the upgrade to 1.8. We're overdue to release some new stuff anyways. I'll try to get a release put together today or tomorrow morning and get 0.23.0 out there.

@ediril
Author

ediril commented Jul 18, 2019

Awesome thank you! Could you also please make sure #1978 gets included in there as well?

@ssalinas
Member

https://github.com/HubSpot/Singularity/releases/tag/Singularity-0.23.0

just released in sonatype, jars should show up in a little while on maven central

@ediril
Author

ediril commented Jul 19, 2019

I downloaded 0.23 (shaded) from here. #1978 should be in this release, correct? The link still has undefined instead of the request id. Just wanted to let you know.

My environment at work is a bit locked down so I've been relying on these release jars to evaluate Singularity, but I'll try to build it locally.

@ssalinas
Member

It should be in there. You may need to hard refresh. We have one open issue about the cache headers being too aggressive and not updating nicely between releases.

@ediril
Author

ediril commented Jul 19, 2019

Ah that was it, thank you very much!

@ediril
Author

ediril commented Jul 19, 2019

Unfortunately, the Logs link still doesn't work for me; I get the same message. I'm not sure how to debug this on my end, since I'm not getting any errors from SingularityService.

I have the following images running on docker:

hubspot/singularityexecutorslave:0.23.0
mesosphere/mesos-master:1.7.1
mesoscloud/zookeeper:3.4.8-ubuntu-14.04

and I run SingularityService on the command line on the host machine (Windows 10) directly. Everything appears to be running properly and I can see the logs if I look at them in the mesos web UI.

@ssalinas
Member

Ok. Places to look for stack traces:

  • console logs in your browser
  • SingularityService logs

I'd expect the first in this case. Also, I've never actually tested this all out on a Windows machine, only Mac/Linux, so there could possibly be some weirdness there.

ssalinas reopened this Jul 19, 2019

@ediril
Author

ediril commented Jul 19, 2019

Oh I think I found the problem:

When I click on the Logs link, the page it takes me to makes a call to this URL: http://localhost:7099/singularity/api/sandbox/sleep-once-6-1563564937641-1-mesos_slave-DEFAULT/read?path=stdout&length=0 which returns a 404 with this message:

File \var\lib\mesos\slaves\c333d435-438b-4c7d-98e4-ef02f9a842cf-S0\frameworks\Singularity\executors\sleep-once-6-1563564937641-1-mesos_slave-DEFAULT\runs\a2d927d4-4c41-4b89-b995-9eb2a7d315a5\stdout does not exist for task ID sleep-once-6-1563564937641-1-mesos_slave-DEFAULT

That file actually exists on the slave, but notice the \ separators. If those are replaced with /, then it works. So this must be an issue with Java paths defaulting to \ on Windows.

Is this something easy to fix? It would be very convenient to get this working so we don't have to dig into the mesos web UI to look at logs.

@ediril
Author

ediril commented Jul 19, 2019

Btw, when I look at http://127.0.0.1:5051/slave(1)/state, I see this:

"name":"Command Executor (Task: sleep-once-6-1563564937641-1-mesos_slave-DEFAULT) (Command: [/home/sleep_...])",
"source":"sleep-once-6-1563564937641-1-mesos_slave-DEFAULT",
"container":"a2d927d4-4c41-4b89-b995-9eb2a7d315a5",
"directory":"/var/lib/mesos/slaves/c333d435-438b-4c7d-98e4-ef02f9a842cf-S0/frameworks/Singularity/executors/sleep-once-6-1563564937641-1-mesos_slave-DEFAULT/runs/a2d927d4-4c41-4b89-b995-9eb2a7d315a5",
...

So I imagine it's Java using the \ because it's running on Windows.
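
To illustrate the suspected cause, here is a small sketch. I haven't checked which path API Singularity actually uses, so treat the `java.io.File` call below as an assumption about the kind of code involved. `SandboxPaths` and `joinForAgent` are hypothetical names, and the directory is a shortened stand-in for the one reported by the slave above.

```java
import java.io.File;

// Hypothetical helper, not part of Singularity.
final class SandboxPaths {

    // Joins a directory reported by a Linux Mesos slave with a relative
    // file name using '/' explicitly, independent of the local OS.
    static String joinForAgent(String directory, String relativePath) {
        String base = directory.endsWith("/")
                ? directory.substring(0, directory.length() - 1)
                : directory;
        String rel = relativePath.startsWith("/")
                ? relativePath.substring(1)
                : relativePath;
        return base + "/" + rel;
    }

    public static void main(String[] args) {
        // Shortened, illustrative sandbox directory.
        String dir = "/var/lib/mesos/slaves/S0/frameworks/Singularity/executors/task/runs/run";

        // java.io.File normalizes to the local separator, so on Windows
        // this prints backslashes; on Linux/macOS it is unchanged.
        System.out.println(new File(dir, "stdout").getPath());

        // Explicit join: always forward slashes.
        System.out.println(joinForAgent(dir, "stdout"));
    }
}
```

The second form only helps when the slave is known to run Linux, which is the case in my setup.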

@ssalinas
Member

Ah, didn't realize that. Unfortunately there aren't a lot of devs running Windows here at HubSpot. I can take a quick look and would be happy to review/merge any PRs if you find the issue as well.

@ssalinas
Member

I'm going to be away next week, but I'm going to cc @baconmania @pschoenfelder @sjeropkipruto, who might be able to finish this up as well and release 0.23.1.

@ssalinas
Member

FYI, this is not forgotten, just trickier than expected. Mesos does in fact have support for Windows, meaning that if I do a blanket replace of \ with /, it would be wrong if this ever runs against Mesos on Windows. Right now the assumption is that Singularity is running on the same OS it is managing.
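
One possible way around the blanket-replace problem, sketched under assumptions rather than as the actual fix: take the separator from the directory string the slave itself reports instead of from the local OS. `AgentSeparator` and its methods are hypothetical names. The assumption that a Windows Mesos slave reports drive-letter or backslash paths is unverified, and the Windows path in the usage example is made up.

```java
// Hypothetical sketch, not Singularity code.
final class AgentSeparator {

    // Guesses the separator used on the slave from the directory it reports.
    // Assumption: Windows slaves report a drive-letter prefix (e.g. "C:")
    // or paths containing only backslashes; anything else is treated as POSIX.
    static char separatorFor(String agentDirectory) {
        boolean driveLetter = agentDirectory.length() >= 2
                && Character.isLetter(agentDirectory.charAt(0))
                && agentDirectory.charAt(1) == ':';
        boolean onlyBackslashes = agentDirectory.indexOf('\\') >= 0
                && agentDirectory.indexOf('/') < 0;
        return (driveLetter || onlyBackslashes) ? '\\' : '/';
    }

    // Appends a file name using the slave's separator, not the local one.
    static String resolve(String agentDirectory, String fileName) {
        char sep = separatorFor(agentDirectory);
        boolean endsWithSep = !agentDirectory.isEmpty()
                && agentDirectory.charAt(agentDirectory.length() - 1) == sep;
        return endsWithSep
                ? agentDirectory + fileName
                : agentDirectory + sep + fileName;
    }

    public static void main(String[] args) {
        // Linux-style directory, as reported in this thread (shortened).
        System.out.println(resolve("/var/lib/mesos/slaves/S0/runs/run", "stdout"));
        // Made-up Windows-style directory, for illustration only.
        System.out.println(resolve("C:\\mesos\\work\\runs\\run", "stdout"));
    }
}
```

This keeps the path in whatever style the slave uses, so it doesn't depend on where Singularity runs.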

Another option for you could be to use some of our published Docker images and run Singularity there.

@ediril
Author

ediril commented Aug 14, 2019

@ssalinas Thank you for the update. Speaking of the Singularity Docker image, it turns out Docker for Windows currently does not support HOST networking mode either. You should consider adding Windows to the note on the Try it out page.

Yes, I agree this is a tricky situation. In my setup, Singularity is running on Windows while the Mesos slave is running inside a Docker container running Linux. Could you simply use / in path names regardless of OS? Not ideal, but it could be a practical solution.
