
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors. #1464

Open
insistence-essenn opened this issue May 9, 2024 · 2 comments

@insistence-essenn

Note that the issue tracker is NOT the place for general support.
I am unable to solve this issue because I am using Distributed-compose.yml.

@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1 May 9, 2024
@insistence-essenn
Author

@frostyplanet @qinxuye @bufferoverflow Any help?

@frostyplanet
Contributor

frostyplanet commented May 15, 2024

I've encountered the same problem and don't have any clue yet.
What is your environment, and which versions of the related packages are you using? (vllm, pytorch, ray, nvidia-drivers, cuda)
vllm prints errors to stdout (they are not logged to xinference.log). Could you look for errors in the screen output (or the docker logs) just before the exception? Is there anything similar to this:
[Screenshot: ray_error_libgomp_20240514-144338]
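To answer the version question quickly, something like the following script could be run inside the worker container. It is a minimal sketch, not an Xinference tool: it assumes the PyPI distribution names `vllm`, `torch`, and `ray`, and the helper names `collect_versions` and `cuda_info` are hypothetical. It reports the CUDA version PyTorch was built against, not the NVIDIA driver version; `nvidia-smi` shows the driver.

```python
import platform
from importlib import metadata

# Hypothetical list: PyPI distribution names for the packages asked about above.
PACKAGES = ["vllm", "torch", "ray"]


def collect_versions(packages=PACKAGES):
    """Return a dict of installed versions, marking missing packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def cuda_info():
    """Report the CUDA version PyTorch was built with and whether a GPU is visible."""
    try:
        import torch
    except ImportError:
        return {"torch_cuda": "torch not installed"}
    return {
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = collect_versions()
    info.update(cuda_info())
    for key, value in info.items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue, along with the `nvidia-smi` header, covers everything asked for in the comment above.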

@XprobeBot XprobeBot modified the milestones: v0.11.1, v0.11.2 May 17, 2024