[Serve] serve run --reload to auto-recover during a bad deployment #45204

Closed
GeneDer opened this issue May 8, 2024 · 0 comments · Fixed by #45483
Labels: enhancement (request for new feature and/or capability), P1 (issue that should be fixed within a few weeks), serve (Ray Serve related issue)

GeneDer commented May 8, 2024

Description

Currently, when the serve run CLI is called with the --reload flag, Serve keeps the session alive, watches for file changes, and redeploys with the changes. However, if a changed file causes a fatal failure, the loop breaks and Serve shuts down. Ideally we should catch fatal failures and keep Serve running until the user hits Ctrl+C. Let's add a try-except block around the reload function and only break out of the loop if a KeyboardInterrupt is raised.
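
A minimal sketch of the proposed control flow, not the actual Serve code: `reload_loop`, `change_batches`, and `redeploy` are hypothetical names standing in for the file watcher and the redeploy step. It relies only on the fact that `KeyboardInterrupt` is not a subclass of `Exception`, so the inner handler swallows a bad deployment while Ctrl+C still ends the loop.

```python
import traceback
from typing import Callable, Iterable, List, Set


def reload_loop(
    change_batches: Iterable[Set[str]],
    redeploy: Callable[[], None],
) -> List[Exception]:
    """Redeploy on every non-empty batch of file changes.

    Hypothetical stand-in for the --reload loop: `change_batches` plays the
    role of the file watcher and `redeploy` the role of the reload function.
    Returns the exceptions raised by failed redeploys.
    """
    failures: List[Exception] = []
    try:
        for changes in change_batches:
            if not changes:
                # Watcher woke up without any change; nothing to redeploy.
                continue
            try:
                redeploy()
            except Exception as exc:
                # A bad deployment is reported but does not end the loop.
                # KeyboardInterrupt is not an Exception, so it is not caught here.
                traceback.print_exc()
                failures.append(exc)
    except KeyboardInterrupt:
        # Ctrl+C is the only way out of the loop.
        pass
    return failures
```

With this shape, fixing the broken file and saving it again triggers another redeploy attempt in the same session, instead of requiring a restart of serve run.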

Use case

No response

GeneDer added the enhancement, P1, and serve labels May 8, 2024
GeneDer self-assigned this May 8, 2024
GeneDer changed the title from "[Serve] serve run --reload to auto-recover during a fatal failure" to "[Serve] serve run --reload to auto-recover during a bad deployment" May 21, 2024
edoakes pushed a commit that referenced this issue May 23, 2024
…li` (#45483)

## Why are these changes needed?

Put the file watching code inside a try-except block to prevent a bad
deployment from shutting down Serve. Also added a test to ensure this behavior.

## Related issue number

Closes #45204

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Gene Su <e870252314@gmail.com>