
[Feature] throw Turbomind error to python #1539

Open · lijing1996 opened this issue May 1, 2024 · 4 comments

@lijing1996

Motivation

When TurboMind throws an error, it cannot be caught on the Python side, so the program cannot recover and continue running.
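
For illustration, a minimal sketch of the behavior this issue asks for, assuming lmdeploy's `pipeline` API (the model name, prompt, and exception type are placeholders, not the actual interface):

```python
from lmdeploy import pipeline

# Placeholder model; any TurboMind-backed model illustrates the point.
pipe = pipeline('internlm/internlm2-chat-7b')

try:
    # Today, a fatal TurboMind-side failure (e.g. a CUDA OOM) can abort
    # the whole process instead of raising, so this except block may
    # never be reached.
    responses = pipe(['Describe this image.'])
except RuntimeError as err:
    # The requested behavior: surface TurboMind errors as Python
    # exceptions so the caller can log, skip, or reload the model.
    print(f'TurboMind failed: {err}')
```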

Related resources

No response

Additional context

No response

@zhyncs (Contributor) commented May 5, 2024

Hi @lijing1996, please provide detailed information about the error reported, how it was triggered, and a minimal reproducible example.

When the TurboMind engine reports an error, it usually falls into one of two situations. One is an unrecoverable error, such as an OOM, where the engine should just be allowed to crash. The other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.
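
For the second situation, the retry can live entirely on the client. A rough sketch against an OpenAI-compatible endpoint such as the one `lmdeploy serve api_server` exposes (the URL, model name, and retry policy here are illustrative assumptions):

```python
import time

import requests

URL = 'http://localhost:23333/v1/chat/completions'  # placeholder endpoint

def complete_with_retry(prompt: str, max_retries: int = 3) -> str:
    payload = {
        'model': 'internlm2',  # placeholder model name
        'messages': [{'role': 'user', 'content': prompt}],
    }
    for attempt in range(max_retries):
        try:
            resp = requests.post(URL, json=payload, timeout=120)
            resp.raise_for_status()
            return resp.json()['choices'][0]['message']['content']
        except requests.RequestException:
            # A failure affecting only this request: back off and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError(f'request failed after {max_retries} retries')
```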

@lijing1996 (Author)

> Hi @lijing1996, please provide detailed information about the error reported, how it was triggered, and a minimal reproducible example.
>
> When the TurboMind engine reports an error, it usually falls into one of two situations. One is an unrecoverable error, such as an OOM, where the engine should just be allowed to crash. The other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.

Regarding the first case: could the error be caught so that the model can be re-imported and reloaded? In my case I sometimes hit OOM with a large batch size, but with a small batch size the throughput was low.
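
If the OOM were surfaced as a Python exception, as this issue requests, a caller-side workaround could shrink the batch adaptively instead of committing to one batch size. A sketch under that assumption; `caption_batch` is a hypothetical helper wrapping the VLM pipeline:

```python
def caption_all(images, caption_batch, batch_size=64, min_batch=1):
    """Caption images in batches, halving the batch size on OOM."""
    results = []
    i = 0
    while i < len(images):
        batch = images[i:i + batch_size]
        try:
            results.extend(caption_batch(batch))
            i += len(batch)
        except RuntimeError:
            # Assumes the OOM surfaces as a Python exception rather than
            # a hard crash -- which is exactly what this issue requests.
            if batch_size <= min_batch:
                raise
            batch_size //= 2  # retry the same images with a smaller batch
    return results
```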

@zhyncs (Contributor) commented May 6, 2024

> In such a case, could the error be caught so that the model can be re-imported and reloaded?

In this situation, catching the error is meaningless, as it is a fatal error; it should simply be allowed to crash so that the problem is exposed. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, specific request content, etc.? For a program that runs on the server side for a long time, stability is very important, especially for Internet services.

@lijing1996 (Author)

> > In such a case, could the error be caught so that the model can be re-imported and reloaded?
>
> In this situation, catching the error is meaningless, as it is a fatal error; it should simply be allowed to crash so that the problem is exposed. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, specific request content, etc.? For a program that runs on the server side for a long time, stability is very important, especially for Internet services.

It is just an OOM error. I use a VLM to caption a large number of images, so I need to restart after each crash.
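
Until the engine can report such errors to Python, a common workaround is to run the captioning job in a child process and have a small supervisor restart it after a crash, resuming from a checkpoint. A rough sketch; the worker script and file names are hypothetical:

```python
import subprocess
import sys

# Hypothetical worker script: captions the images listed in todo.txt and
# appends each finished path to done.txt, so a restarted run can skip
# work that already completed.
CMD = [sys.executable, 'caption_worker.py',
       '--todo', 'todo.txt', '--done', 'done.txt']

while True:
    ret = subprocess.run(CMD).returncode
    if ret == 0:
        break  # all images captioned
    # TurboMind crashed the worker (e.g. OOM); restarting reloads the model.
    print(f'worker exited with code {ret}, restarting...')
```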
