add parallel sampling using vllm #409

Draft: daspartho wants to merge 2 commits into main
Conversation

@daspartho (Collaborator) commented Nov 29, 2023

Closes #370.

Adds support for parallel sampling using the vllm library when num_return_sequences in the generation kwargs is greater than 1 and the model is supported by vLLM (currently, all HF models in llm-vm are). A sketch of the vLLM call path is below.

TODO: handle dependencies
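For context, a minimal sketch of the kind of vLLM call this routes to (the model name, prompt, and sampling values are illustrative, not the PR's actual code):

```python
from vllm import LLM, SamplingParams

# Illustrative model name; any HF model supported by vLLM would work here.
llm = LLM(model="facebook/opt-125m")

# n > 1 asks vLLM to sample several completions for the same prompt in
# parallel; this is the case the PR routes through vLLM instead of the
# ordinary HF generate path.
sampling_params = SamplingParams(n=4, temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for request_output in outputs:
    for completion in request_output.outputs:
        print(completion.text)
```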

@VictorOdede (Collaborator) left a comment


Let's make vllm_support a property of the BaseOnsiteLLM class. We can set it to true by default; if a model isn't supported by vLLM, that model can set the property to false.
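A minimal sketch of that shape (UnsupportedModelLLM is a hypothetical name; only BaseOnsiteLLM and vllm_support come from the discussion):

```python
class BaseOnsiteLLM:
    # Default assumption: the model can be served by vLLM.
    vllm_support = True


class UnsupportedModelLLM(BaseOnsiteLLM):
    # Hypothetical subclass for a model vLLM cannot serve;
    # it opts out by overriding the class attribute.
    vllm_support = False
```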

@daspartho (Collaborator, Author) commented:

Made the suggested changes: vllm_support is set to true by default and needs to be set to false explicitly for unsupported models.
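A rough sketch of how the resulting dispatch could look (_generate_with_vllm and _generate_hf are hypothetical helper names, not the PR's actual methods):

```python
def generate(self, prompt, **generation_kwargs):
    n = generation_kwargs.get("num_return_sequences", 1)
    if n > 1 and self.vllm_support:
        # Multi-sample requests go through vLLM's parallel sampling.
        return self._generate_with_vllm(prompt, n, **generation_kwargs)
    # Everything else keeps using the regular HF generation path.
    return self._generate_hf(prompt, **generation_kwargs)
```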

Successfully merging this pull request may close issue: parallel sampling with vLLM (#370).