feat: experimental python packaging and interface #1912
Would the wheel bundle FA/paged attention/quantization kernels?
@fxmarty currently this branch does not include or build any 3rd party kernels, but in the best case it would provide an easy-to-use upgrade path and maybe loud warnings/errors if the optimized kernels are not included. Curious what your thoughts are, but I think it would be best to avoid the expensive, hardware-specific build process in the default installation (to make the library as easy to start with as possible), and then provide some interface for checking and installing kernels. At the moment all of the kernel build processes are handled via CLI commands, and adding kernels would require actions outside of the library. I wonder if there's a nice way to move that logic into the library so users could choose the kernels they want/support and build those individually...
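The "checking kernels" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `check_kernels` and the mapping passed to it are hypothetical names, not part of this branch, and the check only tests whether a top-level module is importable, not whether it was built for the current hardware.

```python
import importlib.util
import warnings
from typing import Dict


def check_kernels(modules: Dict[str, str]) -> Dict[str, bool]:
    """Report which optional kernel packages are importable.

    `modules` maps a human-readable kernel name to a top-level module name.
    Both the function and the mapping are hypothetical, for illustration only.
    """
    status: Dict[str, bool] = {}
    for label, module_name in modules.items():
        # find_spec returns None when a top-level module cannot be located.
        found = importlib.util.find_spec(module_name) is not None
        status[label] = found
        if not found:
            # A "loud" warning instead of silently using a slower code path.
            warnings.warn(
                f"Optimized kernel '{label}' (module '{module_name}') is not "
                "installed; falling back to a slower implementation.",
                RuntimeWarning,
                stacklevel=2,
            )
    return status


if __name__ == "__main__":
    # Module names here are placeholders chosen for the example.
    print(check_kernels({"flash attention": "flash_attn", "json (stdlib)": "json"}))
```

Keeping the check separate from installation would let the default install stay lightweight while still telling users which optimized paths are missing.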
I am asking because, for example, for vllm in the NVIDIA & ROCm builds we have forks with some modifications. So for example somebody installing vllm from vllm repo/pip and using There could be an external
@fxmarty those are great points. I think a
Closing this PR in favor of a smaller PR that only adds a workflow to precompile kernels. Will revisit after the precompiles are complete: #1970
This draft PR explores the possibility of wrapping the launcher and server Rust applications into a Python package to make it even easier to get started with TGI. This change has many implications and may not be sustainable or practical to add and maintain in the long term.
In general the goal of this PR is to enable a simple dev experience fully within a Python runtime. An example API may look like:
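As a rough sketch of the kind of wrapper described here, assuming the launcher is available on `PATH` as `text-generation-launcher` and accepts `--model-id` and `--port`: the `LauncherConfig` class and `start_server` function below are hypothetical names for illustration and are not taken from the `tgi` package in this PR.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class LauncherConfig:
    """Hypothetical settings object; not the PR's actual interface."""

    model_id: str
    port: int = 3000
    # Assumption: the launcher binary is installed under this name.
    launcher_bin: str = "text-generation-launcher"

    def to_command(self) -> List[str]:
        """Translate the config into an argument list for the launcher."""
        return [
            self.launcher_bin,
            "--model-id", self.model_id,
            "--port", str(self.port),
        ]


def start_server(config: LauncherConfig) -> subprocess.Popen:
    """Start the launcher as a child process and return its handle."""
    if shutil.which(config.launcher_bin) is None:
        raise FileNotFoundError(
            f"'{config.launcher_bin}' was not found on PATH; install it first."
        )
    return subprocess.Popen(config.to_command())


if __name__ == "__main__":
    # The model id is a placeholder; only the command is printed here.
    cfg = LauncherConfig(model_id="some-org/some-model", port=8080)
    print(" ".join(cfg.to_command()))
```

A subprocess wrapper like this keeps the Rust binaries unchanged; binding them directly into the Python runtime, as the PR explores, would be a deeper integration than this sketch shows.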
Please see the tgi package README for dev instructions and how to test.
Foreseeable issues
Opening this draft PR for visibility and feedback. Any ideas, concerns, or thoughts would be greatly appreciated 🙏