
Please do not abandon this project! #126

Open
oobabooga opened this issue Nov 27, 2023 · 3 comments

Comments

@oobabooga

Earlier this year I was impressed with the offloading performance of FlexGen, and I wonder how it would compare with the performance currently provided by llama.cpp for Llama and Llama-2 models in a CPU offloading scenario.

Any chance Llama support could be added to FlexGen @Ying1123 @keroro824?
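For context, here is a minimal sketch of how the CPU side of such a comparison could be timed with the llama-cpp-python bindings (which text-generation-webui also uses). The model path, prompt, and parameters below are illustrative assumptions, not from this thread:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# n_gpu_layers=0 keeps every layer in system RAM, i.e. a pure CPU run;
# raising it splits layers between GPU and CPU.
llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf",  # hypothetical path
            n_gpu_layers=0)

prompt = "Explain CPU offloading in one paragraph."
start = time.time()
out = llm(prompt, max_tokens=128)
elapsed = time.time() - start

# The completion dict is OpenAI-style, with token counts under "usage".
tokens = out["usage"]["completion_tokens"]
print(f"{tokens / elapsed:.2f} tokens/s")
```

Running the same prompt through FlexGen with a comparable weight-placement policy would give the other half of the comparison.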

@BinhangYuan
Collaborator

We are working on a refactoring of the current implementation to support most HF models. We will release it soon under a fork of this repo and will keep you informed.

@oobabooga
Author

oobabooga commented Nov 27, 2023

That's exciting news @BinhangYuan! I look forward to testing the new release and incorporating it into my text-generation-webui project. Cheers :)

@arnfaldur

Is there any news on this fork?
