Quantized Phi3: Features to add #277

Open
1 of 2 tasks
EricLBuehler opened this issue May 9, 2024 · 3 comments
Labels
models Additions to model or architectures


@EricLBuehler
Owner

EricLBuehler commented May 9, 2024

  • Support for LongRope (this is supported with ISQ in non-GGUF models, though)
    • The challenge is that the scaling information is not present in the GGUF file.
  • X-LoRA: Add X-LoRA support for GGUF #293
    • Again, we already have ISQ support, so this is not critical.
@EricLBuehler EricLBuehler added the models Additions to model or architectures label May 9, 2024
@polarathene
Contributor

  • The challenge is that the scaling information is not present in the GGUF file.

I'm not very familiar with what you're describing, but I have seen something like it exposed as settings in https://github.com/oobabooga/text-generation-webui


No idea whether that's related. Perhaps user config/overrides (e.g. a JSON file) could be an alternative way of supplying that information? Or, if it is a field GGUF normally supports, users who need it could patch the metadata directly?

@EricLBuehler
Owner Author

EricLBuehler commented Jun 7, 2024

I am not familiar with this interface, but from my experience implementing LongRope, I think they are using a bit of a hack to make this work, because GGUF does not contain the necessary data. For non-GGUF Phi3 models we load the full long and short factors, so our implementation is flexible and, unlike theirs, guaranteed to be correct for any sequence length. I have looked at the GGUF metadata, and it does not contain these long/short factors.

We could of course implement something like this. It would not be ideal, but it would give users a way to set the context length to roughly what they need.
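For context on why the missing factors matter, here is a minimal sketch of how LongRope's long/short factors are typically used. It is modeled on the Phi-3 LongRoPE formulation in Hugging Face transformers, not on the mistral.rs code, so treat it as an assumption. `LongRopeConfig` and its methods are hypothetical; the field names mirror the keys in Phi-3's Hugging Face `config.json`. The factor set is chosen by comparing the sequence length with the original context window, and it rescales each RoPE inverse frequency. Without the factors in GGUF metadata, there is nothing to switch between.

```rust
/// Hypothetical container for the LongRope parameters that a non-GGUF Phi3
/// config provides (and that GGUF metadata lacks).
struct LongRopeConfig {
    short_factor: Vec<f64>, // assumed length: head_dim / 2
    long_factor: Vec<f64>,  // assumed length: head_dim / 2
    original_max_position_embeddings: usize,
    max_position_embeddings: usize,
    rope_theta: f64,
    head_dim: usize,
}

impl LongRopeConfig {
    /// Use the long factors only once the sequence exceeds the original window.
    fn factors_for(&self, seq_len: usize) -> &[f64] {
        if seq_len > self.original_max_position_embeddings {
            self.long_factor.as_slice()
        } else {
            self.short_factor.as_slice()
        }
    }

    /// Rescaled inverse frequencies: 1 / (factor_i * theta^(2i / head_dim)).
    fn inv_freqs(&self, seq_len: usize) -> Vec<f64> {
        let factors = self.factors_for(seq_len);
        (0..self.head_dim / 2)
            .map(|i| {
                let exponent = (2 * i) as f64 / self.head_dim as f64;
                1.0 / (factors[i] * self.rope_theta.powf(exponent))
            })
            .collect()
    }

    /// Magnitude correction applied to cos/sin when the window is extended.
    fn attention_scale(&self) -> f64 {
        let orig = self.original_max_position_embeddings as f64;
        let scale = self.max_position_embeddings as f64 / orig;
        if scale <= 1.0 {
            1.0
        } else {
            (1.0 + scale.ln() / orig.ln()).sqrt()
        }
    }
}
```

A hack that only exposes a single "context length" knob has to approximate the per-dimension `factors`. Loading the real long/short arrays avoids that approximation.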

@polarathene
Contributor

I think they are using a bit of a hack to get this to work because GGUF does not contain the necessary data.

I don't know if it's actually used with GGUF models. The interface is simple enough to use, but some parts aren't very clear.


We could of course implement something like this.

I would suggest deferring this until an actual feature request is raised 😅

Could you just document the caveat for quantized Phi3? Potentially also add a debug! tracing/log call for additional context.
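The logging suggestion could look roughly like this. It is a hypothetical sketch, not existing mistral.rs code: `long_rope_caveat` and its parameters are illustrative names. It only produces a message when a GGUF Phi3 model lacks LongRope factors and the requested context exceeds the original window. In the real loader, the message would presumably go through `tracing`'s `debug!` macro. `eprintln!` stands in for it here to keep the sketch dependency-free.

```rust
/// Hypothetical helper: returns a caveat message when a quantized (GGUF) Phi3
/// model is asked to run past its original context window without the
/// LongRope long/short factors being available.
fn long_rope_caveat(
    has_long_rope_factors: bool,
    requested_ctx: usize,
    original_ctx: usize,
) -> Option<String> {
    if has_long_rope_factors || requested_ctx <= original_ctx {
        return None; // nothing to warn about
    }
    Some(format!(
        "GGUF Phi3: LongRope factors are not present in the GGUF metadata; \
         requested context of {requested_ctx} tokens exceeds the original {original_ctx}"
    ))
}

/// Stand-in for a `tracing::debug!` call at model-load time.
fn report_long_rope_caveat(has_factors: bool, requested_ctx: usize, original_ctx: usize) {
    if let Some(msg) = long_rope_caveat(has_factors, requested_ctx, original_ctx) {
        eprintln!("{msg}");
    }
}
```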
