Problem converting Phi3-instruct-128k; "su" rope scaling in Phi-3 #1685
Did some additional legwork on this "su" scaling and here's what I came up with... hope it helps, and hope that implementing it still allows someone to use the new flash attention. And as I'm learning, apparently it's useful when working with large language models to be knowledgeable about a little thing called "math..." Link to su rope scaling as a jumping-off point for ya. Here's a summary of how it's implemented overall in the script, unless I'm mistaken (rough sketch after the list):

- `Phi3SuScaledRotaryEmbedding` class
- `Phi3Attention` class
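If I've read the Hugging Face modeling code right, the core of `Phi3SuScaledRotaryEmbedding` boils down to something like the sketch below. Treat it as a paraphrase, not the real implementation: the function name is mine, the `short_factor`/`long_factor` lists come from `rope_scaling` in the model's config.json, and the default lengths are just the 128k model's advertised numbers (131072 extended, 4096 original).

```python
import math
import numpy as np

def su_rope_cos_sin(
    positions,                      # 1-D array of token positions
    dim,                            # rotary dimension (head_dim)
    short_factor,                   # per-dim factors, len == dim // 2
    long_factor,                    # per-dim factors, len == dim // 2
    base=10000.0,
    max_position_embeddings=131072,
    original_max_position_embeddings=4096,
):
    seq_len = int(positions.max()) + 1
    # Choose the factor set: "short" inside the original context
    # window, "long" once we exceed it.
    factors = np.asarray(
        long_factor if seq_len > original_max_position_embeddings else short_factor,
        dtype=np.float64,
    )

    # Standard RoPE inverse frequencies, rescaled per dimension pair.
    inv_freq = 1.0 / (factors * base ** (np.arange(0, dim, 2) / dim))

    # Magnitude correction applied to cos/sin so extended contexts
    # don't shrink the attention scores.
    scale = max_position_embeddings / original_max_position_embeddings
    if scale <= 1.0:
        mscale = 1.0
    else:
        mscale = math.sqrt(1 + math.log(scale) / math.log(original_max_position_embeddings))

    angles = np.outer(positions, inv_freq)     # shape: (seq_len, dim // 2)
    return np.cos(angles) * mscale, np.sin(angles) * mscale
```

`Phi3Attention` then applies these cos/sin tables in the usual rotate-half RoPE fashion, so the attention math itself looks unchanged; only the frequency table and the `mscale` multiplier differ from plain RoPE.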
Thank you for your information. We don't have time to implement it now; we'll try to support su rope scaling in the future.
Closing, as this was successfully implemented in release 4.3.
Hello peeps, it's me again. The new converter works great with Phi3 but doesn't work with the 128k version located here:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
After much chagrin, I had a scintillating conversation with Claude Opus and he/she/it gave me an outline of what to do. However, I'm posting the errors I received as well for your benefit. Hope this helps!
Trying to get it to work with the phi3-instruct-128k model. I ran `converter.py` in the main branch and it gave me this error, in relevant part:

ERROR

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\Scripts\benchmark_chat\Scripts\ct2-transformers-converter.exe\__main__.py", line 7, in <module>
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 2200, in main
    converter.convert_from_args(args)
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 50, in convert_from_args
    return self.convert(
           ^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\converter.py", line 89, in convert
    model_spec = self._load()
                 ^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 141, in _load
    spec = loader(model, tokenizer)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 193, in __call__
    spec = self.get_model_spec(model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\benchmark_chat\Lib\site-packages\ctranslate2\converters\transformers.py", line 1698, in get_model_spec
    rotary_scaling_factor = rope_scaling["factor"]
                            ~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'factor'
```
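The immediate cause, as far as I can tell, is that Phi-3 128k's `rope_scaling` config doesn't carry the scalar `factor` key the converter expects. A hedged illustration (the factor values below are made-up placeholders; see the model's config.json for the real lists):

```python
# What a "linear"-style rope_scaling config looks like vs. what
# Phi-3 128k ships. Factor values are placeholders, not real numbers.
linear_style = {"type": "linear", "factor": 4.0}

su_style = {
    "type": "su",
    "short_factor": [1.05, 1.05],  # really a list with dim // 2 entries
    "long_factor": [1.03, 1.23],   # really a list with dim // 2 entries
}

rotary_scaling_factor = su_style["factor"]  # -> KeyError: 'factor'
```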
ChatGPT said to modify it as set forth in this pull request, and now it's giving me a different error saying that ctranslate2 only supports "linear" rope scaling and that it needs to use `su`, whatever that is.

NEW ERROR
Since I don't even know what "rope" is, let alone "linear" or "su," I've done this legwork and am now passing it off to you all as the experts. Hope this helps. Would be good to be able to use this model in general and bench it.
[EDIT]
Here's some additional legwork that I did, hope that it helps!
Here's what Claude Opus said after some minor questioning and feeding of scripts:
1. Update the `_SUPPORTED_ROPE_SCALING` dictionary: in the `transformers.py` file in the CTranslate2 converter, add a "su" entry to the `_SUPPORTED_ROPE_SCALING` dictionary, mapping it to a new `attention_spec.RotaryScalingType` enum value (see the sketch after this list).
2. Modify the `RotaryScalingType` enum: in the `attention_spec.py` file, add the new "su" value to the `RotaryScalingType` enum definition.
3. Update the CTranslate2 library's C++ code: extend the `RotaryScalingType` enum definition in the `include/ctranslate2/layers/attention.h` file, then update the attention code in the `src/layers/attention.cc` file accordingly (e.g., `dot_product_attention`).
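To make steps 1 and 2 concrete, here's a minimal Python sketch. It only paraphrases the structures named above, so the real CTranslate2 sources will differ in detail, and `rotary_scaling_from_config` is a hypothetical helper I made up for illustration, not a function in the codebase:

```python
import enum

# attention_spec.py -- extend the enum with a new member (step 2).
class RotaryScalingType(enum.IntEnum):
    Linear = 0
    Su = 1  # new "su" (LongRoPE-style) scaling

# transformers.py -- map the config.json string to the enum (step 1).
_SUPPORTED_ROPE_SCALING = {
    "linear": RotaryScalingType.Linear,
    "su": RotaryScalingType.Su,  # new entry
}

# get_model_spec() would then need to stop assuming a scalar "factor".
# Hypothetical helper, for illustration only:
def rotary_scaling_from_config(rope_scaling):
    scaling_type = _SUPPORTED_ROPE_SCALING.get(rope_scaling["type"])
    if scaling_type is None:
        raise NotImplementedError(
            "Unsupported rope_scaling type: %s" % rope_scaling["type"]
        )
    if scaling_type == RotaryScalingType.Su:
        # "su" carries per-dimension factor lists, not one number.
        return scaling_type, rope_scaling["short_factor"], rope_scaling["long_factor"]
    return scaling_type, rope_scaling["factor"], None
```

The C++ side (step 3) would mirror the enum change and apply the per-dimension factors and magnitude correction where the rotary embeddings are computed, along the lines of the earlier `Phi3SuScaledRotaryEmbedding` sketch.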