IndexOutOfRangeException when calling IKernelMemory.AskAsync() #661
Could you try running this:

```csharp
var model = LLamaWeights.LoadFromFile("your_model_path");
Console.WriteLine(model.NewlineToken);
```

The code that's crashing is this:

```csharp
var nl_token = model.NewlineToken;
var nl_logit = logits[(int)nl_token];
```

So it seems like your model is probably returning something unexpected for the newline token.
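A defensive pattern for this crash (a hypothetical sketch, not LLamaSharp's actual fix; `TryGetNewlineLogit` and the demo values are invented for illustration) is to validate the token before indexing into the logits array:

```csharp
using System;

class NewlineGuardDemo
{
    // Hypothetical guard: only index logits by the newline token when the
    // model actually reports a valid one; embedding models may report -1.
    static float? TryGetNewlineLogit(float[] logits, int nlToken)
    {
        if (nlToken < 0 || nlToken >= logits.Length)
            return null; // no newline token: skip the adjustment entirely
        return logits[nlToken];
    }

    static void Main()
    {
        var logits = new float[] { 0.1f, 0.9f, 0.3f };
        Console.WriteLine(TryGetNewlineLogit(logits, 1));          // 0.9
        Console.WriteLine(TryGetNewlineLogit(logits, -1) is null); // True
    }
}
```

With a guard like this, a model that reports `-1` simply skips the newline-specific logic instead of throwing an `IndexOutOfRangeException`.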
---

I see, it's returning -1, that explains the IndexOutOfRange. Is this an issue with the model itself?
---

I'm not certain, but it doesn't seem correct for any model to be returning `-1`. If other quantizations of the same model return other values and it's just the f32 one that returns `-1`, that would point to a problem with that particular quantization.
---

I'm not sure on this yet, but not having a newline token seems to be a commonality for embedding models. For nomic I tested F32, F16, and Q2_K, and I also tried this model; they all return `-1`.
---

If multiple models are showing the same thing, I guess that must be normal. Very weird!
---

It could be that this is intended behavior. The models I've been testing are models for generating embeddings, so it makes sense that they don't have a newline token, as they are never expected to generate text.

```csharp
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
```

resolves the issue, using the embedding model to generate the embeddings and a regular model to generate the output.
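For context, the two-model wiring above can be sketched end to end. This is a sketch only: the config type and its constructor are assumed from the LLamaSharp.KernelMemory package, and the model paths are placeholders, not real files.

```csharp
// Sketch of the workaround: a regular text-generation model for answers and
// a separate embedding model for vectors, so the generation side never sees
// an embedding model's missing newline token.
// Assumption: LLamaSharpConfig takes a model path; paths are placeholders.
var llamaGenerationConfig = new LLamaSharpConfig("path/to/generation-model.gguf");
var llamaEmbeddingConfig  = new LLamaSharpConfig("path/to/embedding-model.gguf");

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .Build();
```

Check the LLamaSharp.KernelMemory documentation for the actual config properties (context size, GPU layers, etc.) before using this.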
---

In the #662 PR I've modified how tokens are returned from the LLamaSharp API so that it returns nullable tokens, and I've fixed all of the call sites to handle this. I think your approach there is the right one, though.
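The nullable-token approach described here can be illustrated with a small stand-alone sketch. This is not the actual #662 diff: `GetNewlineToken` and the `int?` token type are stand-ins invented for illustration.

```csharp
using System;

class NullableTokenDemo
{
    // Illustrative only: models the nullable-token idea, where "no newline
    // token" (-1 from the native API) surfaces as null instead of a raw -1
    // that call sites might use as an array index.
    static int? GetNewlineToken(int rawToken) =>
        rawToken < 0 ? (int?)null : rawToken;

    static void Main()
    {
        int? nl = GetNewlineToken(-1); // the embedding-model case
        // Call sites must now handle the missing-token case explicitly:
        Console.WriteLine(nl.HasValue
            ? $"newline token = {nl}"
            : "model has no newline token"); // model has no newline token
    }
}
```

The benefit of this design is that the compiler forces every call site to confront the "no newline token" case, rather than letting a sentinel `-1` silently flow into an index expression.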
---

The same issue happens when using the SemanticKernel integration through `ITextGenerationService` with an embedding model (nomic).
---

While running this example my program crashes with an `IndexOutOfRangeException`. I don't believe this was an issue when I was using Mistral, but it started happening when I switched over to the embedding model, specifically the F32 variant.