IndexOutOfRangeException when calling IKernelMemory.AskAsync() #661

Open
WesselvanGils opened this issue Apr 11, 2024 · 9 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

@WesselvanGils

While running this example my program crashes with the following error:

Generating answer...
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: freq_base  = 1000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =   288.00 MiB
llama_new_context_with_model: KV self size  =  288.00 MiB, K (f16):  144.00 MiB, V (f16):  144.00 MiB
llama_new_context_with_model:  CUDA_Host input buffer size   =    18.57 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   217.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     1.50 MiB
llama_new_context_with_model: graph splits (measure): 2
Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at LLama.LLamaContext.ApplyPenalty(Int32 logits_i, IEnumerable`1 lastTokens, Dictionary`2 logitBias, Int32 repeatLastTokensCount, Single repeatPenalty, Single alphaFrequency, Single alphaPresence, Boolean penalizeNL) in ~/LLamaSharp/LLama/LLamaContext.cs:line 361
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext() in ~/LLamaSharp/LLama/LLamaStatelessExecutor.cs:line 109
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at ProgramHelper.AnswerQuestion(IKernelMemory memory, String question) in ~/MLBackend/ProgramHelper.cs:line 110
   at Program.<Main>$(String[] args) in ~/MLBackend/Program.cs:line 32
   at Program.<Main>(String[] args)

I don't believe this was an issue when I was using Mistral, but it started happening when I switched over to the embedding model, specifically the F32 variant.

@martindevans
Member

Could you try running this:

var model = LLamaWeights.LoadFromFile(new ModelParams("your_model_path"));
Console.WriteLine(model.NewlineToken);

The code that's crashing is this:

var nl_token = model.NewlineToken;
var nl_logit = logits[(int)nl_token];

So it seems like your model is probably returning something unexpected for the newline token.
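
For context, a guarded version of that lookup would look roughly like this (just a sketch, not the actual LLamaSharp source):

// Sketch only: treat an out-of-range newline token (e.g. -1) as "no newline token"
// instead of indexing straight into the logits, which is what throws here.
var nl_token = (int)model.NewlineToken;
float? nl_logit = nl_token >= 0 && nl_token < logits.Length
    ? logits[nl_token]
    : (float?)null;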

@WesselvanGils
Author

I see, it's returning -1; that explains the IndexOutOfRange. Is this an issue with the model itself?

@martindevans
Member

I'm not certain, but it doesn't seem correct for any model to be returning -1 for the newline token. That would mean the model has no concept of newlines, which is pretty bizarre!

If other quantizations of the same model are returning other values and it's just the F32 one that's returning -1, I would say that's certainly an error in the F32.

@WesselvanGils
Author

WesselvanGils commented Apr 11, 2024

I'm not sure about this yet, but not having a newline token seems to be common for embedding models. For nomic I tested F32, F16, and Q2_K; I then also tried this model, and they all return -1 for their newline token.
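
For reference, the check was just something like this (the model paths are placeholders for the files I tested):

// Rough sketch of the per-quantization check; paths are placeholders.
foreach (var path in new[] { "nomic-f32.gguf", "nomic-f16.gguf", "nomic-q2_k.gguf" })
{
    using var weights = LLamaWeights.LoadFromFile(new ModelParams(path));
    Console.WriteLine($"{path}: {weights.NewlineToken}"); // prints -1 for every one of them
}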

@martindevans
Member

martindevans commented Apr 11, 2024

If multiple models are showing the same thing I guess that must be normal. Very weird!

In that case I think NewlineToken should be updated to return LLamaToken? instead of LLamaToken, and all call sites fixed to handle that value sometimes being null.
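
Roughly this shape, just as a sketch of the idea (the helper name here is made up, it's not necessarily what the final change will look like):

using LLama;
using LLama.Native;

public static class NewlineTokenExtensions
{
    // Hypothetical helper: surface a missing newline token (-1) as null instead of a bogus index.
    public static LLamaToken? NewlineTokenOrNull(this LLamaWeights model)
    {
        var token = model.NewlineToken;
        return (int)token == -1 ? (LLamaToken?)null : token;
    }
}

Call sites like ApplyPenalty would then guard before touching the logits:

var nl = model.NewlineTokenOrNull();
if (nl.HasValue)
{
    var nl_logit = logits[(int)nl.Value];
    // ... save/restore the newline logit as before ...
}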

@martindevans martindevans added the good first issue Good for newcomers label Apr 11, 2024
@WesselvanGils
Author

It could be that this is intended behavior. The models I've been testing are models for generating embeddings, so it makes sense that they don't have a newline token, since they are never expected to generate text.
Doing this:

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)

This resolves the issue: the embedding model generates the embeddings and a regular model generates the output.
WithLLamaSharpDefaults assumes a single regular model that is capable of both.
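
For completeness, the full setup looks roughly like this (the config type and model paths below are illustrative, adjust them to your own models):

// Sketch: separate models for text generation and embedding generation,
// instead of WithLLamaSharpDefaults, which uses one model for both.
var llamaGenerationConfig = new LLamaSharpConfig("path/to/chat-model.gguf");
var llamaEmbeddingConfig = new LLamaSharpConfig("path/to/nomic-embed.gguf");

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .Build<MemoryServerless>();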

@martindevans
Member

In PR #662 I've modified how tokens are returned from the LLamaSharp API so that it returns nullable tokens, and fixed all of the call sites to handle this. I think your approach there is the right one, though.

@psampaio

psampaio commented May 9, 2024

The same issue happens when using the SemanticKernel integration via ITextGenerationService with an embedding model (nomic).

@AsakusaRinne AsakusaRinne added the bug Something isn't working label May 10, 2024