IndexOutOfRangeException when calling IKernelMemory.AskAsync() #661

Open
WesselvanGils opened this issue Apr 11, 2024 · 9 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

@WesselvanGils

While running this example my program crashes with the following error:

Generating answer...
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: freq_base  = 1000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =   288.00 MiB
llama_new_context_with_model: KV self size  =  288.00 MiB, K (f16):  144.00 MiB, V (f16):  144.00 MiB
llama_new_context_with_model:  CUDA_Host input buffer size   =    18.57 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   217.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     1.50 MiB
llama_new_context_with_model: graph splits (measure): 2
Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at LLama.LLamaContext.ApplyPenalty(Int32 logits_i, IEnumerable`1 lastTokens, Dictionary`2 logitBias, Int32 repeatLastTokensCount, Single repeatPenalty, Single alphaFrequency, Single alphaPresence, Boolean penalizeNL) in ~/LLamaSharp/LLama/LLamaContext.cs:line 361
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext() in ~/LLamaSharp/LLama/LLamaStatelessExecutor.cs:line 109
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at ProgramHelper.AnswerQuestion(IKernelMemory memory, String question) in ~/MLBackend/ProgramHelper.cs:line 110
   at Program.<Main>$(String[] args) in ~/MLBackend/Program.cs:line 32
   at Program.<Main>(String[] args)

I don't believe this was an issue when I was using Mistral, but it started happening when I switched over to the embedding model, specifically the F32 variant.

@martindevans
Member

Could you try running this:

var model = LLamaWeights.LoadFromFile(new ModelParams("your_model_path"));
Console.WriteLine(model.NewlineToken);

The code that's crashing is this:

var nl_token = model.NewlineToken;
var nl_logit = logits[(int)nl_token];

So it seems like your model is probably returning something unexpected for the newline token.
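
For context, a guarded version of that lookup would look roughly like this (just a sketch, not the actual LLamaSharp source):

// Sketch only: treat an out-of-range newline token (e.g. -1) as "no newline token"
// instead of indexing straight into the logits, which is what throws here.
var nl_token = (int)model.NewlineToken;
float? nl_logit = nl_token >= 0 && nl_token < logits.Length
    ? logits[nl_token]
    : (float?)null;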

@WesselvanGils
Author

I see, it's returning -1; that explains the IndexOutOfRange. Is this an issue with the model itself?

@martindevans
Member

I'm not certain, but it doesn't seem correct for any model to be returning -1 for the newline token. That would mean the model has no concept of newlines, which is pretty bizarre!

If other quantizations of the same model are returning other values and it's just the F32 one that's returning -1, I would say that's certainly an error in the F32.

@WesselvanGils
Author

WesselvanGils commented Apr 11, 2024

I'm not sure about this yet, but not having a newline token seems to be common for embedding models. For nomic I tested F32, F16, and Q2_K; I then also tried this model, and they all return -1 for their newline token.
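
For reference, the check was just something like this (the model paths are placeholders for the files I tested):

// Rough sketch of the per-quantization check; paths are placeholders.
foreach (var path in new[] { "nomic-f32.gguf", "nomic-f16.gguf", "nomic-q2_k.gguf" })
{
    using var weights = LLamaWeights.LoadFromFile(new ModelParams(path));
    Console.WriteLine($"{path}: {weights.NewlineToken}"); // prints -1 for every one of them
}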

@martindevans
Member

martindevans commented Apr 11, 2024

If multiple models are showing the same thing I guess that must be normal. Very weird!

In that case I think NewlineToken should be updated to return LLamaToken? instead of LLamaToken, and all call sites fixed to handle that value sometimes being null.
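
Roughly this shape, just as a sketch of the idea (the helper name here is made up, it's not necessarily what the final change will look like):

using LLama;
using LLama.Native;

public static class NewlineTokenExtensions
{
    // Hypothetical helper: surface a missing newline token (-1) as null instead of a bogus index.
    public static LLamaToken? NewlineTokenOrNull(this LLamaWeights model)
    {
        var token = model.NewlineToken;
        return (int)token == -1 ? (LLamaToken?)null : token;
    }
}

Call sites like ApplyPenalty would then guard before touching the logits:

var nl = model.NewlineTokenOrNull();
if (nl.HasValue)
{
    var nl_logit = logits[(int)nl.Value];
    // ... save/restore the newline logit as before ...
}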

@martindevans martindevans added the good first issue Good for newcomers label Apr 11, 2024
@WesselvanGils
Author

It could be that this is intended behavior. The models I've been testing are models for generating embeddings, so it makes sense that they don't have a newline token, since they are never expected to generate text.
Doing this:

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)

This resolves the issue: the embedding model generates the embeddings and a regular model generates the output.
WithLLamaSharpDefaults assumes a single regular model that is capable of both.
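
For completeness, the full setup looks roughly like this (the config type and model paths below are illustrative, adjust them to your own models):

// Sketch: separate models for text generation and embedding generation,
// instead of WithLLamaSharpDefaults, which uses one model for both.
var llamaGenerationConfig = new LLamaSharpConfig("path/to/chat-model.gguf");
var llamaEmbeddingConfig = new LLamaSharpConfig("path/to/nomic-embed.gguf");

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .Build<MemoryServerless>();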

@martindevans
Member

In PR #662 I've modified how tokens are returned from the LLamaSharp API so that it returns nullable tokens, and fixed all of the call sites to handle this. I think your approach there is the right one, though.

@psampaio

psampaio commented May 9, 2024

The same issue happens when using the SemanticKernel integration via ITextGenerationService with an embedding model (nomic).

@AsakusaRinne AsakusaRinne added the bug Something isn't working label May 10, 2024