Question: 'history is longer than the max chat context' error #338
Comments
There is another param in the `OpenAIGPTConfig` that controls this: `chat_context_length` (it appears in the working config later in this thread), which tells langroid how large the model's context window actually is.
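A minimal sketch of what setting that param looks like, assuming (as the working config later in this thread suggests) that the param in question is `chat_context_length` on `OpenAIGPTConfig`; the other values here are placeholders:

```python
import langroid as lr

# Placeholder values; the key setting is chat_context_length, which tells
# langroid how many tokens the model's context window can actually hold.
llm_config = lr.language_models.OpenAIGPTConfig(
    chat_model="local/localhost:8000/v1",  # adjust to your local server
    chat_context_length=4096,              # set to your model's real context size
    max_output_tokens=256,
)
```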
I suggest first verifying that simple chat works for your local LLM setup. I don't know the details of how you spun up the Mistral model, but if you already have it spun up locally and listening at `http://localhost:8000/v1`, then before trying RAG, verify that simple chat works with these settings (or change them if your local LLM serving setup differs from my assumption):
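For concreteness, a hedged sketch of such a bare-chat sanity check; the `local/...` model string and the port are assumptions based on the rest of this thread, and the `OpenAIGPT.chat` call should be checked against your langroid version:

```python
import langroid as lr

# Assumes a local OpenAI-compatible server listening at localhost:8000/v1
cfg = lr.language_models.OpenAIGPTConfig(
    chat_model="local/localhost:8000/v1",
    chat_context_length=4096,
)
llm = lr.language_models.OpenAIGPT(cfg)

# If this round-trip works, the local LLM setup itself is fine,
# and any remaining problem is in the RAG/DocChat layer.
response = llm.chat("Say hello in one word.", max_tokens=20)
print(response.message)
```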
Once this works, then try the document-chat with this config --
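The config itself was lost in this page capture; as a hedged reconstruction, a document-chat setup reusing the same local-LLM config would look roughly like this (class and field names per langroid's `DocChatAgent`; your version may differ):

```python
import langroid as lr
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig

# Hypothetical document path; reuses the same local-LLM config as above
doc_cfg = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model="local/localhost:8000/v1",
        chat_context_length=4096,
    ),
    doc_paths=["my_document.pdf"],  # hypothetical path
)
agent = DocChatAgent(doc_cfg)
answer = agent.llm_response("What is this document about?")
print(answer.content)
```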
OK, I've set up a working example; this works for me on an M1 Mac. Here's a sample run:
Thank you very much for your help; I didn't see the PS about the model syntax.
Note that the syntax is `local/localhost:8000/v1`.
Yep, this is the thing that generates a 404:

```
WARNING - OpenAI API request failed with error:
Error code: 404 - {'object': 'error', 'message': 'The model local/localhost:8000/v1 does not exist.', 'type': 'invalid_request_error', 'param': None, 'code': None}
```
That's puzzling. Looking at the vLLM docs, the error message appears to come from vLLM rejecting the model name in the request.
I think it's because this check exists in vLLM, and it's not present in LiteLLM and Ollama (?): https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L131-L138
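Given that check, one way to see which model names vLLM will accept is to list what the server reports on the standard OpenAI-compatible `/v1/models` endpoint. A small sketch, assuming the server from this thread at `localhost:8000`:

```python
import requests

# List the model ids the local OpenAI-compatible server actually serves;
# vLLM rejects requests whose "model" field doesn't match one of these.
resp = requests.get("http://localhost:8000/v1/models")
for model in resp.json()["data"]:
    print(model["id"])
```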
Can you post your exact config? You might try setting `chat_model` to the exact model name the server is serving.
When I try that, this is my working config atm:

```python
import langroid as lr

llm = lr.language_models.OpenAIGPTConfig(
    api_base="http://localhost:8000/v1",
    chat_model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    completion_model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    use_chat_for_completion=True,
    max_output_tokens=256,
    temperature=0.2,
    chat_context_length=4096,
)
```
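For completeness, a hedged sketch of how a config like this plugs into a chat loop, using langroid's `ChatAgent`/`Task` (names per langroid's top-level API; verify against your version):

```python
# Reuse the `llm` config from the comment above
agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm))
task = lr.Task(agent, name="local-mistral")
task.run()  # starts an interactive chat loop against the local model
```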
Did you mean to say it's working now?
I mean it downloads and runs the LLM, but that is not what I expect for my use case: I already have a running LLM, so I just want to plug Langroid into it. This is why I use the config that I linked.
I followed the same steps, but the problem is that it works fine with simple examples/chat; when it comes to using DocChat, though, it doesn't give any response. It just shows the logs, like retrieving objects and all, but in the end it doesn't give any output/response. @pchalasani can you please help me with this? I'm using it on my Arch Linux machine with the zsh terminal. PS: I am using it to create a RAG application; I have access to Mistral-7B via oobabooga's text-generation-webui and use the same in Langroid. Also guide me if I'm missing something, because I'm new to these things.
Are you running the script exactly as given? The correct syntax when using ollama, and the variant for when you use ooba to serve this model at a local endpoint, are shown in the sketch below.
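The exact strings were lost in this capture; as a hedged sketch based on langroid's documented conventions, the ollama form is `ollama/<model-name>` and an ooba (or any OpenAI-compatible) server is addressed with the `local/<host>:<port>/v1` form -- both are assumptions to verify against your langroid version:

```python
import langroid as lr

# Option 1 (assumed syntax): model served by ollama
ollama_cfg = lr.language_models.OpenAIGPTConfig(
    chat_model="ollama/mistral",
)

# Option 2 (assumed syntax): model served by oobabooga's OpenAI-compatible API,
# here assumed to listen on port 5000 (adjust to your setup)
ooba_cfg = lr.language_models.OpenAIGPTConfig(
    chat_model="local/localhost:5000/v1",
)
```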
If you ran it correctly, the fact you're not getting a response could mean that for that specific question there was no answer found. If you're not getting a response for any question at all (especially "obvious" ones that it should find), then that needs looking into. |
I made these changes.
That should work, for a good enough local model. (I assume you don't actually use an OpenAI model there.)
Yes, but it's still not giving a response, nor any errors. Do I have to provide an OpenAI key? I am using Mistral-7B, which is locally deployed.
Can you check with other documents and/or a wider variety of questions and see if you're still not getting a response? You can also try the bare chat mode (i.e., not document q/a), as suggested in the comments in the script, to ensure that your local LLM setup is working. And finally, if you can make a reproducible example (or give me a specific doc and a specific question), I can see if I can reproduce this issue.
Hi,
I'm trying to build a simple RAG script to load a PDF file (~8 pages, which is not very large, but maybe I'm wrong). At the first question I ask, I get the error:

```
The message history is longer than the max chat context length allowed, and we have run out of messages to drop.
```

Code:

I tried changing some variables (max_context_tokens, max_output_tokens, ...) but without any effect, even with max context tokens at 32000.
Did I forget something, or am I doing something wrong in the doc load? Or is my PDF too large?
Thanks for your work 👍