
Failed to get max tokens for LLM with name XXX (Ollama provider) #1342

Open
CosmicMac opened this issue Apr 17, 2024 · 4 comments


@CosmicMac

Hi,
I'm facing the following issue when trying to chat with Ollama:

04/17/2024 01:13:07 PM             utils.py 273 : Failed to get max tokens for LLM with name gemma. Defaulting to 4096.
Traceback (most recent call last):
  File "/app/danswer/llm/utils.py", line 263, in get_llm_max_tokens
    model_obj = model_map[f"{model_provider}/{model_name}"]
                ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'ollama_chat/gemma'

Then Danswer answers :)

Error in input stream

My .env:

GEN_AI_MODEL_PROVIDER=ollama_chat
GEN_AI_MODEL_VERSION=gemma
GEN_AI_API_ENDPOINT=http://host.docker.internal:11434

QA_TIMEOUT=120
DISABLE_LLM_CHOOSE_SEARCH=True
DISABLE_LLM_CHUNK_FILTER=True
DISABLE_LLM_QUERY_REPHRASE=True
DISABLE_LLM_FILTER_EXTRACTION=True
# QA_PROMPT_OVERRIDE=weak

Ollama is up and running; I tested it from inside the danswer-stack-api_server container with curl http://host.docker.internal:11434/api/tags (obviously I had to install curl first).

BTW, Danswer seems to retry the request once when an error occurs.

@gargmukku07

Hi,

I am getting the same error with the latest build. I am using llama2.

@gargmukku07

This is fixed by setting the value of GEN_AI_MAX_TOKENS.
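
For illustration, a minimal .env sketch of this override (the 4096 here is only an assumed example for a 4K-context model such as llama2, not a value confirmed in this thread; use whatever context window your model actually has):

GEN_AI_MAX_TOKENS=4096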

@al-lac

al-lac commented Apr 26, 2024

@gargmukku07 which value did you set for llama2?

@LittleBennos

Just wanted to add my findings.

I was getting this error:

05/28/2024 11:34:41 PM             utils.py 328 : Failed to get max tokens for LLM with name azuregpt35turbo. Defaulting to 4096.
Traceback (most recent call last):
  File "/app/danswer/llm/utils.py", line 318, in get_llm_max_tokens
    model_obj = model_map[model_name]
                ~~~~~~~~~^^^^^^^^^^^^
KeyError: 'azuregpt35turbo'
05/28/2024 11:34:46 PM             timing.py 74 : stream_chat_message took 7.445417404174805 seconds

Turns out you need to set a value for GEN_AI_MAX_TOKENS.

This is due to this section of code in backend/danswer/llm/utils.py:

) -> int:
    """Best effort attempt to get the max tokens for the LLM"""
    if GEN_AI_MAX_TOKENS:
        # This is an override, so always return this
        return GEN_AI_MAX_TOKENS

    try:
        model_obj = model_map.get(f"{model_provider}/{model_name}")
        if not model_obj:
            model_obj = model_map[model_name]

        if "max_input_tokens" in model_obj:
            return model_obj["max_input_tokens"]

        if "max_tokens" in model_obj:
            return model_obj["max_tokens"]

        raise RuntimeError("No max tokens found for LLM")
    except Exception:
        logger.exception(
            f"Failed to get max tokens for LLM with name {model_name}. Defaulting to 4096."
        )
        return 4096
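
To see why a custom deployment name like azuregpt35turbo (or ollama_chat/gemma) falls through to the 4096 default unless GEN_AI_MAX_TOKENS is set, here is a small standalone sketch of the same fallback logic; the map contents and names below are illustrative only, not Danswer's actual model_map:

# Standalone sketch of the fallback above; model_map contents are illustrative only.
GEN_AI_MAX_TOKENS = None  # unset -> lookup path; e.g. 4096 -> override path

model_map = {"gpt-3.5-turbo": {"max_input_tokens": 16385}}  # custom names are absent

def get_max_tokens(model_name: str, model_provider: str) -> int:
    if GEN_AI_MAX_TOKENS:
        return GEN_AI_MAX_TOKENS
    try:
        model_obj = model_map.get(f"{model_provider}/{model_name}") or model_map[model_name]
        if "max_input_tokens" in model_obj:
            return model_obj["max_input_tokens"]
        return model_obj["max_tokens"]
    except Exception:
        # same "Failed to get max tokens ... Defaulting to 4096" path as in the log above
        return 4096

print(get_max_tokens("azuregpt35turbo", "azure"))  # -> 4096 via the KeyError fallback
print(get_max_tokens("gpt-3.5-turbo", "openai"))   # -> 16385 via the model_name lookup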
