[Question]: max_token auto adjust for completion requests but not chat requests #8437
Replies: 2 comments
-
🤖 Hello, You're correct in your observation that the LlamaIndex framework automatically adjusts the max token limit for completion requests but not for chat requests with GPT-3.5 Turbo. This is due to the different ways these two types of requests handle tokens. In the provided code, the context window is reduced for chat models:

```python
model_name = llm_metadata.model_name
if is_chat_model(model_name):
    context_window -= 150
```

This adjustment is not made for completion requests because they handle tokens differently. In a chat model, each message in the conversation consumes extra tokens for its role and formatting, including system, user, and assistant messages. This is why the context window is reduced by 150 tokens for chat models. In contrast, completion requests do not have this additional per-message token consumption, so there is no need to adjust the context window for them. The `OpenAI` class in the LlamaIndex framework is where the `llm_metadata` used above comes from. I hope this clarifies your question. If you have any further questions, feel free to ask. Best,
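To illustrate the effect of that 150-token buffer, here is a minimal sketch of how a framework might derive the tokens available for a model's reply. The function name and structure are hypothetical (not LlamaIndex's actual API); only the 150-token chat adjustment mirrors the snippet above.

```python
# Hypothetical helper mirroring the chat-model adjustment shown above.
# CHAT_BUFFER reserves room for per-message role/formatting overhead
# (system/user/assistant framing) that chat requests incur.
CHAT_BUFFER = 150

def available_max_tokens(context_window: int, prompt_tokens: int, is_chat: bool) -> int:
    """Tokens left for the model's reply after the prompt, applying the
    chat-model buffer when appropriate. Never returns a negative value."""
    effective_window = context_window - CHAT_BUFFER if is_chat else context_window
    return max(0, effective_window - prompt_tokens)

# gpt-3.5-turbo-style 4096-token window with a 1000-token prompt:
print(available_max_tokens(4096, 1000, is_chat=True))   # 2946
print(available_max_tokens(4096, 1000, is_chat=False))  # 3096
```

The chat request ends up with 150 fewer tokens of headroom than the equivalent completion request, which is exactly the asymmetry the question is about.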
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
To clarify the specific situation in which we encountered this problem: we were using TreeSummarize on documents with GPT 3.5 Turbo, and midway through summarizing a document we would hit this issue of too many tokens being used for a request.
-
Question Validation
Question
I recently ran into an issue where our GPT 3.5 Turbo requests were producing too many tokens, which put us over the limit and caused an error. After looking into solutions, I noticed that completion requests automatically adjust the max token limit while chat requests don't.
I was curious whether I was missing some alternative way of handling this case, or whether there was a specific reason why the functionality existed for one request type but not the other.