
Tracking token usage? #29

Open
oneilsh opened this issue Nov 29, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

oneilsh commented Nov 29, 2023

I see that the API supports .message_token_len() for an individual ChatMessage; it would be nice to be able to query total token usage over the course of a conversation for cost-tracking purposes.

I'm not entirely sure of the best way to handle it - maybe something like a .next_message_tokens_cost(message: ChatMessage) that would return the total prompt tokens (system + function defs + chat history) plus the tokens in message that would be incurred? If it could be done over the course of a chat (accumulating after each full round), maybe something like .conversation_history_total_prompt_tokens() and .conversation_history_total_response_tokens(), so a user could compute a running chat cost?
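
For concreteness, a rough stub of the kind of interface I have in mind; none of these methods exist in kani today, and the names are just placeholders:

```python
from kani import ChatMessage, Kani


class TokenTrackingKani(Kani):
    # hypothetical: prompt tokens (system + function defs + chat history) plus the
    # tokens that `message` itself would add to the next request
    async def next_message_tokens_cost(self, message: ChatMessage) -> int: ...

    # hypothetical: running totals accumulated after each full round
    def conversation_history_total_prompt_tokens(self) -> int: ...

    def conversation_history_total_response_tokens(self) -> int: ...
```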

Thanks for considering, and for developing Kani! It really is the 'right' API interface to tool-enabled LLMs in my opinion :)

zhudotexe (Owner) commented

Thanks for the kind words! I should note that the internals of the LLM providers (OpenAI in particular) are often a bit of a mystery, so Kani's token counting is really just a best guess, accurate to within a couple of percent.

You have a couple of options if you want to track tokens as accurately as possible, which I'll lay out here:

  1. Overriding Kani.get_model_completion - this is the method the Kani instance uses to call the underlying LLM, and it returns a Completion, which includes the prompt token length and completion token length as returned by the engine. You could, for example, add tokens_used_prompt and tokens_used_completion attributes in a subclass of Kani and increment them after a super call (a sketch follows this list); the disadvantage is that this counting is post-hoc. I use a similar approach in one of my projects here: https://github.com/zhudotexe/kanpai/blob/cc603705d353e4e9b9aa3cf9fbb12e3a46652c55/kanpai/base_kani.py#L48
    1. You could also use an estimation like sum(self.message_token_len(m) for m in await self.get_prompt()) + self.engine.token_reserve + self.engine.function_token_reserve(list(self.functions.values())) if you want a token estimate before sending the request to the LLM. The instance caches message token lengths, so this won't cause a major slowdown.
  2. Use an external gateway - in our lab we've been trying out Helicone for token counting. If you're using OpenAI, you can integrate it with Kani pretty easily by specifying the api_base and headers when constructing an OpenAIEngine (also sketched below). I've also been interested in Cloudflare AI Gateway, though I haven't used it yet. These solutions require a bit more engineering, though, and I believe they're also post-hoc only.
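
For reference, a minimal sketch of the subclassing approach (option 1), modeled on the kanpai example linked above. It assumes the Completion object exposes prompt_tokens and completion_tokens as described; the class and attribute names are just placeholders:

```python
from kani import Kani


class TokenCountingKani(Kani):
    """Kani subclass that accumulates the token usage reported by the engine (post-hoc)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tokens_used_prompt = 0
        self.tokens_used_completion = 0

    async def get_model_completion(self, include_functions: bool = True, **kwargs):
        # generate as usual, then record the usage the engine reports
        completion = await super().get_model_completion(include_functions, **kwargs)
        # some engines may not report usage, so guard against missing counts
        if completion.prompt_tokens:
            self.tokens_used_prompt += completion.prompt_tokens
        if completion.completion_tokens:
            self.tokens_used_completion += completion.completion_tokens
        return completion
```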

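And a sketch of the gateway approach (option 2) with Helicone. The api_base and headers parameters are the OpenAIEngine constructor arguments mentioned above; the proxy URL and auth header shown are from Helicone's docs at the time of writing, so double-check them against your own setup:

```python
from kani.engines.openai import OpenAIEngine

# Route OpenAI requests through Helicone's proxy so it can log token usage per request.
engine = OpenAIEngine(
    api_key="sk-...",
    model="gpt-4",
    api_base="https://oai.helicone.ai/v1",
    headers={"Helicone-Auth": "Bearer <your-helicone-api-key>"},
)
```
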
I'll have to think a bit more about how to implement an official token counting interface if we decide to - maybe Kani.prompt_len_estimate(msgs: list[ChatMessage]) -> int to perform the estimation detailed above?
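
A rough sketch of what that hypothetical method might look like, using the estimation detailed above (prompt_len_estimate does not exist in kani today; the engine attributes it relies on are the ones named earlier):

```python
from kani import ChatMessage, Kani


class EstimatingKani(Kani):
    def prompt_len_estimate(self, msgs: list[ChatMessage]) -> int:
        # estimated prompt size: message tokens + engine overhead + function definitions
        return (
            sum(self.message_token_len(m) for m in msgs)
            + self.engine.token_reserve
            + self.engine.function_token_reserve(list(self.functions.values()))
        )


# usage: estimate the next request's prompt tokens before sending it
# estimate = ai.prompt_len_estimate(await ai.get_prompt())
```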

zhudotexe added the enhancement (New feature or request) label on Nov 29, 2023

oneilsh commented Dec 8, 2023

Wonderful, thank you! Post-hoc counting is fine for my case; I used your first suggestion and it works great. I did need to remember to update the counts manually when calling out to sub-kanis, though (maybe engine-level counting could help there?).
