
Tracking token usage? #29

Open
oneilsh opened this issue Nov 29, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

oneilsh commented Nov 29, 2023

I see that the API supports .message_token_len() for an individual ChatMessage; it would be nice to be able to query total token usage over the course of a conversation for cost-tracking purposes.

I'm not entirely sure of the best way to handle it - maybe something like a .next_message_tokens_cost(message: ChatMessage) that would return the total prompt tokens (system + function defs + chat history) plus the tokens in message that would be incurred? If it could be done over the course of a chat (accumulating after each full round), maybe something like .conversation_history_total_prompt_tokens() and .conversation_history_total_response_tokens(), so a user could compute a running chat cost?
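
For concreteness, a rough stub of the kind of interface I have in mind; none of these methods exist in kani today, and the names are just placeholders:

```python
from kani import ChatMessage, Kani


class TokenTrackingKani(Kani):
    # hypothetical: prompt tokens (system + function defs + chat history) plus the
    # tokens that `message` itself would add to the next request
    async def next_message_tokens_cost(self, message: ChatMessage) -> int: ...

    # hypothetical: running totals accumulated after each full round
    def conversation_history_total_prompt_tokens(self) -> int: ...

    def conversation_history_total_response_tokens(self) -> int: ...
```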

Thanks for considering, and for developing Kani! It really is the 'right' API interface to tool-enabled LLMs in my opinion :)

zhudotexe (Owner) commented

Thanks for the kind words! I should note that the internals of the LLM providers (OpenAI in particular) are often a bit of a mystery, so Kani's token counting is really just a best guess, accurate to within a couple of percent.

You have a couple of options if you want to track tokens as accurately as possible, which I'll lay out here:

  1. Overriding Kani.get_model_completion - this is the method the Kani instance uses to call the underlying LLM, and it returns a Completion, which includes the prompt token length and completion token length as returned by the engine. You could, for example, add tokens_used_prompt and tokens_used_completion attributes in a subclass of Kani and increment them after a super call (a sketch follows this list); the disadvantage is that this counting is post-hoc. I use a similar approach in one of my projects here: https://github.com/zhudotexe/kanpai/blob/cc603705d353e4e9b9aa3cf9fbb12e3a46652c55/kanpai/base_kani.py#L48
    1. You could also use an estimation like sum(self.message_token_len(m) for m in await self.get_prompt()) + self.engine.token_reserve + self.engine.function_token_reserve(list(self.functions.values())) if you want a token estimate before sending the request to the LLM. The instance caches message token lengths, so this won't cause a major slowdown.
  2. Use an external gateway - in our lab we've been trying out Helicone for token counting. If you're using OpenAI, you can integrate it with Kani pretty easily by specifying the api_base and headers when constructing an OpenAIEngine (also sketched below). I've also been interested in Cloudflare AI Gateway, though I haven't used it yet. These solutions require a bit more engineering, though, and I believe they're also post-hoc only.
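
For reference, a minimal sketch of the subclassing approach (option 1), modeled on the kanpai example linked above. It assumes the Completion object exposes prompt_tokens and completion_tokens as described; the class and attribute names are just placeholders:

```python
from kani import Kani


class TokenCountingKani(Kani):
    """Kani subclass that accumulates the token usage reported by the engine (post-hoc)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tokens_used_prompt = 0
        self.tokens_used_completion = 0

    async def get_model_completion(self, include_functions: bool = True, **kwargs):
        # generate as usual, then record the usage the engine reports
        completion = await super().get_model_completion(include_functions, **kwargs)
        # some engines may not report usage, so guard against missing counts
        if completion.prompt_tokens:
            self.tokens_used_prompt += completion.prompt_tokens
        if completion.completion_tokens:
            self.tokens_used_completion += completion.completion_tokens
        return completion
```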

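And a sketch of the gateway approach (option 2) with Helicone. The api_base and headers parameters are the OpenAIEngine constructor arguments mentioned above; the proxy URL and auth header shown are from Helicone's docs at the time of writing, so double-check them against your own setup:

```python
from kani.engines.openai import OpenAIEngine

# Route OpenAI requests through Helicone's proxy so it can log token usage per request.
engine = OpenAIEngine(
    api_key="sk-...",
    model="gpt-4",
    api_base="https://oai.helicone.ai/v1",
    headers={"Helicone-Auth": "Bearer <your-helicone-api-key>"},
)
```
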
I'll have to think a bit more about how to implement an official token counting interface if we decide to - maybe Kani.prompt_len_estimate(msgs: list[ChatMessage]) -> int to perform the estimation detailed above?
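
A rough sketch of what that hypothetical method might look like, using the estimation detailed above (prompt_len_estimate does not exist in kani today; the engine attributes it relies on are the ones named earlier):

```python
from kani import ChatMessage, Kani


class EstimatingKani(Kani):
    def prompt_len_estimate(self, msgs: list[ChatMessage]) -> int:
        # estimated prompt size: message tokens + engine overhead + function definitions
        return (
            sum(self.message_token_len(m) for m in msgs)
            + self.engine.token_reserve
            + self.engine.function_token_reserve(list(self.functions.values()))
        )


# usage: estimate the next request's prompt tokens before sending it
# estimate = ai.prompt_len_estimate(await ai.get_prompt())
```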

zhudotexe added the enhancement (New feature or request) label on Nov 29, 2023

oneilsh commented Dec 8, 2023

Wonderful, thank you! Post-hoc counting is fine for my case; I used your first suggestion and it works great. I did need to remember to update the counts manually when calling out to sub-kanis, though (maybe engine-level counting could help there?).
