[Bug]: Token Counter is not working when transformations are applied with an embedding model. #13146
Comments
It seems the issue lies in how embedding events are handled when the embedding model is applied as a transformation. To address this, ensure the payload for embedding events includes the fields the token counter expects. Reviewing the integration of the embedding model and the transformations, to confirm the expected data structure and flow are maintained, is also worthwhile. Adding logging or debug statements within the handling of embedding events can help identify where the process is failing. By closely examining these areas, you should be able to pinpoint and resolve the issue causing the token counter to report zero when embedding transformations are present.
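For the debugging suggestion above, here is a minimal sketch of a custom handler that only logs embedding events, assuming llama-index 0.10.x import paths; the class name EmbeddingEventLogger is made up for illustration:

```python
from typing import Any, Dict, List, Optional

from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.core.callbacks.schema import CBEventType


class EmbeddingEventLogger(BaseCallbackHandler):
    """Logs embedding events so you can see whether they fire and what payloads they carry."""

    def __init__(self) -> None:
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        parent_id: str = "",
        **kwargs: Any,
    ) -> str:
        if event_type == CBEventType.EMBEDDING:
            print(f"EMBEDDING start, payload keys: {list((payload or {}).keys())}")
        return event_id

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        if event_type == CBEventType.EMBEDDING:
            print(f"EMBEDDING end, payload keys: {list((payload or {}).keys())}")

    def start_trace(self, trace_id: Optional[str] = None) -> None:
        pass

    def end_trace(
        self,
        trace_id: Optional[str] = None,
        trace_map: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        pass
```

Register it alongside the token counter, e.g. CallbackManager([token_counter, EmbeddingEventLogger()]). If nothing is printed during indexing, the embedding events are never reaching the callback manager in the first place.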
@dsanr you should attach the callback manager to the embedding model (and also the LLM tbh)
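A minimal sketch of that suggestion, assuming llama-index 0.10.x import paths; passing callback_manager at construction time is one way to attach it:

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# Attach the callback manager to the embedding model (and the LLM) directly,
# instead of relying only on the global Settings.callback_manager.
embed_model = OpenAIEmbedding(callback_manager=callback_manager)
llm = OpenAI(model="gpt-3.5-turbo", callback_manager=callback_manager)
```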
@logan-markewich Shouldn't using Settings.callback_manager work?
Bug Description
The Token Counter is not working when transformations are applied along with an embedding model. As a result, even the Ingestion Pipeline is not useful if we want to use the Token Counter.
Version
0.10.33
Steps to Reproduce
```python
import tiktoken
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader(input_dir="./data", filename_as_id=True).load_data(show_progress=True)

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=True,
)
Settings.callback_manager = CallbackManager([token_counter])

transformations = [SentenceSplitter(chunk_size=512, chunk_overlap=128), OpenAIEmbedding()]
index = VectorStoreIndex.from_documents(documents, transformations=transformations)

token_counter.total_embedding_token_count
```

The last line is returning zero. If I remove OpenAIEmbedding() from transformations and pass it to VectorStoreIndex.from_documents() directly, then the token counter is working (see the sketch below).
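A minimal sketch of that working variant, assuming from_documents() accepts the embedding model via its embed_model keyword (llama-index 0.10.x):

```python
# Working variant: the embedding model is passed as embed_model
# rather than being included in the transformations list.
transformations = [SentenceSplitter(chunk_size=512, chunk_overlap=128)]
index = VectorStoreIndex.from_documents(
    documents,
    transformations=transformations,
    embed_model=OpenAIEmbedding(),
)
print(token_counter.total_embedding_token_count)  # non-zero in this configuration
```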
Relevant Logs/Tracebacks
No response