I am looking to create a Chinese RAG demo service using RetrievalAugmentedGeneration.
However, the model used by SentenceTransformersTokenTextSplitter in RetrievalAugmentedGeneration/common/utils.py is hardcoded to 'intfloat/e5-large-v2'. Its tokenizer produces a large number of [UNK] tokens when processing Chinese text.
I would like to be able to choose the model for the text splitter, in the same way the embedding model can already be set in config.yaml.
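To illustrate the request, here is a minimal sketch of how the splitter model could be read from the parsed config with a fallback to the current value. The `text_splitter` section, its `model_name` key, and the helper names are hypothetical and do not exist in the project today; `intfloat/multilingual-e5-large` is only one example of a model with a multilingual vocabulary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping

# The value currently hardcoded in RetrievalAugmentedGeneration/common/utils.py.
DEFAULT_SPLITTER_MODEL = "intfloat/e5-large-v2"


@dataclass(frozen=True)
class TextSplitterSettings:
    """Hypothetical settings object for the text splitter."""

    model_name: str = DEFAULT_SPLITTER_MODEL


def load_text_splitter_settings(config: Mapping[str, Any]) -> TextSplitterSettings:
    """Read the splitter model from an already-parsed config.yaml mapping.

    Assumes a hypothetical section of the form:

        text_splitter:
          model_name: intfloat/multilingual-e5-large

    A missing or empty section falls back to the current default, so
    existing deployments would keep their behavior.
    """
    section = config.get("text_splitter") or {}
    model_name = section.get("model_name") or DEFAULT_SPLITTER_MODEL
    return TextSplitterSettings(model_name=model_name)


def splitter_kwargs(settings: TextSplitterSettings) -> Dict[str, Any]:
    """Build keyword arguments for the splitter constructor.

    The result is intended to be passed as
    SentenceTransformersTokenTextSplitter(**splitter_kwargs(settings)),
    which accepts a `model_name` argument.
    """
    return {"model_name": settings.model_name}


if __name__ == "__main__":
    # Stand-in for the dict produced by parsing config.yaml.
    parsed_config = {"text_splitter": {"model_name": "intfloat/multilingual-e5-large"}}
    print(splitter_kwargs(load_text_splitter_settings(parsed_config)))
```

With something along these lines, a Chinese deployment would only need to change one config entry, and leaving the entry out would preserve the existing default.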
Thank you for your assistance and support.