Request to Modify Code to Enable TEXT_SPLITTER_EMBEDDING_MODEL Customization through Configuration File #27

Open
shawn-z11 opened this issue Jan 17, 2024 · 1 comment

Comments

@shawn-z11

I am looking to create a Chinese RAG demo service using RetrievalAugmentedGeneration.

However, the model used by the default SentenceTransformersTokenTextSplitter in RetrievalAugmentedGeneration/common/utils.py is hardcoded as 'intfloat/e5-large-v2'. Its tokenizer produces a large number of [UNK] tokens when processing Chinese text.

I would like to be able to choose the model used by the text splitter, in the same way the embedding model can already be set through the config.yaml file.
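To make the request concrete, here is a minimal sketch of what I have in mind. It is not the project's actual code: the `text_splitter` section, its `model_name` key, and both helper functions are hypothetical names I made up for illustration. The only external call is langchain's `SentenceTransformersTokenTextSplitter`, which accepts a `model_name` argument.

```python
from typing import Any, Mapping

# The value this issue reports as hardcoded in common/utils.py.
DEFAULT_SPLITTER_MODEL = "intfloat/e5-large-v2"


def resolve_splitter_model(config: Mapping[str, Any]) -> str:
    """Return the splitter model from a parsed config, or the current default.

    Assumes a hypothetical layout in config.yaml:
        text_splitter:
          model_name: <Hugging Face model id>
    """
    section = config.get("text_splitter") or {}
    return section.get("model_name") or DEFAULT_SPLITTER_MODEL


def build_text_splitter(config: Mapping[str, Any]):
    """Build the splitter with the configured model (hypothetical helper)."""
    # Imported lazily so resolving the setting does not require langchain.
    from langchain.text_splitter import SentenceTransformersTokenTextSplitter

    return SentenceTransformersTokenTextSplitter(
        model_name=resolve_splitter_model(config)
    )
```

With something like this, a Chinese deployment could set `model_name` to a multilingual model (for example `intfloat/multilingual-e5-large`), while existing configs without the section would keep today's behavior.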

Thank you for your assistance and support.

@shubhadeepd shubhadeepd added the enhancement New feature or request label Jan 18, 2024
@SartajHundal

SartajHundal commented Feb 17, 2024

Have you tried abstraction or refactoring? Discourse
