Request to Modify Code to Enable TEXT_SPLITTER_EMBEDDING_MODEL Customization through Configuration File #27

shawn-z11 · 2024-01-17T06:13:19Z

I am looking to create a Chinese RAG demo service using RetrievalAugmentedGeneration.

However, I encountered an issue where the default SentenceTransformersTokenTextSplitter model used in the RetrievalAugmentedGeneration/common/utils.py file is hardcoded as 'intfloat/e5-large-v2'. This model generates a significant number of [UNK] tokens when processing Chinese text.
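One way to quantify the problem is to measure how much of a sample text a tokenizer maps to its unknown token. The sketch below is only an illustration: `unk_fraction` and `hf_tokenize_fn` are hypothetical helpers, not part of the repository, and the second one assumes the Hugging Face `transformers` package is installed and the model can be downloaded.

```python
from typing import Callable, List


def unk_fraction(
    text: str,
    tokenize: Callable[[str], List[str]],
    unk_token: str = "[UNK]",
) -> float:
    """Return the share of tokens that equal the unknown token (0.0 for empty output)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok == unk_token) / len(tokens)


def hf_tokenize_fn(model_name: str) -> Callable[[str], List[str]]:
    """Hypothetical helper: build a tokenize function from a Hugging Face model name.

    Imported lazily so that unk_fraction itself has no third-party dependency.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer.tokenize
```

For example, `unk_fraction(sample_chinese_text, hf_tokenize_fn("intfloat/e5-large-v2"))` could be compared against the same call with a multilingual model to see whether the candidate handles Chinese text better.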

I would like to be able to choose the model used by the text splitter, similar to how the embedding model can be specified through the config.yaml file.
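A minimal sketch of what this could look like, assuming the splitter is built with LangChain's `SentenceTransformersTokenTextSplitter`. The `text_splitter.model_name` config key, the `APP_TEXT_SPLITTER_MODEL_NAME` environment variable, and both function names are hypothetical and chosen only for illustration. They are not the project's actual configuration schema.

```python
import os
from typing import Optional

# The value that is currently hardcoded, according to this issue.
DEFAULT_TEXT_SPLITTER_MODEL = "intfloat/e5-large-v2"


def resolve_text_splitter_model(config: Optional[dict] = None) -> str:
    """Pick the splitter model: env var first, then config, then the default.

    Both the env var name and the config key are assumptions for this sketch.
    """
    env_value = os.environ.get("APP_TEXT_SPLITTER_MODEL_NAME")
    if env_value:
        return env_value
    section = (config or {}).get("text_splitter") or {}
    return section.get("model_name") or DEFAULT_TEXT_SPLITTER_MODEL


def build_text_splitter(config: Optional[dict] = None, chunk_overlap: int = 50):
    """Build the splitter with the resolved model name.

    The import is lazy so resolving the name does not require langchain.
    Newer LangChain releases expose the same class from `langchain_text_splitters`.
    """
    from langchain.text_splitter import SentenceTransformersTokenTextSplitter

    return SentenceTransformersTokenTextSplitter(
        model_name=resolve_text_splitter_model(config),
        chunk_overlap=chunk_overlap,
    )
```

With something like this in place, a Chinese deployment could set `text_splitter.model_name` in its parsed config.yaml to a multilingual model, while existing setups would keep the current default.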

Thank you for your assistance and support.

SartajHundal · 2024-02-17T15:55:00Z

Have you tried abstraction or refactoring? Discourse

shubhadeepd assigned sumitkbh Jan 18, 2024

shubhadeepd added the enhancement label Jan 18, 2024
