Replies: 1 comment
Hey @Venpion, great to see you back! Ran into a bit of a snag, huh? Let's see what we can do about it this time.

The error you're encountering seems to be related to the quantization_config you're passing in model_kwargs. Unfortunately, I couldn't find specific references to a get_loading_attributes method on BitsAndBytesConfig.

As a first step, I would recommend checking the documentation or source code for the BitsAndBytesConfig and GPTQConfig classes in the transformers library. If you're unable to resolve the issue by checking the documentation or source code for these classes, it might be helpful to provide more information about the versions of transformers and llama-index you're using.

In the meantime, you could try removing or changing the quantization_config you're passing to HuggingFaceLLM.

I hope this helps, and please let me know if you have any other questions or if there's more information I can provide to help resolve this issue.

Dosu
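For anyone hitting the same thing: the traceback gives a strong hint. TheBloke/Mistral-7B-Instruct-v0.2-GPTQ is a pre-quantized GPTQ checkpoint, so transformers tries to merge the checkpoint's GPTQConfig with the quantization_config passed via model_kwargs; since that argument is a BitsAndBytesConfig (which has no get_loading_attributes), the merge fails. Here is a minimal sketch of the "remove it" route, assuming the pre-quantized checkpoint needs no extra quantization config (the other keyword arguments are copied from the snippet in the question below):

from llama_index.llms.huggingface import HuggingFaceLLM

# Sketch: drop model_kwargs={"quantization_config": ...} entirely.
# The GPTQ checkpoint ships its own quantization config, so transformers
# should pick it up at load time without any override.
llm = HuggingFaceLLM(
    model_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
    tokenizer_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
    context_window=3900,
    max_new_tokens=256,
    tokenizer_kwargs={"max_length": 8000},
    generate_kwargs={"temperature": 0.1, "do_sample": True},
    device_map="auto",
)

If you do need to override GPTQ loading options, passing a transformers GPTQConfig instead of a BitsAndBytesConfig should at least get past the merge step; BitsAndBytesConfig is only meant for quantizing unquantized weights on the fly with bitsandbytes.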
I am following the example from the repo using the RAPTOR pack for a RAG implementation. However, instead of the OpenAI models in the example, I want to use Hugging Face models (embedding and LLM). I am having an issue executing the following:
raptor_pack = RaptorPack(
    documents,
    embed_model=HuggingFaceEmbedding(
        model_name="BAAI/bge-large-en-v1.5"
    ),  # used for embedding clusters
    llm=HuggingFaceLLM(
        model_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
        tokenizer_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
        system_prompt=system_prompt,
        query_wrapper_prompt=query_wrapper_prompt,
        context_window=3900,
        max_new_tokens=256,
        model_kwargs={"quantization_config": quantization_config},
        tokenizer_kwargs={"max_length": 8000},
        generate_kwargs={"temperature": 0.1, "do_sample": True},
        device_map="auto",
    ),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)
The error is as follows:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_86/3501330495.py in <cell line: 11>()
14 model_name="BAAI/bge-large-en-v1.5"
15 ), # used for embedding clusters
---> 16 llm = HuggingFaceLLM(
17 model_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
18 tokenizer_name="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
~/.conda/envs/default/lib/python3.9/site-packages/llama_index/llms/huggingface/base.py in __init__(self, context_window, max_new_tokens, query_wrapper_prompt, tokenizer_name, model_name, model, tokenizer, device_map, stopping_ids, tokenizer_kwargs, tokenizer_outputs_to_remove, model_kwargs, generate_kwargs, is_chat_model, callback_manager, system_prompt, messages_to_prompt, completion_to_prompt, pydantic_program_mode, output_parser)
159 """Initialize params."""
160 model_kwargs = model_kwargs or {}
--> 161 self._model = model or AutoModelForCausalLM.from_pretrained(
162 model_name, device_map=device_map, **model_kwargs
163 )
~/.conda/envs/default/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
559 elif type(config) in cls._model_mapping.keys():
560 model_class = _get_model_class(config, cls._model_mapping)
--> 561 return model_class.from_pretrained(
562 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
563 )
~/.conda/envs/default/lib/python3.9/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3012 if pre_quantized or quantization_config is not None:
3013 if pre_quantized:
-> 3014 config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
3015 config.quantization_config, quantization_config
3016 )
~/.conda/envs/default/lib/python3.9/site-packages/transformers/quantizers/auto.py in merge_quantization_configs(cls, quantization_config, quantization_config_from_args)
147 if isinstance(quantization_config, (GPTQConfig, AwqConfig)) and quantization_config_from_args is not None:
148 # special case for GPTQ / AWQ config collision
--> 149 loading_attr_dict = quantization_config_from_args.get_loading_attributes()
150 for attr, val in loading_attr_dict.items():
151 setattr(quantization_config, attr, val)
AttributeError: 'BitsAndBytesConfig' object has no attribute 'get_loading_attributes'
Any help will be highly appreciated. Thanks!