Setting compute_metrics in Trainer with Idefics2ForConditionalGeneration leads to AttributeError: 'DynamicCache' object has no attribute 'detach' #30631

Closed
EloiEynard opened this issue May 3, 2024 · 14 comments · Fixed by #30729

@EloiEynard commented May 3, 2024

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.11.8
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.28.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): 2.16.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

Not sure if this is an issue with the Trainer or the model.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The following code is from the Idefics2 fine-tuning example Colab, with the addition of compute_metrics in the Trainer.

!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets peft bitsandbytes

import torch
from peft import LoraConfig
from transformers import AutoProcessor, BitsAndBytesConfig, Idefics2ForConditionalGeneration

DEVICE = "cuda:0"
USE_LORA = False
USE_QLORA = True


processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    do_image_splitting=False
)


# Three options for training, from the lowest precision training to the highest precision training:
# - QLora
# - Standard Lora
# - Full fine-tuning
if USE_QLORA or USE_LORA:
    lora_config = LoraConfig(
        r=8,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules='.*(text_model|modality_projection|perceiver_resampler).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$',
        use_dora=False if USE_QLORA else True,
        init_lora_weights="gaussian"
    )
    if USE_QLORA:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16
        )
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        quantization_config=bnb_config if USE_QLORA else None,
    )
    model.add_adapter(lora_config)
    model.enable_adapters()
else:
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        _attn_implementation="flash_attention_2", # Only available on A100 or H100
    ).to(DEVICE)

from datasets import load_dataset

train_dataset = load_dataset("nielsr/docvqa_1200_examples", split="train")
train_dataset = train_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])

eval_dataset = load_dataset("nielsr/docvqa_1200_examples", split="test")
eval_dataset = eval_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])

import random

class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor
        self.image_token_id = processor.tokenizer.additional_special_tokens_ids[
            processor.tokenizer.additional_special_tokens.index("<image>")
        ]

    def __call__(self, examples):
        texts = []
        images = []
        for example in examples:
            image = example["image"]
            question = example["query"]["en"]
            answer = random.choice(example["answers"])
            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Answer briefly."},
                        {"type": "image"},
                        {"type": "text", "text": question}
                    ]
                },
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": answer}
                    ]
                }
            ]
            text = processor.apply_chat_template(messages, add_generation_prompt=False)
            texts.append(text.strip())
            images.append([image])

        batch = processor(text=texts, images=images, return_tensors="pt", padding=True)

        labels = batch["input_ids"].clone()
        labels[labels == processor.tokenizer.pad_token_id] = self.image_token_id
        batch["labels"] = labels

        return batch

data_collator = MyDataCollator(processor)

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    num_train_epochs=2,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    warmup_steps=50,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=25,
    output_dir="/content/drive/My Drive/docvqa_ft_tutorial",
    save_strategy="steps",
    save_steps=250,
    save_total_limit=1,
    # evaluation_strategy="epoch",
    fp16=True,
    push_to_hub_model_id="idefics2-8b-docvqa-finetuned-tutorial",
    remove_unused_columns=False,
    report_to="none",
)

def custom_metrics(eval, preds):
    exit(0)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics = custom_metrics,
)

trainer.evaluate()

Here is the exception:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/template.ipynb Cell 36 line [1](vscode-notebook-cell://wsl%2Bubuntu/home/eyel/pm-ia-traitement-documents/src/python/notebooks/template.ipynb#X50sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0)
----> 1 trainer.evaluate()

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513), in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3696](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3696), in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3693         batch_size = observed_batch_size
   3695 # Prediction step
-> 3696 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
   3697 main_input_name = getattr(self.model, "main_input_name", "input_ids")
   3698 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3904](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3904), in Trainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys)
   3902     return (loss, None, None)
   3903 print(logits) #Eloi Remove
-> 3904 logits = nested_detach(logits)
   3905 if len(logits) == 1:
   3906     logits = logits[0]

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190), in nested_detach(tensors)
    188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
    189 if isinstance(tensors, (list, tuple)):
--> 190     return type(tensors)(nested_detach(t) for t in tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190), in <genexpr>(.0)
    188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
    189 if isinstance(tensors, (list, tuple)):
--> 190     return type(tensors)(nested_detach(t) for t in tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:193](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:193), in nested_detach(tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})
--> 193 return tensors.detach()

AttributeError: 'DynamicCache' object has no attribute 'detach'

This seems to happen when the model output's past_key_values is an empty DynamicCache.

Expected behavior

Evaluation should reach custom_metrics and terminate cleanly.

@NielsRogge (Contributor)

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option
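
As a minimal sketch, this workaround applied to the reproduction script above would look like the following (set before calling trainer.evaluate(); with the cache disabled the model returns plain tensors rather than a DynamicCache):

# Workaround sketch: disable the KV cache so model outputs contain plain tensors
# instead of a DynamicCache, which the Trainer's nested_detach cannot handle.
model.config.use_cache = False

trainer.evaluate()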

@EloiEynard (Author)

I had the same error and fixed it by using model.config.use_cache=False during training

That fixes this issue, as the past_key_values are now plain tensors, but it leads to a new error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line [1](vscode-notebook-cell://wsl%2Bubuntu/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb#X43sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0)
----> 1 trainer.evaluate()

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513), in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716), in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3714         logits = self.preprocess_logits_for_metrics(logits, labels)
   3715     logits = self.gather_function((logits))
-> 3716     all_preds.add(logits)
   3717 if labels is not None:
   3718     labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326), in EvalLoopContainer.add(self, tensors)
    324     self.tensors = tensors if self.do_nested_concat else [tensors]
    325 elif self.do_nested_concat:
--> 326     self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
    327 else:
    328     self.tensors.append(tensors)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138), in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140), in nested_concat(tensors, new_tensors, padding_index)
    138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
--> 140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
    141 elif isinstance(tensors, Mapping):
    142     return type(tensors)(
    143         {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
    144     )

File [~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/eyel/pm-ia-traitement-documents/src/python/notebooks/~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99), in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
     96 tensor2 = atleast_1d(tensor2)
     98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99     return torch.cat((tensor1, tensor2), dim=0)
    101 # Let's figure out the new shape
    102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.

@NielsRogge (Contributor) commented May 3, 2024

Yes, this is due to batches having different lengths of input_ids (in the code snippet of your first message you set padding=True, which means dynamic padding, so each batch may have a different length). If your eval batch size is smaller than or equal to your training batch size, then it's fine.

It can be fixed either by padding all examples to the same length (e.g. padding="max_length", max_length=200, truncation=True), or by passing eval_do_concat_batches=False to the TrainingArguments. In the latter case you'll get a list of predictions/labels in the compute_metrics function rather than stacked tensors, so you would need to adapt your compute_metrics function accordingly.
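
As a rough sketch of both options against the reproduction script above (max_length=200 is just the illustrative value from this comment, not a requirement):

# Option 1: pad every example to a fixed length in the collator so that
# all eval batches have the same sequence length and can be concatenated.
batch = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding="max_length",
    max_length=200,
    truncation=True,
)

# Option 2: keep dynamic padding but skip concatenation across eval batches;
# compute_metrics then receives lists of per-batch predictions/labels.
training_args = TrainingArguments(
    # ... same arguments as in the reproduction above ...
    eval_do_concat_batches=False,
)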

@VictorSanh (Member)

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option

I don't have a better fix!

@zucchini-nlp (Member)

I think the cache problem should be fixed by converting the DynamicCache back to a legacy cache in Idefics2's backbone language model, as is already done in Llama.

These changes are partially related to the issue of making language models compile-compatible, and should be available soon 🤗
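
For illustration only (this is not the in-library fix, just what the conversion being discussed looks like), a DynamicCache can be turned back into the legacy tuple format that the Trainer's nested helpers already handle:

from transformers.cache_utils import DynamicCache

outputs = model(**batch, use_cache=True)
past = outputs.past_key_values
if isinstance(past, DynamicCache):
    # Legacy format: a tuple of (key, value) tensor pairs, one per layer.
    past = past.to_legacy_cache()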

@amyeroberts (Collaborator)

Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?

@zucchini-nlp (Member)

We discussed the cache input-output format with @gante yesterday. Maybe the Llama-format cache is not what we need, but in any case @gante will take care of it 😄

@amyeroberts (Collaborator)

@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in idefics2. Is what @gante is doing addressing this?

@zucchini-nlp (Member)

@amyeroberts I am not sure what the correct format of the cache objects returned by language models should be, since we currently have no consistency there, so I wanted @gante to look at it.

There are two options for this:

  1. The language model should always return a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic.
  2. The language model should return the same type of cache as it received in forward. In that case Idefics2 has to call cache.to_legacy_cache() at the end to ensure it returns a tuple, which is consistent with how caching works in most current language models.

Also, I believe we are going to get rid of the tuple-type cache sometime in the future, so cache + Trainer is something to keep in mind for then.

@amyeroberts (Collaborator)

@zucchini-nlp OK, great, thanks for explaining. Let's leave it as-is, and once the cache format is standardized we can propagate this to idefics2 + other models.

@NielsRogge (Contributor) commented May 8, 2024

Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb

@EloiEynard (Author)

Thanks @NielsRogge, I got it all figured out with the Trainer and am currently fine-tuning with my custom eval. I wish I had known about Lightning earlier, though; it seems more explicit.

By the way, if you don't mind me asking, I've noticed that in your notebooks you use
model.add_adapter(lora_config)
model.enable_adapters()
whereas I mostly used to see model = get_peft_model(model, lora_config).
Is there any difference between the two? Thanks

@NielsRogge (Contributor) commented May 8, 2024

I had the same question; it turns out both are equivalent. The get_peft_model API is recommended as it returns a PeftModel, which has additional utility methods such as save_adapter() with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with add_adapter. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.

I'm currently looking into creating a similar notebook that leverages the Trainer API with get_peft_model. The reason I used PyTorch Lightning is that it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.
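
For comparison, a minimal sketch of the get_peft_model route (reusing the LoraConfig from the reproduction script above; this is not tested against the out-of-memory behaviour mentioned here):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules='.*(text_model|modality_projection|perceiver_resampler).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$',
    init_lora_weights="gaussian",
)

# Wraps the base model in a PeftModel, which adds utility methods on top of
# add_adapter/enable_adapters (e.g. save_pretrained for the adapter weights).
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # quick sanity check of trainable parameter count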

@EloiEynard (Author)

I see, thanks for the details!
