
BART generate with min_new_tokens exceeds maximum length #30759

Closed

vsocrates opened this issue May 11, 2024 · 4 comments

@vsocrates

System Info

  • transformers version: 4.40.2
  • Platform: Linux-4.18.0-477.36.1.el8_8.x86_64-x86_64-with-glibc2.28
  • Python version: 3.10.14
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker @younesbelkada @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

If I load a fine-tuned BartForConditionalGeneration model and then try to generate text with it, I run into the following warning: "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all."

Generation code:

# model, input_ids, and attention_mask are defined earlier in my script
outputs = model.generate(input_ids, attention_mask=attention_mask, num_beams=3,
                         min_new_tokens=1500,
                         max_new_tokens=2500,
                         # stopping_criteria=stopping_criteria,
                         early_stopping=True)

I was under the impression that, since the BART decoder generates autoregressively, there was no limit on its generation length. Is that not the case?

Expected behavior

Generation of arbitrary length without a CUDA or out-of-bounds error.

@younesbelkada
Contributor

Hi @vsocrates
Thanks for the issue!
You are getting that warning because the model's positional embeddings stop at 1024 tokens. Some models have fixed positional embeddings (e.g. an nn.Embedding layer), so you cannot exceed that maximum number of tokens by design; for other models it is possible to go beyond it, at your own risk, since the model has not been trained on that many tokens. If you are getting consistent, sensible generations, I would say there is nothing to worry about; otherwise you might need to use another model that supports a longer context length.
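For reference, here is a minimal sketch of how to read that limit from the config and cap the generation budget accordingly. It assumes a generic public checkpoint (facebook/bart-large-cnn) purely for illustration; your fine-tuned model will report its own value:

from transformers import AutoConfig, AutoTokenizer, BartForConditionalGeneration

# Example checkpoint only; substitute your fine-tuned model path.
model_name = "facebook/bart-large-cnn"
config = AutoConfig.from_pretrained(model_name)
print(config.max_position_embeddings)  # 1024 for BART: learned, fixed-size position embeddings

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)
inputs = tokenizer("Summarize: the quick brown fox jumps over the lazy dog.", return_tensors="pt")

# Keep the decoder's generated sequence within the position table
# (a small margin is left for the decoder start / EOS tokens).
budget = config.max_position_embeddings - 2
outputs = model.generate(
    **inputs,
    num_beams=3,
    max_new_tokens=min(2500, budget),
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))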

@vsocrates
Author

vsocrates commented May 13, 2024

Understood! I went through some other issues, and it looks like T5 uses relative position embeddings, so in theory it should be able to extend beyond its max context length (512 tokens), though potentially with some loss of accuracy or odd generations. Is that correct?

@younesbelkada
Contributor

Yes, this is correct. In my experience with Flan-T5 that was possible, but with some potential loss of accuracy / inconsistent generation.
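For example, a rough sketch with a small public checkpoint (google/flan-t5-small, used here purely for illustration): asking for more new tokens than T5's nominal 512-token context does not raise an indexing error, because T5 uses relative position biases rather than a fixed position-embedding table, although quality may degrade past the training length:

from transformers import AutoTokenizer, T5ForConditionalGeneration

# Small example checkpoint; any T5/Flan-T5 variant behaves the same way here.
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

inputs = tokenizer("Write a very long story about a lighthouse keeper.", return_tensors="pt")

# More than 512 new tokens: runs without an out-of-bounds error,
# but the model was not trained on sequences this long.
outputs = model.generate(
    **inputs,
    min_new_tokens=600,
    max_new_tokens=800,
    num_beams=3,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))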

@vsocrates
Author

vsocrates commented May 14, 2024

Great, thanks, closing this issue!
