How to visualize attention if the sizes of input and output sequence are different? ValueError. #116

Open
sn0rkmaiden opened this issue Apr 1, 2023 · 1 comment

sn0rkmaiden commented Apr 1, 2023

I have a custom pretrained T5 model that predicts the solution to quadratic equations, so its output is a different length than its input (in all the examples I saw, they were the same length). I'm trying to visualize the attention like this:

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content", output_attentions=True)

inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True)
encoder_input_ids = inputs.input_ids

outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)

For example, the predicted sequence is: "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0".

with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0", return_tensors="pt", add_special_tokens=True).input_ids

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

So encoder_text length is 18, decoder_text length is 79.

For some reason, when I get all the attentions from the outputs, they come in the form of tuples (the cross-attention is even a tuple of tuples).
I can't seem to figure out how to use this function correctly, or why my attentions have the wrong dimensions.

model_view(cross_attention=outputs.cross_attentions, encoder_attention=encoder_attention, decoder_attention=decoder_attention, encoder_tokens=encoder_text, decoder_tokens=decoder_text)

Is the problem that the output length is different from the input length?
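
To illustrate, the shapes coming back from generate look like this (a minimal sketch, assuming the generate call above with greedy decoding and caching, i.e. the defaults):

# outputs is the result of model.generate(..., output_attentions=True, return_dict_in_generate=True)
print(len(outputs.cross_attentions))         # one entry per generated token
print(len(outputs.cross_attentions[0]))      # one entry per decoder layer
print(outputs.cross_attentions[0][0].shape)  # (batch, num_heads, 1, encoder_len)
# The query length is 1 because, with caching, each generation step only attends
# from the newest decoder token, so these per-step tensors do not line up with
# the full decoder token list that model_view expects.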

@tkella47

Using generate returns the auto-regressive attentions. Try using teacher forcing instead, i.e. providing the labels / decoder_input_ids through the forward pass.
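
A minimal sketch of that suggestion, assuming the same checkpoint and input as in the question and reusing the generated ids as the decoder input (attribute names follow Hugging Face's Seq2SeqLMOutput and bertviz's model_view keywords):

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from bertviz import model_view

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content")

inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt")

# Generate as before, but keep only the token ids. For T5 the generated
# sequence already starts with the decoder start token, so it can be reused
# directly as decoder_input_ids below.
generated_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask,
                               max_length=80, min_length=10)

# Teacher forcing: a single forward pass over the full target sequence returns
# the attentions as one tensor per layer, e.g. cross_attentions of shape
# (batch, num_heads, decoder_len, encoder_len), which is what model_view expects.
with torch.no_grad():
    outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask,
                    decoder_input_ids=generated_ids, output_attentions=True)

encoder_tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
decoder_tokens = tokenizer.convert_ids_to_tokens(generated_ids[0])

model_view(encoder_attention=outputs.encoder_attentions,
           decoder_attention=outputs.decoder_attentions,
           cross_attention=outputs.cross_attentions,
           encoder_tokens=encoder_tokens,
           decoder_tokens=decoder_tokens)

Reusing generated_ids avoids re-tokenizing the decoded string, so the decoder token list and the attention dimensions line up even though the input and output lengths differ.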
