How to visualize attention if the sizes of input and output sequence are different? ValueError. #116

Open
sn0rkmaiden opened this issue Apr 1, 2023 · 1 comment

sn0rkmaiden commented Apr 1, 2023

I have a custom pretrained T5 model that predicts the solution to quadratic equations, so its output is a different length than its input (in all the examples I saw, they were the same length). I'm trying to visualize the attention like this:

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content", output_attentions=True)

inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True)
encoder_input_ids = inputs.input_ids

outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)

For example, the predicted sequence is: "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0".

with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0", return_tensors="pt", add_special_tokens=True).input_ids

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

So encoder_text length is 18, decoder_text length is 79.

For some reason, when I get all the attentions from the outputs, they come in the form of tuples (the cross-attention is even a tuple of tuples).
I can't seem to figure out how to use this function correctly, or why my attentions have the wrong dimensions.

model_view(cross_attention=outputs.cross_attentions, encoder_attention=encoder_attention, decoder_attention=decoder_attention, encoder_tokens=encoder_text, decoder_tokens=decoder_text)

Is the problem that the output length is different from the input length?
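
To illustrate, the shapes coming back from generate look like this (a minimal sketch, assuming the generate call above with greedy decoding and caching, i.e. the defaults):

# outputs is the result of model.generate(..., output_attentions=True, return_dict_in_generate=True)
print(len(outputs.cross_attentions))         # one entry per generated token
print(len(outputs.cross_attentions[0]))      # one entry per decoder layer
print(outputs.cross_attentions[0][0].shape)  # (batch, num_heads, 1, encoder_len)
# The query length is 1 because, with caching, each generation step only attends
# from the newest decoder token, so these per-step tensors do not line up with
# the full decoder token list that model_view expects.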

@tkella47

Using generate returns the auto-regressive attentions. Try using teacher forcing instead, i.e. providing the labels / decoder_input_ids through the forward pass.
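
A minimal sketch of that suggestion, assuming the same checkpoint and input as in the question and reusing the generated ids as the decoder input (attribute names follow Hugging Face's Seq2SeqLMOutput and bertviz's model_view keywords):

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from bertviz import model_view

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content")

inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt")

# Generate as before, but keep only the token ids. For T5 the generated
# sequence already starts with the decoder start token, so it can be reused
# directly as decoder_input_ids below.
generated_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask,
                               max_length=80, min_length=10)

# Teacher forcing: a single forward pass over the full target sequence returns
# the attentions as one tensor per layer, e.g. cross_attentions of shape
# (batch, num_heads, decoder_len, encoder_len), which is what model_view expects.
with torch.no_grad():
    outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask,
                    decoder_input_ids=generated_ids, output_attentions=True)

encoder_tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
decoder_tokens = tokenizer.convert_ids_to_tokens(generated_ids[0])

model_view(encoder_attention=outputs.encoder_attentions,
           decoder_attention=outputs.decoder_attentions,
           cross_attention=outputs.cross_attentions,
           encoder_tokens=encoder_tokens,
           decoder_tokens=decoder_tokens)

Reusing generated_ids avoids re-tokenizing the decoded string, so the decoder token list and the attention dimensions line up even though the input and output lengths differ.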
