
Any plan on updating the code for LLaMA models? #128

Open
iBibek opened this issue Jan 25, 2024 · 9 comments

Comments

@iBibek

iBibek commented Jan 25, 2024

Thank you for the great repo.

Is there any plan on your side to update the code for LLaMA models? Or is there anything I can do to update the code to visualize LLaMA models?

@Bachstelze

Doesn't it work as a decoder model?
I have successfully run Mistral (with lots of redundant shortcuts). The architecture should be similar.

@iBibek
Author

iBibek commented Jan 28, 2024

@Bachstelze, this is good news.
Can you please share the code (if possible)?

@Bachstelze

The code is similar to the GPT example in this repo:

from transformers import AutoTokenizer, AutoModel
from bertviz import head_view
from bertviz import model_view

# load the model
# Vicuna is an instruction-tuned model based on LLaMA
model_name = "lmsys/vicuna-7b-delta-v1.1" # mistralai/Mistral-7B-Instruct-v0.1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

input_sentence = """The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.\n
Input: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n\nOutput:"""
input_sentence = "Generate a positive review for a place."
inputs = tokenizer.encode(input_sentence, return_tensors='pt')
outputs = model(inputs)
attention = outputs[-1]  # Output includes attention weights when output_attentions=True
tokens = tokenizer.convert_ids_to_tokens(inputs[0]) 

# save the complete head and model views
html_head_view = head_view(attention, tokens, html_action='return')
with open("all_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return')
with open("all_model_view.html", 'w') as file:
    file.write(html_model_view.data)

# save the view just for certain layers if the browser can't display the whole
# shorter inputs are easier to display
layers = [1]
html_head_view = head_view(attention, tokens, html_action='return', include_layers=layers)

with open("short_head_view.html", 'w') as file:
    file.write(html_head_view.data)

html_model_view = model_view(attention, tokens, html_action='return', include_layers=layers)
with open("short_model_view.html", 'w') as file:
    file.write(html_model_view.data)

Loading and processing already take about 30 GB of RAM. My machine starts to swap at this point, so I just save the HTML and visualize it after the RAM is free again.
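
(If RAM is the bottleneck, loading the weights in half precision should roughly halve the footprint. A minimal sketch, assuming a recent transformers/torch version; on CPU-only machines bfloat16 tends to be the safer half-precision choice:)

import torch
from transformers import AutoModel

model_name = "lmsys/vicuna-7b-delta-v1.1"  # same checkpoint as in the example above
model = AutoModel.from_pretrained(
    model_name,
    output_attentions=True,
    torch_dtype=torch.bfloat16,   # half-precision weights: roughly half the RAM of float32
    low_cpu_mem_usage=True,       # avoid materializing a full float32 copy while loading
)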

The output looks very repetitive.

[Screenshots: model_view_vicuna_small_instruction, head_view_vicuna_small_instruction, long_head_view]

In the case of Vicuna (lmsys/vicuna-7b-delta-v1.1; see "From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning"), all heads show the same attention-weight shape. Every token can only attend to its preceding tokens because of the one-directional (causal) objective of GPT-style models; e.g. the first token can only attend to the start-of-sentence token and itself. Interestingly, every token spreads its attention roughly equally over all tokens it can attend to. Therefore, the attention weights are strongest for the tokens at the beginning and decrease towards the end. This gives an "L" shape resembling the positive multiplicative inverse (1/x).
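
(A small numerical illustration of this idealized pattern, assuming perfectly uniform causal attention, which is of course a simplification of what the trained model actually does:)

import numpy as np

n = 8  # sequence length for the toy example

# idealized causal attention: query i attends uniformly to positions 0..i
attn = np.tril(np.ones((n, n)))
attn /= attn.sum(axis=1, keepdims=True)

print(np.round(attn, 2))

# the first column is 1, 1/2, 1/3, ... -- the weight each query puts on the
# first token decays like 1/x, which is the "L" shape seen in the head view
print(np.round(attn[:, 0], 2))

# total attention received per position: early tokens collect far more weight
print(np.round(attn.sum(axis=0), 2))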

Let me know if you find other patterns or have a good explanation for this phenomenon.

@iBibek
Author

iBibek commented Jan 30, 2024

@Bachstelze Thank you so much <3

@MarioRicoIbanez

Hi! I am also trying to use bertviz with LLMs. But have you managed to see not only the self-attention of the first forward pass, but also the attention for the generated tokens, using the model.generate method?
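
(A minimal sketch of how generation-time attentions can be collected, assuming a causal-LM checkpoint and a recent transformers version; note that with KV caching each later step only returns the attention row for the newly generated token, so the tensors need reshaping before they fit bertviz's head_view:)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lmsys/vicuna-7b-delta-v1.1"  # checkpoint from the example above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Generate a positive review for a place.", return_tensors="pt")
gen = model.generate(
    **inputs,
    max_new_tokens=10,
    output_attentions=True,        # ask every forward pass to return attention weights
    return_dict_in_generate=True,  # so that gen.attentions is populated
)

# gen.attentions is a tuple with one entry per generation step;
# each entry is a tuple of per-layer tensors.
# step 0: shape (batch, heads, prompt_len, prompt_len)
# later steps (with KV caching): shape (batch, heads, 1, current_len),
# i.e. only the attention of the newly generated token is returned
first_step, later_steps = gen.attentions[0], gen.attentions[1:]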

@Icamd

Icamd commented Apr 24, 2024

Hi! I am also trying to use bertviz with LLMs. But have you managed to see not only the self-attention of the first forward pass, but also the attention for the generated tokens, using the model.generate method?

Have you solved the problem? Thank you!

@MarioRicoIbanez

Hi, I finally ended up using captum and it works perfectly!

https://captum.ai/tutorials/Llama2_LLM_Attribution
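
(For reference, a rough sketch of the workflow from the linked tutorial; the class names below, FeatureAblation, LLMAttribution, and TextTokenInput, follow that tutorial, but exact signatures should be checked against the installed Captum version. Note this produces perturbation-based attribution scores for the prompt tokens rather than raw attention weights:)

from transformers import AutoTokenizer, AutoModelForCausalLM
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_name = "meta-llama/Llama-2-7b-chat-hf"  # model used in the tutorial (gated checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

fa = FeatureAblation(model)               # perturbation-based attribution method
llm_attr = LLMAttribution(fa, tokenizer)  # wraps it for text-generation models

prompt = "Generate a positive review for a place."
inp = TextTokenInput(prompt, tokenizer)   # attribute at the prompt-token level
attr_result = llm_attr.attribute(inp, target="The food was great.")
attr_result.plot_token_attr(show=True)    # heatmap of prompt-token contributions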

@Icamd

Icamd commented Apr 27, 2024

Hi, I finally ended up using captum and it works perfectly!

https://captum.ai/tutorials/Llama2_LLM_Attribution

Thank you for the information! I found that this works as well: https://github.com/mattneary/attention. I will try Captum, thank you!

@Bachstelze

@Icamd Does https://github.com/mattneary/attention work well with bigger GPTs? Do you know how the attention weights are aggregated into one view?

@MarioRicoIbanez Can we use captum to view the attention pattern?
