
Reproducing paper results #34

Open
grigorn opened this issue Nov 13, 2023 · 6 comments

grigorn commented Nov 13, 2023

I ran LLM-Pruner with the command specified in the README to prune LLaMA-7B:

python hf_prune.py --pruning_ratio 0.25 \
      --block_wise \
      --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
      --block_attention_layer_start 4 --block_attention_layer_end 30 \
      --pruner_type taylor \
      --test_after_train \
      --device cpu  --eval_device cuda \
      --save_ckpt_log_name llama_prune

and I get the following results:

#Param before: 6738415616, #Param after: 5422977024, Ratio = 80.4785%
PPL after pruning: {'wikitext2': 19.96819234893607, 'ptb': 80.37625124290746}

The perplexities reported in Table 1 of the paper are 19.09 on WikiText2 and 34.21 on PTB. Is there any reason for the difference in these perplexities, especially on PTB? Thanks
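For context, the PPL numbers above come from the script's post-pruning evaluation. A minimal sketch of how a WikiText2 perplexity like this is typically computed (assuming the Hugging Face transformers and datasets APIs and a 2048-token window; the repo's own evaluation code may differ in stride and tokenization details):

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yahma/llama-7b-hf"  # the base checkpoint used above; swap in the pruned model as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda().eval()

# Concatenate the WikiText-2 test split and score it in fixed-length windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

window = 2048
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1), window):
    ids = enc.input_ids[:, begin:begin + window].cuda()
    if ids.size(1) < 2:
        break
    with torch.no_grad():
        loss = model(ids, labels=ids).loss      # mean NLL over ids.size(1) - 1 shifted tokens
    nlls.append(loss * (ids.size(1) - 1))       # undo the averaging to get a token-weighted sum
    n_tokens += ids.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"wikitext2 PPL: {ppl.item():.2f}")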

horseee (Owner) commented Nov 14, 2023

Hi. Can I check which LLaMA-7B checkpoint you used? decapoda-research/llama-7b-hf, the one in my code, is not available at the moment, and I'm not sure whether that is the reason for the difference.

grigorn (Author) commented Nov 14, 2023

I am using 'yahma/llama-7b-hf'

horseee (Owner) commented Nov 19, 2023

Have you tried a copied version of decapoda-research/llama-7b-hf, e.g., https://huggingface.co/baffo32/decapoda-research-llama-7B-hf?

We will try that kind of checkpoint in the next few days to see whether the results are reproducible with the checkpoints that are still available.

grigorn (Author) commented Nov 20, 2023

With the checkpoint you specified, I could replicate the metrics. Do you know what the difference between the two is? I thought there is only one LLaMA, so the checkpoints should be the same.

horseee (Owner) commented Nov 20, 2023

I have no idea about this 😢.
My guess is that the possible reasons are: (1) the EOS token issue, or (2) the weights of the two checkpoints are slightly different.
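If it helps, hypothesis (2) can be ruled in or out by diffing the two checkpoints tensor by tensor. A rough sketch (assuming both checkpoints fit in CPU RAM, roughly 14 GB each in fp16):

import torch
from transformers import AutoModelForCausalLM

A = "yahma/llama-7b-hf"
B = "baffo32/decapoda-research-llama-7B-hf"

# Load both checkpoints on CPU and compare their state dicts tensor by tensor.
sd_a = AutoModelForCausalLM.from_pretrained(A, torch_dtype=torch.float16).state_dict()
sd_b = AutoModelForCausalLM.from_pretrained(B, torch_dtype=torch.float16).state_dict()

assert sd_a.keys() == sd_b.keys(), "parameter names differ"
diff = [k for k in sd_a if not torch.equal(sd_a[k], sd_b[k])]
print("tensors that differ:", diff or "none")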

grigorn (Author) commented Nov 23, 2023

I checked both the model and the tokenizer. The model weights and tokenizer.get_vocab() are the same, but the special tokens differ: for baffo32 all three special tokens are empty strings. Could this be the reason for the difference? If so, do you know which one is the "true" LLaMA?
[Screenshot comparing the two tokenizers' special tokens, 2023-11-23]
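For reference, the special-token check above can be reproduced with something like this (a sketch using only the standard AutoTokenizer API):

from transformers import AutoTokenizer

# Print the special tokens and vocab size of both checkpoints side by side.
for name in ("yahma/llama-7b-hf", "baffo32/decapoda-research-llama-7B-hf"):
    tok = AutoTokenizer.from_pretrained(name)
    print(name)
    print("  bos:", repr(tok.bos_token), "eos:", repr(tok.eos_token), "unk:", repr(tok.unk_token))
    print("  vocab size:", len(tok.get_vocab()))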
