
[BUG] did not output the eval results at all. #815

Open
xigua314 opened this issue May 8, 2024 · 3 comments
xigua314 commented May 8, 2024

I went through the Finetuning (Full) process with gpt2 and set `--do_eval --eval_dataset_path xxx.json --eval_steps`, where xxx.json is text2text data. The finetune process did not output the eval results at all, even though my finetune steps exceeded `eval_steps`. I don't know whether this is a bug or a problem with my settings. I look forward to your answer, thank you very much!

Here is my detailed script setting:

```shell
deepspeed ${deepspeed_args} \
  examples/finetune.py \
    --model_name_or_path ${model_name_or_path} \
    --trust_remote_code ${trust_remote_code} \
    --dataset_path ${dataset_path} \
    --output_dir ${output_dir} --overwrite_output_dir \
    --conversation_template ${conversation_template} \
    --num_train_epochs 0.1 \
    --learning_rate 2e-5 \
    --disable_group_texts 1 \
    --block_size 1024 \
    --per_device_train_batch_size 18 \
    --deepspeed configs/ds_config_zero3.json \
    --fp16 \
    --run_name finetune \
    --validation_split_percentage 20 \
    --eval_steps 20 \
    --logging_steps 20 \
    --do_train \
    --do_eval \
    --eval_dataset_path /h/s/x/l/eval \
    --ddp_timeout 72000 \
    --save_steps 5000 \
    --dataloader_num_workers 1 \
    | tee ${log_dir}/train.log \
    2> ${log_dir}/train.err
```

Here is the last part of the log from my fine-tuning run:

```
05/08/2024 10:23:34 - WARNING - lmflow.pipeline.finetuner - finetuner_args.do_evalTrue
05/08/2024 10:23:34 - WARNING - lmflow.pipeline.finetuner - *************************************************************
[2024-05-08 10:23:38,301] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 1.64B parameters
05/08/2024 10:23:39 - WARNING - lmflow.pipeline.finetuner - in finetuner_args.do_eval ******************
05/08/2024 10:23:40 - WARNING - lmflow.pipeline.finetuner - ********************************************************************************
05/08/2024 10:23:40 - WARNING - lmflow.pipeline.finetuner - Number of eval samples: 256
ninja: no work to do.
Time to load cpu_adam op: 2.875669002532959 seconds
Parameter Offload: Total persistent parameters: 1001600 in 386 params
{'loss': 0.2962, 'grad_norm': 3.1811087335274615, 'learning_rate': 1.5714285714285715e-05, 'epoch': 0.02}
{'loss': 0.2991, 'grad_norm': 2.5313646679089503, 'learning_rate': 1.0952380952380955e-05, 'epoch': 0.05}
{'loss': 0.3155, 'grad_norm': 2.1892666086453594, 'learning_rate': 6.1904761904761914e-06, 'epoch': 0.07}
{'loss': 0.2972, 'grad_norm': 2.2230829824820884, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.1}
{'train_runtime': 451.4452, 'train_samples_per_second': 3.316, 'train_steps_per_second': 0.186, 'train_loss': 0.30037035260881695, 'epoch': 0.1}
***** train metrics *****
  epoch                   = 0.101
  total_flos              = 4229GF
  train_loss              = 0.3004
  train_runtime           = 0:07:31.44
  train_samples           = 14972
  train_samples_per_second = 3.316
  train_steps_per_second  = 0.186
```

research4pan (Contributor) commented May 8, 2024

Thanks for your interest in LMFlow! You may try --eval_strategy steps (https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L237) and --eval_steps 1 (https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L411) to see if it works. Hope this information can be helpful 😄
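Spelled out against the script above, the suggested change would look like the fragment below (a sketch only; it assumes the installed transformers release recognizes `--eval_strategy`):

```shell
# Sketch: replace the bare --eval_steps 20 in the finetune command with
# an explicit evaluation strategy plus a small step interval.
# Assumption: the local transformers release accepts --eval_strategy.
    --eval_strategy steps \
    --eval_steps 1 \
```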

xigua314 (Author) commented May 8, 2024

Thank you very much for your reply. I tried adding `--eval_strategy steps` to the script and changed `--eval_steps` to `1`, but it reported an error: `ValueError: Some specified arguments are not used by the HfArgumentParser: ['--eval_strategy', 'steps']`.

> Thanks for your interest in LMFlow! You may try --eval_strategy steps (https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L237) and --eval_steps 1 (https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L411) to see if it works. Hope this information can be helpful 😄
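One possible cause, offered as an assumption rather than something confirmed in this thread: transformers renamed `evaluation_strategy` to `eval_strategy` around v4.41, so installs older than that reject the new spelling, which would match the `HfArgumentParser` error above. On such a version the older flag name would be the one to try:

```shell
# Sketch fragment, assuming a pre-v4.41 transformers install that only
# accepts the older spelling of the flag.
    --evaluation_strategy steps \
    --eval_steps 1 \
```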

research4pan (Contributor) commented May 9, 2024

That's a bit strange. Could you share your transformers version so we can check that for you? The argument is passed to the Hugging Face trainer, so it is expected to be accepted normally.
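A quick way to report the installed version (a minimal sketch; it reads pip's package metadata and falls back to a message when pip or the package is unavailable):

```shell
# Print the installed transformers version; prints a fallback message
# if pip or the transformers package is missing.
pip show transformers 2>/dev/null | grep '^Version' || echo "transformers not installed"
```

Equivalently, `python -c "import transformers; print(transformers.__version__)"` works whenever the package is importable.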
