The loss is very unstable when supervised fine-tuning the 7b-100k-ft model #168

Open
seanxuu opened this issue Jan 18, 2024 · 1 comment

Comments

@seanxuu

seanxuu commented Jan 18, 2024

When I use the LongAlpaca-12k dataset to supervised fine-tune the LongAlpaca-7B model, the loss is very unstable.
My command is:

Miniconda/envs/longlora/bin/python -u supervised-fine-tune.py \
    --model_name_or_path models/LongAlpaca-7B \
    --bf16 True \
    --output_dir LongLoRA/save/LongAlpaca-7B-origdata \
    --model_max_length 32768 \
    --use_flash_attn True \
    --data_path data/LongAlpaca-12k.json \
    --low_rank_training True \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type "constant_with_warmup" \
    --logging_steps 1 \
    --deepspeed ds_configs/stage2.json \
    --tf32 True

The loss looks like this:

[screenshot: training loss curve, highly unstable from step to step]
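
For reference, with --per_device_train_batch_size 1, --gradient_accumulation_steps 1, and --logging_steps 1, every logged value is the loss of a single long sample, so large step-to-step swings can appear even when training is otherwise healthy. Below is a minimal sketch for looking at a smoothed curve instead; it assumes the standard trainer_state.json that the Hugging Face Trainer writes into the output directory, and the window size is arbitrary.

import json

def smoothed_losses(state_path, window=50):
    # trainer_state.json is written by the Hugging Face Trainer into the output dir;
    # "log_history" holds one dict per logged step, with "loss" on training steps.
    with open(state_path) as f:
        history = json.load(f)["log_history"]
    losses = [entry["loss"] for entry in history if "loss" in entry]
    smoothed = []
    for i in range(len(losses)):
        # trailing moving average over the last `window` logged steps
        chunk = losses[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

if __name__ == "__main__":
    # path is illustrative; point it at your actual output_dir
    path = "LongLoRA/save/LongAlpaca-7B-origdata/trainer_state.json"
    for step, val in enumerate(smoothed_losses(path)):
        print(step, round(val, 4))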

@seanxuu
Author

seanxuu commented Jan 19, 2024

I also tried to train Llama-2-7b-longlora-100k-ft with my own dataset, which is sampled from your LongAlpaca-12k.json data, but the loss looks the same.
[screenshot: training loss curve, same unstable pattern]

python supervised-fine-tune.py \
    --model_name_or_path /models/Llama-2-7b-longlora-100k-ft \
    --bf16 True \
    --output_dir LongLoRA/save/7b-100k-ft-origdata-mydata \
    --model_max_length 100000 \
    --use_flash_attn True \
    --data_path LongLoRA/pdf2txt/output/manual_data.json \
    --low_rank_training True \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 98 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type "constant_with_warmup" \
    --logging_steps 1 \
    --deepspeed "ds_configs/stage2.json" \
    --tf32 True
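
Since manual_data.json is converted from PDFs, one thing worth checking is the tokenized length distribution of the samples: mixing very short and very long sequences also shows up as a spiky per-step loss. Below is a minimal sketch, assuming the entries use the same "instruction"/"output" fields as LongAlpaca-12k.json; the paths and field names are illustrative and may need adjusting.

import json
from transformers import AutoTokenizer

# Hypothetical check, not part of the LongLoRA repo: adjust paths/fields to your setup.
tokenizer = AutoTokenizer.from_pretrained("/models/Llama-2-7b-longlora-100k-ft")

with open("LongLoRA/pdf2txt/output/manual_data.json") as f:
    data = json.load(f)

# Rough per-sample token counts (prompt template ignored for simplicity).
lengths = sorted(
    len(tokenizer(ex.get("instruction", "") + ex.get("output", "")).input_ids)
    for ex in data
)
print("samples:", len(lengths))
print("min / median / max tokens:", lengths[0], lengths[len(lengths) // 2], lengths[-1])
print("samples over 100k tokens:", sum(l > 100_000 for l in lengths))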
