
The last step reports this warning; how should I change the hyperparameters? #2

Open
greatheart1000 opened this issue Apr 27, 2023 · 3 comments

Comments

@greatheart1000

/root/miniconda3/envs/Vicuna/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.00 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters

@jackaduma (Owner)

/root/miniconda3/envs/Vicuna/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.00 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters

Everything before this step ran fine, right?
For the last step, could you share more context?

@skepsun

skepsun commented May 9, 2023

In the last step's code, lora_config needs target_modules added; the setting given by the trl author is target_modules=["q_proj","k_proj"]. With that, training runs, but KL divergence still goes negative (see the log below and the sketch right after this paragraph):
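
A minimal sketch of that change, assuming peft's LoraConfig; every value except target_modules is an illustrative placeholder rather than this repo's actual setting:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # illustrative placeholder
    lora_alpha=32,                        # illustrative placeholder
    lora_dropout=0.05,                    # illustrative placeholder
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj"],  # the setting attributed to the trl author
)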

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/d1/data/chuxiong/miniconda3/envs/llm/lib/libcudart.so.11.0'), PosixPath('/d1/data/chuxiong/miniconda3/envs/llm/lib/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /d1/data/chuxiong/miniconda3/envs/llm/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
script_args:  ScriptArguments(model_name='./lora-Vicuna-adapter-merged', tokenizer_name='decapoda-research/llama-7b-hf', reward_model_name='./reward_model_vicuna-7b_100_2e-05/-adapter-merged', log_with=None, learning_rate=1.4e-05, output_max_length=128, mini_batch_size=1, batch_size=1, ppo_epochs=1, gradient_accumulation_steps=1, adafactor=False, early_stopping=True, target_kl=0.1, reward_baseline=0.0, batched_gen=True, save_freq=100, output_dir='./tuning_llama_rl_checkpoints', seed=0)
dataset_name:  ./datasets/
Found cached dataset json (/home/chuxiong/.cache/huggingface/datasets/json/datasets-09bc2f5b6c26f79a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/home/chuxiong/.cache/huggingface/datasets/json/datasets-09bc2f5b6c26f79a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Loading cached processed dataset at /home/chuxiong/.cache/huggingface/datasets/json/datasets-09bc2f5b6c26f79a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-48f7acbd9bfe6356_*_of_00024.arrow
Loading cached processed dataset at /home/chuxiong/.cache/huggingface/datasets/json/datasets-09bc2f5b6c26f79a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cac78464697791ef.arrow
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:09<00:00,  4.53s/it]
finetune model:  ./lora-Vicuna-adapter-merged <class 'trl.models.modeling_value_head.AutoModelForCausalLMWithValueHead'>
finetune model's is_loaded_in_8bit:  True
device:  0
reward_model_name:  ./reward_model_vicuna-7b_100_2e-05/-adapter-merged
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.69s/it]
Some weights of the model checkpoint at ./reward_model_vicuna-7b_100_2e-05/-adapter-merged were not used when initializing LlamaForSequenceClassification: ['lm_head.weight']
- This IS expected if you are initializing LlamaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at ./reward_model_vicuna-7b_100_2e-05/-adapter-merged and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
0it [00:00, ?it/s]/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/transformers/generation/utils.py:1253: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/transformers/pipelines/text_classification.py:104: UserWarning: `return_all_scores` is now deprecated,  if want a similar funcionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.
  warnings.warn(
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
1it [00:11, 11.81s/it]/d1/data/chuxiong/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.25 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
  warnings.warn(
7it [01:23, 12.09s/it]/d1/data/chuxiong/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.05 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
  warnings.warn(
10it [01:56, 11.94s/it]/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/transformers/pipelines/base.py:1080: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
  warnings.warn(
11it [02:10, 12.41s/it]/d1/data/chuxiong/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -1.07 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
  warnings.warn(
12it [02:25, 13.15s/it]/d1/data/chuxiong/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.82 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
  warnings.warn(
16it [03:12, 12.10s/it]/d1/data/chuxiong/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -1.02 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
  warnings.warn(

I also tried setting eos_token_id in generation_kwargs to -1, and calling ppo_trainer.model.eval() before generating responses during training (switching back to train() for the PPO step); neither solved the problem.
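
In code, the two workarounds described above look roughly like this (a sketch against trl's PPOTrainer API; tokenizer, ppo_trainer, query_tensors, and rewards are assumed to come from the training script):

generation_kwargs = {
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    # Workaround 1 (did not help): eos_token_id=-1 disables the EOS stop,
    # so generation always runs to the sampled length.
    "eos_token_id": -1,
}

# Workaround 2 (did not help): eval mode while sampling responses,
# back to train mode for the PPO update.
ppo_trainer.model.eval()
response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
ppo_trainer.model.train()
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)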

@jerry1993-tech

(quotes @skepsun's comment above in full)

Hi, may I ask which package versions you have in your Python environment:
accelerate==?
peft==?
transformers==?
torch==?
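
For reference, one quick way to print those versions (standard-library importlib.metadata; trl is added to the list since the warning comes from it):

from importlib.metadata import version

for pkg in ("accelerate", "peft", "transformers", "torch", "trl"):
    print(f"{pkg}=={version(pkg)}")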
