
Following up on the earlier issue #691: after the follow-up improvements, the model still does not seem to reply according to the fine-tuning. #880

Open
xiaolvtongxue-zt opened this issue May 8, 2024 · 3 comments


@xiaolvtongxue-zt

xiaolvtongxue-zt commented May 8, 2024


(Earlier advice:) Merging LoRA into a quantized model causes large precision loss. We recommend quantizing after training, then using vLLM.

Hello. Following the previous feedback, this time I fine-tuned with LoRA directly and did not use quantization (quantization_bit = 0).
After training, I merged the model (so that it can later be served with vLLM):

CUDA_VISIBLE_DEVICES=0,1 swift export --ckpt_dir './swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69500' --sft_type 'lora' --merge_lora true --model_id_or_path './models/models/qwen/Qwen1___5-7B-Chat'

I then found that the merged model still does not reply in line with the actual fine-tuned result (i.e., the model before merging).
May I ask which step went wrong? It is quite puzzling.
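For context, a LoRA merge on a full-precision base is mathematically exact: the merged weight W' = W + (alpha/r)·B·A produces the same outputs as applying the adapter on the fly, so in principle merging alone should not change the model's replies. A minimal numpy sketch of this identity (illustrative names only, not ms-swift internals):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16            # hidden size, LoRA rank, lora_alpha

W = rng.normal(size=(d, d))       # frozen base weight
A = rng.normal(size=(r, d))       # LoRA down-projection
B = rng.normal(size=(d, r))       # LoRA up-projection
x = rng.normal(size=(d,))         # a dummy activation

scale = alpha / r
y_adapter = W @ x + scale * (B @ (A @ x))   # base + adapter applied on the fly
W_merged = W + scale * (B @ A)              # weights after merging
y_merged = W_merged @ x

# In full precision the two paths agree to numerical tolerance.
print(np.allclose(y_adapter, y_merged))     # → True
```

Any behavior difference after merging therefore has to come from something else, e.g. precision of the stored weights or a mismatch in which base checkpoint the adapter is merged into.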

@tastelikefeet
Collaborator

So it works fine without merging, but not once merged?

@xiaolvtongxue-zt
Author

So it works fine without merging, but not once merged?

Yes. Previously I was fine-tuning with quantization, so it was understandable that it didn't work; but after I dropped quantized fine-tuning, it still doesn't work once merged? I genuinely can't figure it out.
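The earlier advice about quantized merges can be illustrated: when the base weights are already on a low-precision grid, the small LoRA delta is partly rounded away during the merge, whereas merging into a full-precision base is exact. A rough numpy simulation, where fake_quant is a hypothetical crude symmetric 4-bit quantizer and not ms-swift's actual quantization:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 32                                # hidden size, LoRA rank, lora_alpha

W = rng.normal(size=(d, d)).astype(np.float32)         # base weight
A = rng.normal(size=(r, d)).astype(np.float32) * 0.05  # LoRA down-projection
B = rng.normal(size=(d, r)).astype(np.float32) * 0.05  # LoRA up-projection
delta = (alpha / r) * (B @ A)                          # the adapter's weight update

def fake_quant(w, bits=4):
    # crude symmetric uniform quantization, for illustration only
    step = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / step) * step

W_true = W + delta                                     # ideal merged weight (fp32 base)

# Merging into an already-quantized base and storing the result back on the
# same low-precision grid loses much of the small delta.
W_q_merged = fake_quant(fake_quant(W) + delta)
rel_err = np.linalg.norm(W_q_merged - W_true) / np.linalg.norm(W_true)
print(rel_err)  # substantially above float rounding error
```

This is why the recommendation is to merge in full precision and quantize afterwards; it does not, however, explain a mismatch when no quantization is involved.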

To make it easier for you to check, here is the log output from my merge process:

run sh: `python /root/anaconda3/envs/python3.9/lib/python3.9/site-packages/swift/cli/export.py --ckpt_dir ./swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000 --sft_type lora --merge_lora true --model_id_or_path ./models/models/qwen/Qwen1___5-7B-Chat`
2024-05-09 14:29:08,784 - modelscope - INFO - PyTorch version 2.1.2 Found.
2024-05-09 14:29:08,785 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-05-09 14:29:08,839 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 e31c85192ca44a9fd57e7f94f1a1180d and a total number of 976 components indexed
[INFO:swift] Start time of running main: 2024-05-09 14:29:09.570211
[INFO:swift] ckpt_dir: /home/centos/xiaolv/太安模型微调/swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000
[INFO:swift] Setting model_info['revision']: master
[INFO:swift] Setting self.eval_human: True
[INFO:swift] Setting overwrite_generation_config: True
[INFO:swift] Setting args.dataset: ['alpaca-zh', 'alpaca-en']
[INFO:swift] args: ExportArguments(model_type='qwen1half-7b-chat', model_id_or_path='./models/models/qwen/Qwen1___5-7B-Chat', model_revision='master', sft_type='lora', template_type='qwen', infer_backend='vllm', ckpt_dir='/home/centos/xiaolv/太安模型微调/swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000', load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='bf16', dataset=['alpaca-zh', 'alpaca-en'], dataset_seed=42, dataset_test_ratio=0.01, val_dataset_sample=10, save_result=True, system='You are a helpful assistant.', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', custom_train_dataset_path=[], custom_val_dataset_path=[], quantization_bit=0, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=True, merge_device_map='auto', save_safetensors=True, overwrite_generation_config=True, verbose=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_model_len=None, vllm_enable_lora=False, vllm_max_lora_rank=16, vllm_lora_modules=[], show_dataset_sample=10, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, to_peft_format=False, quant_bits=0, quant_method='awq', quant_n_samples=256, quant_seqlen=2048, quant_device_map='cpu', push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, commit_message='update files')
[INFO:swift] Global seed set to 42
[INFO:swift] replace_if_exists: False
[INFO:swift] merged_lora_path: `/home/centos/xiaolv/太安模型微调/swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000-merged`
[INFO:swift] device_count: 2
[INFO:swift] Loading the model using model_dir: ./models/models/qwen/Qwen1___5-7B-Chat
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.07s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] generation_config: GenerationConfig {
  "do_sample": true,
  "eos_token_id": 151645,
  "max_new_tokens": 2048,
  "pad_token_id": 151643,
  "temperature": 0.3,
  "top_k": 20,
  "top_p": 0.7
}

[INFO:swift] SwiftModel: 7733.9075M Params (12.5829M Trainable [0.1627%]), 268.4375M Buffers.
[INFO:swift] system: You are a helpful assistant.
[INFO:swift] Merge LoRA...
[INFO:swift] Saving merged weights...
[INFO:swift] Successfully merged LoRA and saved in /home/centos/xiaolv/太安模型微调/swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000-merged.
[INFO:swift] Setting args.sft_type: 'full'
[INFO:swift] Setting args.ckpt_dir: /home/centos/xiaolv/太安模型微调/swift_qwen/output/qwen1half-7b-chat-swift/qwen1half-7b-chat/v1-20240416-160243/checkpoint-69000-merged
[INFO:swift] End time of running main: 2024-05-09 14:30:00.366159

@xiaolvtongxue-zt
Author

@Jintao-Huang
