Instruction tuning with my own datasets #139

Open
sonderzhang opened this issue Mar 8, 2024 · 4 comments


@sonderzhang

I am planning to fine-tune the VideoChat2 model with custom instruction data to enhance its performance on downstream tasks. I have a couple of questions regarding the pre-training data and the process of fine-tuning with Chinese instructions. Your insights will be highly valuable to me.

1. Pre-Training Data Language:

Was Chinese video-text data utilized in the pre-training phase of the VideoChat2 model? I've experimented with some Chinese instructions, and the model's performance was quite satisfactory. Is it advisable to perform instruction tuning on the stage 3 model using Chinese instructions?
2. Multi-GPU Fine-Tuning:

I am interested in fine-tuning the model using multiple GPUs to expedite the training process. However, I couldn't find any related arguments or settings for enabling multi-GPU training in the provided configuration file ("/scripts/config_7b_stage3.py"). Could you provide guidance or examples on how to modify the configuration for multi-GPU support?

Your assistance will greatly aid in optimizing the model for my specific requirements. Thank you in advance for your help.

@Andy1621
Collaborator

Andy1621 commented Mar 8, 2024

Thanks for your questions!

  1. For Chinese QA: we did not use LLMs that work well with Chinese, so directly applying the model to Chinese instruction tuning may not give good results.
  2. For multi-GPU training, please check run.sh. We use torchrun to launch it (see the sketch below).
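A minimal launch sketch, assuming a single node with 4 GPUs. The actual entry script and extra arguments are whatever run.sh passes, so treat the script name below as a placeholder and copy the real command from run.sh:

```bash
# Hedged example: single node, 4 processes (one per GPU).
# "train.py" is a placeholder for the entry script that run.sh actually invokes;
# the config path is the one mentioned above.
torchrun --nnodes=1 --nproc_per_node=4 train.py scripts/config_7b_stage3.py
```

Changing `--nproc_per_node` is usually all that is needed to scale to more GPUs on the same machine.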

@sonderzhang
Author

Thank you for your guidance! I've managed to fine-tune the model using multiple GPUs successfully. I suspect that the model's proficiency in Chinese might be attributed to the vicuna model components. Therefore, further fine-tuning this model with additional Chinese instructions could potentially enhance its performance. I'm considering exploring this to see the impact on its language handling capabilities.

@sonderzhang
Author

May I inquire about the number of GPUs utilized during the fine-tuning process? Thank you!

@Andy1621
Collaborator

Andy1621 commented Mar 8, 2024

For a small fine-tuning dataset, I think 4-8 GPUs with more than 40 GB of memory each should be enough. However, the current codebase may not be very efficient. You can follow other repos such as LAVIN and use lightweight fine-tuning strategies like QLoRA.
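As a rough illustration of the QLoRA idea (not VideoChat2's own training code, which wraps the LLM in a multimodal pipeline), a sketch using the Hugging Face `transformers`, `peft`, and `bitsandbytes` packages might look like this. The model id and target modules below are illustrative assumptions:

```python
# Generic QLoRA sketch: load the LLM in 4-bit and attach LoRA adapters so only
# a small set of extra weights is trained. Not a drop-in recipe for VideoChat2.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",                  # illustrative; use the checkpoint your stage-3 model wraps
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections in LLaMA-style blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA adapters are trainable
```

This keeps memory low enough that a handful of 40 GB+ GPUs can fine-tune the 7B model, at the cost of integrating the adapter setup into the existing training loop.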
