Instruction tuning with my own datasets #139

Open
sonderzhang opened this issue Mar 8, 2024 · 4 comments


@sonderzhang

I am planning to fine-tune the VideoChat2 model with custom instruction data to enhance its performance on downstream tasks. I have a couple of questions regarding the pre-training data and the process of fine-tuning with Chinese instructions. Your insights will be highly valuable to me.

1. Pre-Training Data Language:

Was Chinese video-text data utilized in the pre-training phase of the VideoChat2 model? I've experimented with some Chinese instructions, and the model's performance was quite satisfactory. Is it advisable to perform instruction tuning on the stage 3 model using Chinese instructions?
2. Multi-GPU Fine-Tuning:

I am interested in fine-tuning the model using multiple GPUs to expedite the training process. However, I couldn't find any related arguments or settings for enabling multi-GPU training in the provided configuration file ("/scripts/config_7b_stage3.py"). Could you provide guidance or examples on how to modify the configuration for multi-GPU support?

Your assistance will greatly aid in optimizing the model for my specific requirements. Thank you in advance for your help.

@Andy1621
Collaborator

Andy1621 commented Mar 8, 2024

Thanks for your questions!

  1. For Chinese QA: we did not use LLMs that work well with Chinese, so directly applying the model to Chinese instruction tuning may not give good results.
  2. For multi-GPU training, please check run.sh. We use torchrun to launch it (see the sketch below).
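A minimal launch sketch, assuming a single node with 4 GPUs. The actual entry script and extra arguments are whatever run.sh passes, so treat the script name below as a placeholder and copy the real command from run.sh:

```bash
# Hedged example: single node, 4 processes (one per GPU).
# "train.py" is a placeholder for the entry script that run.sh actually invokes;
# the config path is the one mentioned above.
torchrun --nnodes=1 --nproc_per_node=4 train.py scripts/config_7b_stage3.py
```

Changing `--nproc_per_node` is usually all that is needed to scale to more GPUs on the same machine.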

@sonderzhang
Author

Thank you for your guidance! I've managed to fine-tune the model using multiple GPUs successfully. I suspect that the model's proficiency in Chinese might be attributed to the vicuna model components. Therefore, further fine-tuning this model with additional Chinese instructions could potentially enhance its performance. I'm considering exploring this to see the impact on its language handling capabilities.

@sonderzhang
Author

May I inquire about the number of GPUs utilized during the fine-tuning process? Thank you!

@Andy1621
Collaborator

Andy1621 commented Mar 8, 2024

For a small fine-tuning dataset, I think 4-8 GPUs with more than 40 GB of memory each should be enough. However, the current codebase may not be very efficient. You can follow other repos such as LAVIN and use lightweight fine-tuning strategies like QLoRA.
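As a rough illustration of the QLoRA idea (not VideoChat2's own training code, which wraps the LLM in a multimodal pipeline), a sketch using the Hugging Face `transformers`, `peft`, and `bitsandbytes` packages might look like this. The model id and target modules below are illustrative assumptions:

```python
# Generic QLoRA sketch: load the LLM in 4-bit and attach LoRA adapters so only
# a small set of extra weights is trained. Not a drop-in recipe for VideoChat2.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",                  # illustrative; use the checkpoint your stage-3 model wraps
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections in LLaMA-style blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA adapters are trainable
```

This keeps memory low enough that a handful of 40 GB+ GPUs can fine-tune the 7B model, at the cost of integrating the adapter setup into the existing training loop.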
