About choosing dataset format and pre-training weights #306

Open
xmc-andy opened this issue Nov 13, 2023 · 2 comments

xmc-andy commented Nov 13, 2023

Hello, authors! I have a question about choosing a dataset format and the corresponding weights. I am working on a classification task whose input is multiple images plus a prompt. If the images are treated as video frames, I see two options: SD format (a single <image> + a single <Users>, where <image> represents all images) and DC mode (a single <image> + multiple <Users>). As I understand it, they differ in how the prompt is used: DC mode suits cases where each image has its own detailed prompt, while SD mode suits cases where all images share one prompt. Is my understanding correct?

In addition, I previously used the Image-MPT7B weights in SD mode, but the Video-LLaMA7B-DenseCaption weights seem better suited to video-frame input in DC/SD mode. Is that right?

Owner

Luodian commented Nov 13, 2023

Yes, that's essentially correct! I suggest using DC mode with the video-pretrained weights. As you can see in our web demo, the backend model there is Video-LLaMA7B-DC.

Remember to put the multiple images in as frames, along the F dimension of the [B, T, F, C, H, W] tensor (inspect vision_x in a debugger to see the actual shape during your training).
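As a rough illustration of that layout, here is a minimal sketch that stacks several images into the F dimension. It assumes each image is already preprocessed to a (C, H, W) array, and that a single sample with a single <image> token corresponds to B = 1 and T = 1. `stack_frames` is a hypothetical helper, not part of the repository, and NumPy stands in for whatever tensor library the training code uses.

```python
import numpy as np

def stack_frames(images):
    """Stack (C, H, W) arrays into one (1, 1, F, C, H, W) array.

    Hypothetical helper: B = 1 and T = 1 are assumptions for a single
    sample that uses a single <image> token.
    """
    if not images:
        raise ValueError("need at least one image")
    frames = np.stack(images, axis=0)      # (F, C, H, W)
    return frames[np.newaxis, np.newaxis]  # (1, 1, F, C, H, W)

# Four dummy 224x224 RGB images stand in for the real inputs.
images = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(4)]
vision_x = stack_frames(images)
print(vision_x.shape)  # (1, 1, 4, 3, 224, 224)
```

Printing the shape like this at the point where vision_x enters the model is a quick way to confirm that the image count shows up in the third position.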
I also suggest trying both templates:

1. <image> + prompt
2. <image><image>...<image> + prompt

For training DC, we use the first.
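
To make the two templates concrete, here is a small sketch that builds either form from an instruction string. `build_prompt` is a hypothetical helper; concatenating the token and the prompt with no separator is an assumption, as is leaving out any role markers or answer tokens that the real training format adds around the prompt.

```python
def build_prompt(instruction, num_images, repeat_image_token=False):
    """Return the prompt text for template 1 or template 2.

    Hypothetical helper: template 1 uses one <image> token for all
    frames, template 2 repeats <image> once per image.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    n_tokens = num_images if repeat_image_token else 1
    return "<image>" * n_tokens + instruction

print(build_prompt("Which class is shown?", 3))
# <image>Which class is shown?
print(build_prompt("Which class is shown?", 3, repeat_image_token=True))
# <image><image><image>Which class is shown?
```

Keeping the choice behind a single flag makes it easy to run the same data through both templates and compare results.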

@xmc-andy
Author

Thank you so much!
