
How to evaluate the image difference description? #283

ElegantLin opened this issue Oct 8, 2023 · 4 comments

@ElegantLin commented Oct 8, 2023

Hi authors,

Thanks for your great repo. I have checked the eval folder, and I wonder whether you have a specific evaluation dataset for image difference description, since you have two training sets for it.

The tag should be evaluation.

Thanks!


@Luodian (Owner) commented Oct 8, 2023

Oh, that depends on how you set it in training. If you choose to load the two images from SD and GSD as in-context examples, the prompt should be <image><image>User: What's the difference of these two images? GPT:<answer> xxxxxx.
The vision tensor should be [1, 2, 1, 3, 224, 224], where 2 is the in-context dimension.

Then in evaluation you should do it the same way; that works in our experiments. However, we do not have a released image model trained on SD and GSD, since we found it deteriorates benchmark performance lol.
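For concreteness, here is a minimal sketch (not the repo's actual pipeline) of assembling that input, assuming the (batch, in-context, frames, channels, height, width) layout implied by the shape above. The file paths and the preprocessing are placeholders; the real pipeline may normalize differently.

```python
import torch
from PIL import Image
from torchvision import transforms

# Placeholder preprocessing; the actual pipeline may differ (e.g. CLIP normalization).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Two images from an SD/GSD pair; paths are hypothetical placeholders.
img_a = preprocess(Image.open("before.jpg").convert("RGB"))  # [3, 224, 224]
img_b = preprocess(Image.open("after.jpg").convert("RGB"))   # [3, 224, 224]

# Stack along the in-context dimension, one frame each:
# -> [batch=1, in_context=2, frames=1, 3, 224, 224]
vision_x = torch.stack([img_a, img_b], dim=0).unsqueeze(1).unsqueeze(0)
assert vision_x.shape == (1, 2, 1, 3, 224, 224)

# One <image> token per in-context image, as described in the reply above.
prompt = "<image><image>User: What's the difference of these two images? GPT:<answer>"
```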

Another way is to load the pair as a 2-frame video; then in the prompt you should put:
<image>User: xxx.
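Continuing the sketch above (reusing the preprocessed img_a / img_b tensors), the 2-frame-video variant would put both images on the frame axis instead of the in-context axis. That the frame axis is the third dimension, giving [1, 1, 2, 3, 224, 224], is my assumption from the layout above, not something stated in the thread.

```python
import torch

# Assumption: frames occupy dim 2 of the (batch, in-context, frames, C, H, W) layout.
vision_x = torch.stack([img_a, img_b], dim=0)  # [2, 3, 224, 224]
vision_x = vision_x[None, None]                # [1, 1, 2, 3, 224, 224]
assert vision_x.shape == (1, 1, 2, 3, 224, 224)

# A single <image> token now covers the two-frame "video"; the question text
# is illustrative, standing in for the "xxx" placeholder in the reply above.
prompt = "<image>User: What's the difference between these two frames? GPT:<answer>"
```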

@ElegantLin (Author)

Thanks for your quick response. I will close the issue after I try it.

Thanks!

@ElegantLin (Author)

BTW, my understanding is that training on the image difference description dataset greatly hurts benchmark performance. Is that correct?

If my understanding is correct, what do you mean by benchmark performance? Tasks like image captioning and VQA?

Thanks!

@Luodian (Owner) commented Oct 8, 2023

It will not hurt much if SD and GSD are trained jointly with general image-text instruction tuning datasets.

The benchmarks here include COCO Caption, MMBench, etc. Sorry, I cannot reveal too much since we will have a code/paper release soon lol; we will also propose ways to remedy such deterioration.
