Confusion about zero-shot setting on Video-Text Retrieval #89
Comments
It seems that with “merge=True” the results are better than those reported in the paper?
Yes. It seems that the results reported in the paper were obtained by setting “merge=True” without DSL.
I tested the performance on ActivityNet and obtained better results with “merge=True” and DSL, but worse results with “merge=True” and no DSL (worse than those reported in the paper). The author replied to someone else that they use the DSL results. I am also confused about which setting they used.
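For anyone comparing the two settings: assuming "DSL" here means the dual-softmax re-weighting that is often applied to the text-video similarity matrix at inference time, a minimal pure-Python sketch of the idea is below. The function names, the temperature of 100, and the toy similarity values are all illustrative assumptions, not taken from this repository.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dual_softmax(sim, temp=100.0):
    """Re-weight a text-by-video similarity matrix (hypothetical helper).

    Each entry sim[i][j] is multiplied by a prior obtained from a softmax
    taken over all texts i for the fixed video j.
    """
    n_text, n_video = len(sim), len(sim[0])
    # priors[j][i]: softmax over the text axis for video j
    priors = [softmax([sim[i][j] * temp for i in range(n_text)])
              for j in range(n_video)]
    return [[sim[i][j] * priors[j][i] for j in range(n_video)]
            for i in range(n_text)]

def recall_at_1(sim):
    """Fraction of texts whose top-ranked video is the matching index."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)

# Toy example (made-up numbers): video 0 scores highest for both texts.
toy_sim = [[0.30, 0.20],
           [0.31, 0.30]]
r1_plain = recall_at_1(toy_sim)
r1_dsl = recall_at_1(dual_softmax(toy_sim))
```

Note that the prior is computed over the whole set of test queries at once, so numbers with and without this step are not directly comparable, which is why it matters which setting the paper reports.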
Hi, were you able to resolve the confusion?
Thank you for your interesting work and for sharing the code!
I'm confused about whether the zero-shot performance on MSRVTT reported here requires setting “--mergeclip=True”.
Below is the result I reproduced:
“--mergeclip=True”:
“--mergeclip=False”:
As the provided file defaults to "--mergeclip=True", I wonder if there is something wrong with this.