Simple question: What are the public datasets included in InternVid-200M? #100

Open
jong980812 opened this issue Apr 15, 2024 · 1 comment

Comments

@jong980812

In "InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation," I would like to use ViCLIP-B-16 on InternVid-200M. Does this dataset (or InternVid-FLT) contain videos from Kinetics400, SSV2, and UCF101? The paper does not make clear whether only the labels from these datasets were referenced or whether their videos were also included.

@shepnerd
Member

It does not contain videos from the datasets you mentioned. We clarified this in Sec. 3.1 (data curation) as follows: "We ensure the uniqueness of our dataset by creating a database of YouTube video IDs and excluding any videos already present in publicly available datasets (released prior to April 2023)."
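
For readers who want to picture the exclusion step quoted above, here is a minimal sketch of ID-based filtering. It is not the authors' pipeline: the function name `exclude_known_videos` and all video IDs are hypothetical, and it only assumes that both the crawled videos and the public datasets can be represented as YouTube video IDs.

```python
def exclude_known_videos(candidate_ids, known_ids):
    """Return candidate IDs that are not in known_ids, without repeats.

    candidate_ids: iterable of YouTube video ID strings (crawl order kept).
    known_ids: iterable of IDs already used by public datasets.
    """
    known = set(known_ids)   # set gives O(1) membership checks
    seen = set()             # tracks IDs already kept, to drop repeats
    kept = []
    for vid in candidate_ids:
        if vid in known or vid in seen:
            continue
        seen.add(vid)
        kept.append(vid)
    return kept


# Hypothetical IDs, for illustration only.
public_ids = {"aaaaaaaaaa1", "bbbbbbbbbb2"}
crawled = ["ccccccccccc3", "aaaaaaaaaa1", "ddddddddddd4", "ccccccccccc3"]
filtered = exclude_known_videos(crawled, public_ids)
```

A check of this kind only catches overlap for datasets whose videos are identified by YouTube ID, so it may be worth confirming how each benchmark of interest sources its videos.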
