Before downloading, please modify the `cache_root` in `lavis/configs/defaults.yaml`. The `cache_root` is the repo's default data directory.
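For reference, the relevant field typically looks like the sketch below. The exact surrounding keys and the example path are assumptions about your checkout, not values you must keep:

```yaml
# lavis/configs/defaults.yaml (sketch -- key layout and path are examples)
env:
  cache_root: "/export/home/.cache/lavis"
```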
We use BLIP-2's five evaluation tasks (COCO, NoCaps, VQAv2, GQA, OK-VQA). COCO, VQAv2, and OK-VQA share the same COCO images. To download, just run the scripts under `prepare_data`:
```
python download_coco.py    # COCO images
python download_nocaps.py  # NoCaps images
python download_gqa.py     # GQA images
```
The annotation files will be downloaded automatically when you run the training or evaluation code.
We have already discussed how to download COCO. To download VG and SBU (around 26 GB of disk space), please also run:
```
python download_vg.py
python download_sbu.py
```
The annotation files will be downloaded automatically when you run the training or evaluation code.
Note that some researchers may find it difficult to download SBU from the original URLs. We provide an alternative `sbu.tar.gz` containing around 0.9M images in a single compressed file. There are also alternative links for VG (part1, part2). If you use the alternative links, please do not forget to put the images at `your_cache_root/sbu_captions/images` and `your_cache_root/vg/images/`.
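The expected layout above can be prepared ahead of time. A minimal sketch, where `CACHE_ROOT` is an assumed placeholder you should replace with your own `cache_root`:

```python
# Sketch: create the expected directories for the alternative SBU/VG
# downloads. CACHE_ROOT is an assumption -- substitute your own cache_root.
import os

CACHE_ROOT = "/tmp/lavis_cache"
sbu_dir = os.path.join(CACHE_ROOT, "sbu_captions", "images")
vg_dir = os.path.join(CACHE_ROOT, "vg", "images")
os.makedirs(sbu_dir, exist_ok=True)
os.makedirs(vg_dir, exist_ok=True)

# Then extract the downloaded archives into place, e.g. (shell):
#   tar -xzf sbu.tar.gz -C "$sbu_dir"
print(sbu_dir)  # /tmp/lavis_cache/sbu_captions/images
```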
Laion-COCO is a dataset that simulates BLIP's synthetic image-caption pairs. For Laion-COCO, we save the data as several `.tar` files rather than individual `.jpg` files, to avoid creating too many small files.
The tar files can be read using webdataset, a library designed for PyTorch-based large-scale data loading. I highly recommend the combination of webdataset (to load data) and img2dataset (to download data) for PyTorch-based image-text pre-training.
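A webdataset shard is an ordinary `.tar` file in which each sample's files share a basename (the sample key). The sketch below illustrates that layout using only the standard library; in practice you would iterate shards with the webdataset package rather than `tarfile`, and samples would pair a `.jpg` with a `.txt` (here only `.txt` entries are written, to keep the sketch self-contained):

```python
# Minimal sketch of the webdataset shard layout: a shard is a plain .tar
# file whose member names share a basename per sample (the sample key).
import io
import tarfile

def write_shard(path, samples):
    """samples: list of (key, caption) pairs; writes one key.txt per sample."""
    with tarfile.open(path, "w") as tar:
        for key, caption in samples:
            data = caption.encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def read_shard(path):
    """Reads the shard back as {sample key: caption}."""
    out = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            key, _, _ext = member.name.rpartition(".")
            out[key] = tar.extractfile(member).read().decode("utf-8")
    return out

write_shard("/tmp/00000.tar", [("000000001", "a cat"), ("000000002", "a dog")])
print(read_shard("/tmp/00000.tar"))  # {'000000001': 'a cat', '000000002': 'a dog'}
```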
To download the images and annotations, please run the following scripts:
```
python download_laion_coco/step1_download_laion_coco_meta.py  # download the meta info of the images
bash download_laion_coco/step2_download_laion_coco.sh         # use img2dataset to download images and annotations into .tar files based on the meta info
```
Please remember to modify the `storage` in `lavis/configs/datasets/laion/defaults_coco.yaml` line 13. As an example, the value we used in our training is `/home/zhangao/data/laion/{00000..00200}.tar`. The pattern `{00000..00200}.tar` means using the tar files from `00000.tar` to `00200.tar`.
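The brace range is expanded by webdataset itself; the sketch below shows the equivalent explicit shard list (the path is the example from above, not a value you must use):

```python
# Equivalent of the {00000..00200}.tar brace range: an explicit shard list.
shards = [f"/home/zhangao/data/laion/{i:05d}.tar" for i in range(0, 201)]
print(len(shards))   # 201 shards in total
print(shards[0])     # /home/zhangao/data/laion/00000.tar
print(shards[-1])    # /home/zhangao/data/laion/00200.tar
```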
MiniGPT-4 created a very small self-instruct dataset for aligning VL-LLMs with conversational scenarios. To obtain it, please download our preprocessed data.
Please remember to modify the `storage` in `lavis/configs/defaults.yaml` line 6 to point to your downloaded `cc_sbu_align` directory.