What’s the format of "train_data.csv" and "validation_data.csv" when I finetune on custom dataset #792

ShuxunoO · 2023-11-28T09:55:59Z


Hello, I have prepared a local dataset of image-caption pairs, laid out like this:

dataset_folder:

    --00000000.png
    --00000000.txt
    --00000001.png
    --00000001.txt

Now I want to fine-tune a released pretrained CLIP model (such as ViT-B/32). How should I organize the CSV file? Could you give me an example, or is there a ready-made script for generating CSV files?

Also, which script should I use to fine-tune the model? Could you point me to a reference link or a tutorial?

Thanks a lot!

gabrielilharco · 2023-11-28T16:49:34Z

Maintainer

Hey @ShuxunoO. You have two options here.

One is to put those samples in shards by packing n pairs of files into each .tar file (e.g. the first 1000 pairs go to shard-0000.tar, the next 1000 to shard-0001.tar, etc.). Then you can use --train-data "/yourdir/shard-{0000..1234}.tar".
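A minimal sketch of the sharding option, using only Python's standard library. It assumes the flat layout from the question (each `NNNNNNNN.png` next to a same-named `NNNNNNNN.txt`), 1000 pairs per shard by default, and a `shard-NNNN.tar` naming scheme; `write_shards` is a hypothetical helper, not part of open_clip. WebDataset assembles a sample from consecutive files in a tar that share the same key (the filename up to the first dot), so each image is written immediately followed by its caption.

```python
import tarfile
from pathlib import Path


def write_shards(src_dir, out_dir, pairs_per_shard=1000):
    """Pack <key>.png / <key>.txt pairs from src_dir into tar shards.

    Hypothetical helper: shard naming and shard size are assumptions.
    Returns the list of shard paths that were written.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Keep only images that have a matching caption file; sort for a stable order.
    images = sorted(p for p in src.glob("*.png") if p.with_suffix(".txt").exists())

    shard_paths = []
    for start in range(0, len(images), pairs_per_shard):
        shard_path = out / f"shard-{start // pairs_per_shard:04d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for img in images[start:start + pairs_per_shard]:
                # Image and caption are added back to back under the same key,
                # because WebDataset groups consecutive files that share a key.
                tar.add(img, arcname=img.name)
                txt = img.with_suffix(".txt")
                tar.add(txt, arcname=txt.name)
        shard_paths.append(shard_path)
    return shard_paths
```

Calling `write_shards("dataset_folder", "shards")` would write `shards/shard-0000.tar`, `shards/shard-0001.tar`, and so on, which matches the `--train-data "/yourdir/shard-{0000..NNNN}.tar"` brace pattern. For tar shards, open_clip also needs the total number of training samples; if no size metadata is present, it can be given with `--train-num-samples`.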

Your other option is to create a CSV file with two columns: filepath, containing the paths to the images, and title, containing the corresponding text captions. You can also use different column names if you want, but be sure to set --csv-img-key and --csv-caption-key accordingly if you do.
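For the CSV option, a short standard-library script can build the file from the flat `dataset_folder` layout in the question. This is a sketch, not an official open_clip tool: `write_caption_csv` is a hypothetical name, and it assumes each caption is the entire contents of the matching `.txt` file. It writes tab-separated output because open_clip's `--csv-separator` defaults to a tab; if you write commas instead, pass `--csv-separator ","` at training time.

```python
import csv
from pathlib import Path


def write_caption_csv(src_dir, csv_path, img_key="filepath", caption_key="title"):
    """Write one row per <key>.png that has a matching <key>.txt caption file.

    Hypothetical helper. Column names default to open_clip's
    --csv-img-key / --csv-caption-key defaults. Returns the number of rows written.
    """
    src = Path(src_dir)
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # Tab-separated to match open_clip's default --csv-separator.
        writer = csv.writer(f, delimiter="\t")
        writer.writerow([img_key, caption_key])
        for img in sorted(src.glob("*.png")):
            txt = img.with_suffix(".txt")
            if not txt.exists():
                continue  # skip images that have no caption file
            # Collapse newlines and tabs inside the caption into single spaces.
            caption = " ".join(txt.read_text(encoding="utf-8").split())
            # Absolute paths avoid depending on the working directory at training time.
            writer.writerow([str(img.resolve()), caption])
            rows += 1
    return rows
```

Running it once per split (e.g. `write_caption_csv("dataset_folder", "train_data.csv")`) gives files that can be passed to `--train-data` and `--val-data` with the default `--csv-img-key filepath --csv-caption-key title`.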

For fine-tuning, you can run the sample training commands from our readme, but be sure to set the --pretrained flag.


ShuxunoO · 2023-11-29T02:26:12Z

Author

My CSV looks like this:

        filepath,title
        "NFT1000/Angry Ape Army/img/Angry Ape Army_1910.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_1910.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_3085.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_3085.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_279.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_279.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_26.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_26.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_2708.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_2708.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_2088.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_2088.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_2371.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_2371.txt"
        "NFT1000/Angry Ape Army/img/Angry Ape Army_634.png","NFT1000/Angry Ape Army/caption/Angry Ape Army_634.txt"

Is this all right?

python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --train-data="/path/to/train_data.csv"  \
    --val-data="/path/to/validation_data.csv"  \
    --csv-separator "," \
    --csv-img-key filepath \
    --csv-caption-key title \
    --warmup 10000 \
    --batch-size=128 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model RN50

Should I replace --model RN50 with a local path to a released pretrained model?


gabrielilharco · 2023-11-29T02:49:57Z

Maintainer

If you're going with CSVs, you should have the actual captions in the second column, not pointers to the caption files. If you're using a pretrained model we support, you can pass a string to the --pretrained flag (e.g. --model RN50 --pretrained openai). Otherwise, you can point to a local checkpoint file (e.g. --model RN50 --pretrained /path/to/your/ckpt.pt).
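If you already have a CSV whose caption column holds paths to `.txt` files (as in the first demo), a sketch like the one below can replace each path with the text of that file. It assumes a comma-separated input with a header row, caption paths relative to a `root` directory, and the column names `filepath` and `title`; `inline_captions` and `root` are hypothetical, so adjust them to your data. The output is tab-separated because that is open_clip's default `--csv-separator`.

```python
import csv
from pathlib import Path


def inline_captions(in_csv, out_csv, root=".", img_key="filepath", caption_key="title"):
    """Copy in_csv to out_csv, replacing each caption-file path with its text.

    Hypothetical helper: assumes a comma-separated input with a header row.
    The output is tab-separated (open_clip's default --csv-separator).
    """
    root = Path(root)
    with open(in_csv, newline="", encoding="utf-8") as fin, \
         open(out_csv, "w", newline="", encoding="utf-8") as fout:
        # skipinitialspace ignores spaces right after each comma.
        reader = csv.DictReader(fin, skipinitialspace=True)
        # Strip stray spaces from header names such as "filepath , title".
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
        writer = csv.writer(fout, delimiter="\t")
        writer.writerow([img_key, caption_key])
        for row in reader:
            caption_path = root / row[caption_key].strip()
            # Read the caption file and collapse internal whitespace.
            caption = " ".join(caption_path.read_text(encoding="utf-8").split())
            writer.writerow([row[img_key].strip(), caption])
```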


adsbansal Jan 22, 2024

Hi, thank you for maintaining this amazing repository!
Does the above script unfreeze all the layers in both the image and text encoders? If not, which layers does it tune?

ShuxunoO · 2023-11-29T03:03:41Z

Author

    filepath,title
    "NFT1000/Angry Ape Army/img/Angry Ape Army_1910.png","A picture of Angry Ape Army, containing bg 27 Background acid green orange bust Bust mad max armor BustCover head striped white seaweed Head samurai soldier helmet Helmet."
    "NFT1000/Angry Ape Army/img/Angry Ape Army_3085.png","A picture of Angry Ape Army, containing bg 36 Background original bust Bust sci fi soldier orange BustCover head blue vapor Head barbarian horn helmet Helmet."
    "NFT1000/Angry Ape Army/img/Angry Ape Army_279.png","A picture of Angry Ape Army, containing bg 63 Background original bust Bust cyborg white BustCover head blue bleached Head AAA headband blue BaseDetail medival soldier helmet Helmet."
    "NFT1000/Angry Ape Army/img/Angry Ape Army_26.png","A picture of Angry Ape Army, containing bg 47 Background original bust Bust bullet vest camo BustCover head 3 orange Head."
    ……

And my arguments are as follows:

python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --train-data="/path/to/train_data.csv"  \
    --val-data="/path/to/validation_data.csv"  \
    --csv-separator "," \
    --csv-img-key filepath \
    --csv-caption-key title \
    --warmup 10000 \
    --batch-size=128 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model ViT-B-32 \
    --pretrained /path/to/my/ckpt.pt

I'm going to give it a try!
If I run into any problems, I'll come back to ask for more advice. Thank you very much!
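Before launching a run, a quick pre-flight check can catch the problems that came up in this thread: a header that does not match `--csv-img-key`/`--csv-caption-key` (e.g. `file_path` vs `filepath`), a separator mismatch, missing images, or captions that are still `.txt` paths. This is a hypothetical sketch: `check_clip_csv` and its `root` argument are illustrative names, and `sep`, `img_key`, and `caption_key` should mirror the `--csv-separator`, `--csv-img-key`, and `--csv-caption-key` values you pass to training. It uses Python's `csv` module, so its parsing may not match pandas-based loading in every edge case.

```python
import csv
from pathlib import Path


def check_clip_csv(csv_path, sep="\t", img_key="filepath", caption_key="title", root="."):
    """Return a list of human-readable problems found in a CLIP training CSV.

    Hypothetical pre-flight check; an empty list means nothing was flagged.
    """
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=sep)
        header = reader.fieldnames or []
        missing = [k for k in (img_key, caption_key) if k not in header]
        if missing:
            # Usually a wrong separator or a header such as "file_path" vs "filepath".
            return [f"missing column(s) {missing}; header parsed as {header}"]
        for row in reader:
            line_no = reader.line_num  # physical line of the row just read
            img = row[img_key] or ""
            caption = row[caption_key]
            # Relative image paths are resolved against root (absolute ones are kept).
            if not (Path(root) / img).is_file():
                problems.append(f"line {line_no}: image not found: {img}")
            if not caption or caption.strip().endswith(".txt"):
                problems.append(f"line {line_no}: caption looks empty or like a file path")
    return problems
```

For example, `check_clip_csv("train_data.csv", sep=",")` matches a comma-separated file trained with `--csv-separator ","`.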

