I tried to reproduce the results for VinVL+SCST on NoCaps, but my result was off by a visible margin.
I generated the COCO features with the OD model using the pre-trained weights vinvl_vg_x152c4.pth. When I count the distinct tag types in the generated train.label.tsv, there are 1319. After restricting the tags to the 500 Open Images labels, my train.label.tsv has about 400 types, and my result was still off by a visible margin: I only get a SPICE score of 12.1 after cross-entropy (CE) training followed by CIDEr optimization. The config file is val_vinvl_x152c4.yaml.
With the provided coco_caption training set, train.label.tsv has 498 tag types in total, and with it I can reproduce your 12.8 SPICE result.
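For reference, this is roughly how the tag types can be counted and restricted to a label set. It is only a minimal sketch: it assumes each row of train.label.tsv is `image_id<TAB>JSON list`, where every list item has a `class` field, and the helper names `read_label_tsv` and `count_tag_types` are made up for illustration. The real file layout may differ.

```python
import json
from collections import Counter


def read_label_tsv(path):
    """Yield (image_id, json_string) pairs from a tab-separated label file.

    Assumed layout: one row per image, image id in the first column and a
    JSON list of detections in the second column.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            image_id, payload = line.rstrip("\n").split("\t", 1)
            yield image_id, payload


def count_tag_types(rows, allowed=None):
    """Count occurrences of each tag class across all rows.

    rows: iterable of (image_id, json_string) pairs.
    allowed: optional set of class names; tags outside it are skipped.
    """
    counts = Counter()
    for _image_id, payload in rows:
        for obj in json.loads(payload):
            name = obj["class"]  # assumed field name
            if allowed is None or name in allowed:
                counts[name] += 1
    return counts


# Tiny in-memory example instead of a real file.
rows = [
    ("1", '[{"class": "dog", "conf": 0.9}, {"class": "grass", "conf": 0.7}]'),
    ("2", '[{"class": "dog", "conf": 0.8}, {"class": "frisbee", "conf": 0.6}]'),
]
all_counts = count_tag_types(rows)
filtered_counts = count_tag_types(rows, allowed={"dog", "frisbee"})
print(len(all_counts), len(filtered_counts))
```

On a real file, `count_tag_types(read_label_tsv("train.label.tsv"))` gives the per-class counts, and `len(...)` of the result is the number of distinct tag types.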
May I ask which model I should use to reproduce your OD features? Could you please share a link to this OD model, so that I can reproduce your features and generate similar ones?
If it is not available, should I train an OD model on Visual Genome and Open Images myself?
Many thanks !
(´ー∀ー)(´ー∀ー)(´ー∀ー`)
@309018451 The image tags used in both NoCaps training and testing are generated by an OD model pretrained on the OpenImages dataset, not by the model vinvl_vg_x152c4.pth.
We cannot release the OD model pretrained on OpenImages, but you should be able to train your own OpenImages OD model or use a public one.