
support other visual grounding datasets? #112

Open
PaulTHong opened this issue Nov 9, 2022 · 6 comments
Comments

@PaulTHong

Hey, you conducted the visual grounding experiment on RefCOCO+. Have you tried other datasets such as RefCOCO or RefCOCOg? If I wanted to do this, how could I get the data? In your release, only the JSON file for RefCOCO+ is provided. Did you generate these JSON files yourself, or were they downloaded from somewhere else? (I notice that the data format of your ALBEF VG is not the same as TransVG's.) Thank you very much. Looking forward to your reply.

@LiJunnan1992
Contributor

Our RefCOCO+ annotation is converted from the official annotations: https://github.com/lichengunc/refer.
We only use image-text pairs during training. During inference, we use Grad-CAM to rank the proposals provided by https://github.com/lichengunc/MAttNet.
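(Editor's note: as a rough illustration of the ranking step described above, one common way to score a candidate box with Grad-CAM is by the attention mass it covers. This is a minimal sketch, not the ALBEF code; the function name and scoring rule are illustrative assumptions.)

```python
import numpy as np

def rank_proposals(gradcam_map, proposals):
    """Rank candidate boxes by the Grad-CAM attention they cover.

    gradcam_map: 2D array (H, W) of per-pixel relevance scores.
    proposals:   list of boxes as (x0, y0, x1, y1) in pixel coordinates.
    Returns the proposals sorted by descending score.
    """
    scores = []
    for (x0, y0, x1, y1) in proposals:
        region = gradcam_map[int(y0):int(y1), int(x0):int(x1)]
        # Mean attention inside the box, so the largest box
        # does not trivially win.
        scores.append(region.mean() if region.size else 0.0)
    order = np.argsort(scores)[::-1]
    return [proposals[i] for i in order]

# Toy example: attention concentrated in the top-left quadrant.
cam = np.zeros((8, 8))
cam[:4, :4] = 1.0
boxes = [(0, 0, 4, 4), (4, 4, 8, 8)]
print(rank_proposals(cam, boxes)[0])  # (0, 0, 4, 4)
```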

@PaulTHong
Author

Thank you for your reply! I still have some questions about the dataset. I ran VG experiments with the dataset described in TransVG's guide (https://github.com/djiajunustc/TransVG/blob/main/docs/GETTING_STARTED.md), which is downloaded from https://drive.google.com/file/d/1fVwdDvXNbH8uuq_pHD_o5HI7yqeuz0yS/view. The downloaded data includes unc+_train.pth, corpus.pth, etc., which seem similar to your converted refcoco+_train.json, etc.

Since you only provide the JSON files for RefCOCO+, and I now want to experiment with RefCOCO etc., do I just need to convert the files the same way you did for RefCOCO+? I can't open the data link in https://github.com/lichengunc/refer.

So my request is: could you please provide the data-conversion script, or the converted files for RefCOCO? Then I could try RefCOCO and RefCOCOg directly. Thank you very much! I hope I have stated my question clearly.

@PaulTHong
Author

Hello, could you help me with the question above? Thank you very much!

@LiJunnan1992
Contributor

Here is a code snippet I used for data conversion:

import os
from refer import REFER  # from https://github.com/lichengunc/refer

# The data root below is an example; point it at your refer data directory.
refer = REFER('data', dataset='refcoco+', splitBy='unc')

split = 'train'
ref_ids = refer.getRefIds(split=split)

annotations = []

# Images are resized to 384x384 and divided into 32x32 patches (a 12x12 grid).
dim_w, dim_h = 384, 384
patch_size = 32
n_patch_w, n_patch_h = dim_w // patch_size, dim_h // patch_size

for ref_id in ref_ids:
    ref = refer.Refs[ref_id]
    image = refer.Imgs[ref['image_id']]

    width, height = image['width'], image['height']
    w_step = width / n_patch_w
    h_step = height / n_patch_h
    patch_area = height * width / (n_patch_w * n_patch_h)

    # Binary segmentation mask of the referred object.
    mask = refer.getMask(ref)['mask']

    # Fraction of each patch covered by the object mask, in row-major order.
    patch = []
    for i in range(n_patch_h):
        for j in range(n_patch_w):
            y0 = max(0, round(i * h_step))
            y1 = min(height, round((i + 1) * h_step))
            x0 = max(0, round(j * w_step))
            x1 = min(width, round((j + 1) * w_step))
            submask = mask[int(y0):int(y1), int(x0):int(x1)]
            patch.append(submask.sum() / patch_area)

    text = [sentence['sent'] for sentence in ref['sentences']]
    imgPath = os.path.join('/export/share/datasets/vision/coco/images/train2014', image['file_name'])
    annotation = {'image': imgPath, 'text': text, 'patch': patch, 'type': 'ref', 'ref_id': ref['ref_id']}
    annotations.append(annotation)
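(Editor's note: the snippet above builds the annotation list in memory but does not show the final write-out. A minimal sketch of serializing it in the same style as the released refcoco+_train.json follows; the output filename and the dummy record are illustrative assumptions.)

```python
import json

split = 'train'
# A dummy annotation in the same shape the conversion script produces
# (144 patch scores for a 12x12 grid).
annotations = [{'image': 'train2014/COCO_train2014_000000000001.jpg',
                'text': ['the left zebra'],
                'patch': [0.0] * 144,
                'type': 'ref',
                'ref_id': 0}]

out_path = f'refcoco_{split}.json'  # hypothetical filename
with open(out_path, 'w') as f:
    json.dump(annotations, f)

# Round-trip check that the file is valid JSON.
with open(out_path) as f:
    loaded = json.load(f)
print(len(loaded))  # 1
```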

@PaulTHong
Author

Got it. Thank you very much! I will give it a try!

@PaulTHong
Author

Hey, I have a small problem to bother you with. I find four files in your data/refcoco+ subfolder: cocos.json, dets.json, instances.json, refs(unc).p. In the files provided by the TransVG codebase, I find two similar files for both refcoco and refcoco+: instances.json and refs(unc).p, but cocos.json and dets.json are missing. Where do the latter two files come from? Or do refcoco and refcoco+ share the same cocos.json and dets.json? I let refcoco share the same files and finished the data conversion; training works, but at one step of evaluation it raises a "ref_id" KeyError. Sorry to bother you again. Thank you very much.
