# Model Zoo

Models are also available on OpenXLab.

## CLIP

These models were trained with the original CLIP text encoder from OpenAI's CLIP codebase. The text encoder stays frozen throughout training.
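The frozen-text-encoder setup can be sketched as below. This is a minimal illustration in PyTorch, not the repo's actual training code; `TinyCLIP` and its layer names are made-up stand-ins for the real image and text towers.

```python
# Sketch: fine-tune the image encoder while the text encoder stays frozen.
# (Assumption: PyTorch; TinyCLIP is an illustrative stand-in, not real CLIP.)
import torch
import torch.nn as nn

class TinyCLIP(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.visual = nn.Linear(dim, dim)  # stand-in for the ViT image encoder
        self.text = nn.Linear(dim, dim)    # stand-in for the text encoder

model = TinyCLIP()

# Freeze the text encoder: its parameters receive no gradient updates.
for p in model.text.parameters():
    p.requires_grad = False

# Only parameters that still require gradients (the image tower) are optimized.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```

Passing only the still-trainable parameters to the optimizer keeps the frozen text tower identical to the released OpenAI weights for the whole run.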

The test metric is classification accuracy on the ImageNet-S dataset. The number in parentheses is the accuracy gain over the original OpenAI CLIP.
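For reference, Acc@1 and Acc@5 in the tables below are standard top-k accuracies. A minimal pure-Python sketch (the logits and labels here are made-up toy values):

```python
# Sketch of top-k accuracy as reported in the tables (toy data, not real outputs).

def topk_accuracy(logits, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(logits, labels):
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 2]
print(topk_accuracy(logits, labels, 1))  # 0.5: only the first sample's top score matches
print(topk_accuracy(logits, labels, 3))  # 1.0: with k=3 every label is covered here
```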

### train on grit-1m

| model | Acc@1 | Acc@5 | Google Drive link | OpenXLab link |
|-------|-------|-------|-------------------|---------------|
| CLIP-B/16 | 68.31 (+1.83) | 90.31 (+1.41) | clip_b16_grit1m_fultune_8xe | clip_b16_grit1m_fultune_8xe |
| CLIP-L/14 | 77.22 (+3.74) | 94.38 (+2.78) | clip_l14_grit1m_fultune_8xe | clip_l14_grit1m_fultune_8xe |
| CLIP-L/14@336 | 78.15 (+3.86) | 94.86 (+2.89) | clip_l14@336_grit1m_fultune_8xe | clip_l14@336_grit1m_fultune_8xe |

### train on grit-20m

| model | Acc@1 | Acc@5 | Google Drive link | OpenXLab link |
|-------|-------|-------|-------------------|---------------|
| CLIP-B/16 | 68.89 (+2.41) | 90.51 (+1.61) | clip_b16_grit20m_fultune_2xe | clip_b16_grit20m_fultune_2xe |
| CLIP-L/14 | 77.41 (+3.93) | 94.45 (+2.82) | clip_l14_grit20m_fultune_2xe | clip_l14_grit20m_fultune_2xe |
| CLIP-L/14@336 | 79.61 (+5.32) | 95.31 (+3.34) | clip_l14@336_grit20m_fultune_4xe | clip_l14_336_grit20m_fultune_4xe |

### train on combined dataset (mimagenet_top + grit-1m)

| model | ImageNet-S Acc@1 | ImageNet-S Acc@5 | COCO crop Acc@1 | Google Drive link | OpenXLab link |
|-------|------------------|------------------|-----------------|-------------------|---------------|
| CLIP-B/16 | 69.40 (+2.92) | 90.74 (+1.84) | 55.39 (+4.97) | clip_b16_grit1m+mim_fultune_4xe | clip_b16_grit1m+mim_fultune_4xe |
| CLIP-L/14 | 77.80 (+4.32) | 94.46 (+2.86) | 58.83 (+3.40) | clip_l14_grit1m+mim_fultune_6xe | clip_l14_grit1m+mim_fultune_6xe |

## Open-CLIP

We are planning to train models based on open_clip with deeper ViT encoders. Please stay tuned!