image_caption

Captioning images using a CNN and Transformers. This repo is an attempt to implement captioning models from the paper titled "VirTex: Learning Visual Representations from Textual Annotations" (https://arxiv.org/abs/2006.06666) by Desai et.al. The paper uses image captioning as a pretraining task for other downstream tasks, but this repo is centered around the task of generating dense captions for images. Everything used in repo is written from scratch except the visual backbones (Pytorch pretrained models are used). Code for transformers is heavily inspired and even copied at places from the brilliant blog post "The Annotated Transformer" (https://nlp.seas.harvard.edu/2018/04/03/attention.html). The other parts of code are inspired and taken in some places from notebooks for fast.ai course - Part2 (https://www.fast.ai/)

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.ipynb_checkpoints		.ipynb_checkpoints
__pycache__		__pycache__
images		images
.gitattributes		.gitattributes
README.md		README.md
callbacks.py		callbacks.py
caption.ipynb		caption.ipynb
data.pkl		data.pkl
data_block.py		data_block.py
learner.py		learner.py
main_run.ipynb		main_run.ipynb
model.py		model.py
optimizers.py		optimizers.py
training.py		training.py
transformer.py		transformer.py
transforms.py		transforms.py
utils.py		utils.py
xresnets.py		xresnets.py

Abhimanyu08/image_caption

Folders and files

Latest commit

History

Repository files navigation

image_caption

Some results on images from validation dataset:

About

Topics

Resources

Stars

Watchers

Forks

Languages