
Streamline the creation of supervised datasets to facilitate data augmentation for deep learning architectures focused on image captioning. The core framework leverages MiniGPT-4, complemented by the pre-trained Vicuna model, which boasts 13 billion parameters.


neemiasbsilva/MiniGPT4-image-caption-generation


Image caption generation using MiniGPT-4 and the pre-trained Vicuna model

PyTorch PythonAnywhere Shell Script

Description

This repository implements an image captioner for large datasets. It aims to streamline the creation of supervised datasets that support data augmentation for image-captioning deep learning architectures.

The underlying framework is MiniGPT-4, combined with the pre-trained Vicuna model, which has 13 billion parameters.

Prerequisites

You need a machine with a GPU that has at least 23 GB of memory.
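
One way to check this requirement before starting a long captioning run is to ask the driver how much memory each GPU has. The sketch below is illustrative only: it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, reads the 23 GB figure as 23 × 1024 MiB of GPU memory, and uses helper names that are not part of this repository.

```python
import subprocess
from typing import List

# Assumption: the 23 GB prerequisite is interpreted as 23 * 1024 MiB of GPU memory.
REQUIRED_MIB = 23 * 1024


def parse_total_memory_mib(smi_output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_meeting_requirement(smi_output: str, required_mib: int = REQUIRED_MIB) -> List[int]:
    """Return the indices of GPUs whose total memory is at least required_mib."""
    totals = parse_total_memory_mib(smi_output)
    return [index for index, total in enumerate(totals) if total >= required_mib]


def query_nvidia_smi() -> str:
    """Ask the driver for total memory per GPU, in MiB, without header or units."""
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

On the target machine, `gpus_meeting_requirement(query_nvidia_smi())` returns the indices of the GPUs that satisfy the threshold; an empty list means none does.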

Getting Started

Installation

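# Clone this repository and the upstream MiniGPT-4 code
git clone https://github.com/neemiasbsilva/MiniGPT-4-image-caption-implementation.git
git clone https://github.com/Vision-CAIR/MiniGPT-4.git

# Create and activate the conda environment defined by MiniGPT-4
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
conda install pandas

# Move the MiniGPT-4 files up one level; both paths are relative to the
# current directory, so check where you are before running this
mv MiniGPT-4/* ../.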

Set up the shell script

In the shell script (run.sh), specify:

  • data_path: the path to your image dataset;
  • beam_search: hyperparameter in the range 0 to 10;
  • temperature: hyperparameter between 0.1 and 1.0;
  • save_path: the local path where the caption dataset will be saved.
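
Because a captioning run over a large dataset can take a long time, it may help to sanity-check these four values first. The following is a minimal sketch, not code from this repository: the `CaptionRunConfig` class and its `problems` method are hypothetical, and the only rules it encodes are the ranges listed above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class CaptionRunConfig:
    """Hypothetical container mirroring the four settings in run.sh."""

    data_path: str
    beam_search: int
    temperature: float
    save_path: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the settings look valid."""
        found = []
        if not Path(self.data_path).is_dir():
            found.append(f"data_path is not a directory: {self.data_path}")
        if not 0 <= self.beam_search <= 10:
            found.append(f"beam_search must be between 0 and 10, got {self.beam_search}")
        if not 0.1 <= self.temperature <= 1.0:
            found.append(f"temperature must be between 0.1 and 1.0, got {self.temperature}")
        if not self.save_path:
            found.append("save_path must not be empty")
        return found
```

For example, `CaptionRunConfig(".", 5, 0.7, "captions.csv").problems()` returns an empty list, while an out-of-range `beam_search` or `temperature` adds one message per violation.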

Set up pre-trained models

  • Download the Vicuna 13B weights.

  • Set the LLM path in minigpt4/configs/models/minigpt4_vicuna0.yaml, at line 15.

    llama_model: "vicuna"
    
  • Download the MiniGPT-4 checkpoint.

  • Set the checkpoint path in eval_configs/minigpt4_eval.yaml, at line 8.

    ckpt: pretrained_minigpt4.pth
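
If you prefer not to edit the two YAML files by hand, the same change can be scripted. The sketch below is an assumption-laden convenience, not part of this repository: `set_yaml_scalar` is a hypothetical helper that does a plain text replacement with the standard library rather than parsing YAML, so it only suits simple one-line `key: value` entries such as `llama_model` and `ckpt`.

```python
import re
from pathlib import Path


def set_yaml_scalar(config_file: str, key: str, value: str) -> int:
    """Replace the value on every `key: ...` line, keeping its indentation.

    Returns the number of lines changed. The new value is written in double
    quotes, so it must not itself contain a double quote.
    """
    path = Path(config_file)
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}\s*:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the new value.
    new_text, count = pattern.subn(
        lambda match: f'{match.group(1)}{key}: "{value}"',
        path.read_text(encoding="utf-8"),
    )
    path.write_text(new_text, encoding="utf-8")
    return count
```

A call such as `set_yaml_scalar("eval_configs/minigpt4_eval.yaml", "ckpt", "pretrained_minigpt4.pth")` would rewrite the checkpoint entry; a return value of 0 means the key was not found and the file is unchanged.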
    

Usage

sh run.sh
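
The repository's own scripts are not reproduced here. As a rough sketch of the kind of batch run described above, the code below walks an image folder and writes one caption per image. Everything in it is an assumption made for illustration: `caption_fn` is a placeholder for the MiniGPT-4 call, and the CSV layout, column names, and accepted file extensions are hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable, List

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(data_path: str) -> List[Path]:
    """Return the image files directly inside data_path, in sorted order."""
    return sorted(
        entry for entry in Path(data_path).iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_dataset(data_path: str, save_path: str,
                    caption_fn: Callable[[Path], str]) -> int:
    """Write an (image, caption) CSV for every image and return the image count.

    caption_fn is a stand-in for the real captioning model.
    """
    images = list_images(data_path)
    with open(save_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "caption"])  # hypothetical column names
        for image in images:
            writer.writerow([image.name, caption_fn(image)])
    return len(images)
```

Passing the captioner in as a function keeps the file handling separate from the model, so the loop can be tried with a dummy function before loading the 13B weights.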
