This Streamlit app uses a pre-trained Vision Encoder-Decoder model based on ViT-GPT2 for image captioning. Users can upload an image, and the app will generate a caption for the image.
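Under the hood, captioning with a ViT-GPT2 Vision Encoder-Decoder model boils down to encoding the image into pixel values, running beam-search generation, and decoding the token IDs back into text. The sketch below illustrates this with the Hugging Face Transformers API; the `nlpconnect/vit-gpt2-image-captioning` checkpoint and the generation parameters are assumptions, not necessarily what this repository uses:

```python
from PIL import Image
from transformers import (
    AutoTokenizer,
    ViTImageProcessor,
    VisionEncoderDecoderModel,
)

# Assumed checkpoint: a commonly used public ViT-GPT2 captioning model.
MODEL_NAME = "nlpconnect/vit-gpt2-image-captioning"


def generate_caption(image_path: str) -> str:
    """Generate a caption for the image at `image_path`."""
    # Loaded inside the function for brevity; a real app would cache these.
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME)
    processor = ViTImageProcessor.from_pretrained(MODEL_NAME)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    # Preprocess the image into the tensor format the ViT encoder expects.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Beam-search generation, then decode token IDs back to text.
    output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_caption("photo.jpg")` returns a short caption string such as "a dog sitting on a couch".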
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/your-image-captioning-app.git
  cd image_captioning
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Streamlit app:

  ```bash
  streamlit run app.py
  ```

- Open your browser and go to http://localhost:8501 to access the app.
- Upload an image using the file uploader.
- Click the "Generate Caption" button.
- View the generated caption for the uploaded image.
- torch
- transformers
- streamlit
- Pillow
- The image captioning model is based on the Hugging Face Transformers library.
- Streamlit is used for building the user interface.
This project is licensed under the MIT License - see the LICENSE file for details.