11000-Image-Video-caption-data-of-human-action
Updated Apr 18, 2024
Character Recognition system using CNN and Streamlit
20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
Scan text from an image and convert it into speech/audio in the desired language.
10100-Image-caption-data-of-human-face
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
Windows version of text_extraction (VS2013). This code implements the method proposed in the paper “Multi-script text extraction from natural scenes” (Gomez & Karatzas), to appear at the ICDAR 2013 conference.
10000-Image-caption-data-of-gestures
10000-Image-caption-data-of-vehicles
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral
10000-Image-caption-data-of-diverse-scenes
Some Python scripts to load Vietnamese visual linguistic data
Image Captioning With MobileNet-LLaMA 3
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
lmmtoolkit is a toolkit for Multi-Modal Learning
Raster graphics package for Fōrmulæ, in JavaScript
FCLL: A Fine-grained Contrastive Language-Image Learning Model
The first public Vietnamese visual linguistic foundation model(s)