lmm-graph-tree-vqa

How well do the GPT-4V and Gemini Pro Vision models perform zero-shot Visual Question Answering (VQA) on Data Structures?

We create a standard, repeatable process for selecting and obtaining VQA tasks in accordance with Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science, published by the Joint Task Force on Computing Curricula of the Association for Computing Machinery (ACM) and the IEEE Computer Society.

Setup

The following instructions use a bash terminal and assume you have Python and Git installed on your machine.

Clone the repository

```
git clone https://github.com/gutbash/lmm-graph-tree-vqa.git
cd lmm-graph-tree-vqa
```

Create a virtual environment
```
python -m venv .venv
```

Activate the virtual environment

Linux and macOS:
```
source .venv/bin/activate
```

Windows (from a bash terminal such as Git Bash):
```
source .venv/Scripts/activate
```

Install the dependencies
```
pip install -r requirements.txt
```

Set the environment variables
```
mv .env.example .env
```
Edit the .env file and set the environment variables.
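
As a quick sanity check after editing `.env`, a minimal sketch like the one below can report which API keys are still unset. The variable names are the ones read by the examples later in this README; the helper `missing_keys` is hypothetical and not part of the project, and the check only sees variables that are already in the process environment (for example after `load_dotenv()` has run or after exporting them in the shell).

```python
import os

# Variable names taken from the examples later in this README.
REQUIRED_KEYS = ("OPENAI_API_KEY_DEV", "DEEPMIND_API_KEY_DEV")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; not part of this project.
    """
    return [name for name in required if not env.get(name)]


missing = missing_keys(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All API keys are set.")
```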

Quickstart

Structures

At the core of the project are the data structures. These base structures are used to generate the images for the VQA tasks.

There are four base classes: BinaryTree, BinarySearchTree, DirectedGraph, UndirectedGraph.

You can generate an individual image directly from these classes, but it is not the conventional approach.

The following example generates an image of a binary tree:

```
from generation.structures.tree import BinaryTree

structure = BinaryTree()

structure.generate()
structure.fill()
structure.draw(save=True, path='test.png')
```

Generators

Generate an individual image.

The conventional way to generate an individual image is to use the Generator.

The following example does the same as the previous example:

```
from generation.structures.tree import BinaryTree
from generation.generator import Generator
from pathlib import Path
import asyncio

generator = Generator()

async def run_generation():
    generated = await generator.generate_structure(structure_class=BinaryTree)
    filled = await generator.fill_structure(structure_instance=generated)
    await generator.draw_structure(structure_instance=filled, save=True, save_path=Path('test/'), save_name='test.png')

asyncio.run(run_generation())
```

Batch Generators

Generate a batch of images.

Use the BatchGenerator to create a batch of images. It also links the text and image prompts in the YAML data.

The following example generates a batch of binary trees:

```
from generation.structures.tree import BinaryTree
from generation.generator import BatchGenerator
from pathlib import Path
import asyncio

batch_generator = BatchGenerator()

async def run_batch():
    await batch_generator.generate_batch(
        structure_class=BinaryTree,
        type='bit',
        yaml_name='binary_tree.yaml',
        yaml_path=Path('data/'),
        save_path=Path('images/binary_tree/'),
        text_path=Path('text/'),
        text_name='binary_tree_text.yaml',
    )

asyncio.run(run_batch())
```

Messages

Build a prompt template for a model from a list of messages.

OpenAI can use the following message types for prompts: SystemMessage, UserMessage, and AssistantMessage.

The following example creates a typical prompt for OpenAI:

```
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage
from pathlib import Path

openai_messages = [
    UserMessage(content="Answer this question: What is in this image?", images=[Path('test/test.png')]),
]
```

DeepMind can use the following message types for prompts: ImageMessage and BaseMessage.

The following example creates a typical prompt for DeepMind:

```
from evaluation.models.messages.message import ImageMessage, BaseMessage
from pathlib import Path

deepmind_messages = [
    BaseMessage(content="Answer this question: What is in this image?"),
    ImageMessage(image=Path('test/test.png')),
]
```

Message Keys

Insert text and image prompts from the YAML data into messages.

During evaluation, keys are replaced with the text and image prompts from the YAML data. Two keys can be used inside the string that holds a message's content or image:

{{content}} for the text prompt
{{image}} for the image prompt
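
To make the substitution concrete, here is a minimal sketch of how such keys could be filled in. It is an illustration only: the function `fill_keys`, the example question, and the example image path are hypothetical, and the project's Evaluator may resolve the keys differently.

```python
import re

# Matches the two keys described above: {{content}} and {{image}}.
KEY_PATTERN = re.compile(r"\{\{(content|image)\}\}")


def fill_keys(template: str, content: str, image: str) -> str:
    """Replace {{content}} and {{image}} in template in a single pass.

    Hypothetical helper for illustration; not the project's implementation.
    """
    values = {"content": content, "image": image}
    return KEY_PATTERN.sub(lambda match: values[match.group(1)], template)


# Example values are made up for this sketch.
question = "How many nodes are in the tree?"
image_path = "images/binary_tree/example.png"

print(fill_keys("Answer this question: {{content}}", question, image_path))
print(fill_keys("{{image}}", question, image_path))
```

Doing the replacement in one pass means that a text prompt which happens to contain the literal string `{{image}}` is not substituted a second time.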

The following example shows the same message lists as the previous examples using message keys:

```
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage, ImageMessage, BaseMessage

openai_messages = [
    UserMessage(content="Answer this question: {{content}}", images=["{{image}}"]),
]

deepmind_messages = [
    BaseMessage(content="Answer this question: {{content}}"),
    ImageMessage(image="{{image}}"),
]
```

Models

Create instances of models for evaluation.

There are two models that can be created for evaluation: OpenAI and DeepMind.

The following example creates instances of both models:

```
from evaluation.models.openai import OpenAI
from evaluation.models.deepmind import DeepMind
from dotenv import load_dotenv
import asyncio
import os

load_dotenv()
openai_api_key = os.environ.get('OPENAI_API_KEY_DEV')
deepmind_api_key = os.environ.get('DEEPMIND_API_KEY_DEV')

openai = OpenAI(api_key=openai_api_key)
deepmind = DeepMind(api_key=deepmind_api_key)
```

You can run completions directly from these models, given a list of messages. This example continues from the previous one and reuses its openai and deepmind instances:

```
from evaluation.models.messages.message import UserMessage, SystemMessage, ImageMessage, BaseMessage

openai_messages = [UserMessage(content="{{content}}", images=["{{image}}"])]
deepmind_messages = [BaseMessage(content="{{content}}"), ImageMessage(image="{{image}}")]

async def run_completions():
    await openai.arun(messages=openai_messages)
    await deepmind.arun(messages=deepmind_messages)

asyncio.run(run_completions())
```

Evaluation

Once the images are batch generated and automatically linked to the YAML data, use the Evaluator to evaluate models on the prompts.

The following example evaluates the OpenAI model on a batch of binary trees:

```
from evaluation.evaluator import Evaluator
from evaluation.models.openai import OpenAI
from evaluation.models.messages.message import UserMessage, SystemMessage, AssistantMessage
from pathlib import Path
from dotenv import load_dotenv
import asyncio
import os

load_dotenv()
openai_api_key = os.environ.get('OPENAI_API_KEY_DEV')

openai = OpenAI(
    api_key=openai_api_key,
)

messages = [UserMessage(content="{{content}}", images=["{{image}}"])]

evaluator = Evaluator()

async def run_evaluation():
    await evaluator.evaluate(
        model=openai,
        messages=messages,
        yaml_path=Path('data/'),
        yaml_name='binary_tree.yaml',
        csv_path=Path('results/'),
        csv_name='openai.csv',
        repeats=3,
    )

asyncio.run(run_evaluation())
```
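
The sketch below illustrates one plausible reading of the repeats parameter: each prompt is sent to the model that many times and every response is kept. This is an assumption, not a description of the Evaluator's internals. `EchoModel` and `run_repeated` are hypothetical stand-ins so the example runs without API keys; only the `arun(messages=...)` call shape comes from the examples above.

```python
import asyncio


class EchoModel:
    """Stand-in for a real model; hypothetical, not part of this project."""

    async def arun(self, messages):
        # Yield control once to mimic an awaited API request.
        await asyncio.sleep(0)
        return f"response to {len(messages)} message(s)"


async def run_repeated(model, messages, repeats):
    """Call model.arun `repeats` times and return the responses in order."""
    responses = []
    for _ in range(repeats):
        responses.append(await model.arun(messages=messages))
    return responses


results = asyncio.run(
    run_repeated(EchoModel(), ["What is in this image?"], repeats=3)
)
print(len(results))
```

Repeating a prompt this way gives several samples per question, which can help when model outputs vary from call to call.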

License
