This Python script utilizes LM Studio to create a conversational agent that describes images. The agent is designed to take a screenshot of a webpage provided by the user and then describe the contents of the screenshot in natural language.
Before running the script, ensure you have the following installed:
- Python 3.x (3.10 in my case)
- Required Python packages (install via
pip install -r requirements.txt
):playwright
asyncio
argparse
- Clone this repository to your local machine.
- Install the required dependencies.
- Obtain an OpenAI API key and configure it in
config.json
. - Run the script with the desired URL as an argument:
You can download LM Studio from here or directly for your platform:
Windows: Download
Linux: Download
MacOS: Download
conda create -n autogen-webagent python=3.10
conda activate autogen-webagent
pip install -r requirements.txt
python app.py
This is a small project of mine. Anyone who can help and has suggestions for improvement is welcome to participate. I would also be happy about a star if you like it.
Thanks :)