[feat] Integrate BrowserGym #1452
Awesome, thanks a bunch @frankxu2004! We'll take a look.
An example interaction: (image not preserved)
Thanks @frankxu2004! First, a few high-level questions:
- How is `browser_server` run? Does the user need to run it before starting OpenDevin? I didn't see any place where the server is started.
- And actually, do we need to stand up a Flask server, or is there an alternative way to communicate with BrowserGym? I ask because having a separate server seems to add some complexity, and it'd be nice if we didn't have to make things too complex.
Agreed--I'm worried about the extra overhead of running a separate server. Also curious what's going on under the hood with
Thanks for the comments!
A potential solution is to refactor BrowserGym using
Hope it helps! Right now I use a single-threaded Flask server just to work around this sync vs. async incompatibility.
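To illustrate the sync vs. async incompatibility with the standard library alone: a synchronous API that drives its own event loop works in plain sync code, but fails when called from a coroutine, because the server's loop is already running on that thread. `sync_facade` and `request_handler` are hypothetical stand-ins, not OpenDevin or BrowserGym code; as we understand it, Playwright's sync API (which BrowserGym builds on) refuses to run inside a running loop in an analogous way.

```python
import asyncio


def sync_facade() -> str:
    """Hypothetical sync API that runs its own event loop internally."""
    coro = asyncio.sleep(0, result="page loaded")
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # the coroutine never started; close it to avoid a warning
        raise


async def request_handler() -> str:
    # In an async web server this runs while the loop is already running,
    # so the nested asyncio.run() inside sync_facade() is rejected.
    return sync_facade()


def nested_call_error() -> str:
    """Return the error message raised when the sync API is called from a coroutine."""
    try:
        asyncio.run(request_handler())
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(sync_facade())        # works: no loop is running on this thread yet
    print(nested_call_error())  # message about a running event loop
```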
Hey @frankxu2004, could you try this? https://pypi.org/project/nest-asyncio/
Thanks for the suggestion. I tried applying this patch in the main backend, but it seems like the asyncio server we used
From the package readme:
Right now I am investigating replacing the separate Flask server with
I took a look, and it seems challenging to make async and sync code live well with each other. Starting the browser in a separate thread/process and communicating with it via
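A minimal sketch of the separate-thread variant, assuming the sync browser object must always be used from the thread that created it (Playwright's documentation describes its API as not thread-safe, hence one dedicated thread rather than a general pool). `SyncBrowserStub` and `BrowserThread` are hypothetical names; the stub only fakes a browser step by running its own event loop, which is safe because the worker thread has no loop of its own.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class SyncBrowserStub:
    """Hypothetical stand-in for a synchronous browser environment."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()  # remember the creating thread

    def step(self, action: str) -> dict:
        # Refuse cross-thread use, mimicking thread-bound browser handles.
        if threading.get_ident() != self.owner:
            raise RuntimeError("browser used from a different thread")
        # Running a private event loop is fine here: this thread has none.
        obs = asyncio.run(asyncio.sleep(0, result=f"did {action}"))
        return {"obs": obs, "thread": threading.get_ident()}


class BrowserThread:
    """Pins every browser call to one dedicated worker thread."""

    def __init__(self) -> None:
        # max_workers=1: the pool never has more than one thread, so all calls share it.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="browser")
        # Create the environment on the worker thread itself (blocks until ready).
        self._env = self._pool.submit(SyncBrowserStub).result()

    async def step(self, action: str) -> dict:
        loop = asyncio.get_running_loop()
        # The event loop awaits a future instead of blocking on the browser.
        return await loop.run_in_executor(self._pool, self._env.step, action)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

Constructing `BrowserThread` before the server's loop starts avoids the short blocking call in `__init__`.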
Updated the implementation. cc @neubig @rbren @xingyaoww No Flask server is needed. A process is created when
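A minimal sketch of the process-based approach, assuming the browser lives in a child process and the backend talks to it over a `multiprocessing` pipe. `browser_worker` and `BrowserProcess` are hypothetical names, and the worker only echoes actions where a real one would create and step the BrowserGym environment. The `fork` start method is an assumption (POSIX only) that keeps the sketch runnable as a single script; the blocking pipe round trip runs via `asyncio.to_thread`, so the event loop is never blocked.

```python
import asyncio
import multiprocessing as mp
import threading
from multiprocessing.connection import Connection


def browser_worker(conn: Connection) -> None:
    """Child-process loop; a hypothetical stand-in for the sync browser env."""
    steps = 0
    while True:
        action = conn.recv()  # block until the backend sends a command
        if action is None:    # None is this sketch's shutdown signal
            break
        steps += 1
        # A real worker would call the environment's step() here.
        conn.send({"action": action, "step": steps})
    conn.close()


class BrowserProcess:
    """Async-friendly handle to the browser child process (hypothetical)."""

    def __init__(self) -> None:
        ctx = mp.get_context("fork")  # assumption: POSIX; avoids re-importing __main__
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=browser_worker, args=(child_conn,), daemon=True)
        self._proc.start()
        child_conn.close()             # the parent only uses its own end
        self._lock = threading.Lock()  # one request/response pair at a time

    def _roundtrip(self, action: str) -> dict:
        with self._lock:
            self._conn.send(action)
            return self._conn.recv()

    async def step(self, action: str) -> dict:
        # Run the blocking send/recv in a thread so the event loop stays responsive.
        return await asyncio.to_thread(self._roundtrip, action)

    def close(self) -> None:
        self._conn.send(None)
        self._proc.join(timeout=5)
        self._conn.close()
```

Compared with the Flask server, this keeps everything inside the backend's own process tree and needs no extra port or startup command.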
Thanks for the PR! This is a great effort, and I can see it could save us tremendous time on implementing each web action and setting up evaluation for web browsing.
I agree, this is exciting, thanks @frankxu2004!
I'll open up follow-up issues for:
- Expanding our browser action space
- Implementing a `BrowserAgent` that only does browsing
- Evaluating it against WebArena using BrowserGym to make sure it works
This is coming along great!
opendevin/action/browse.py
text_content = html2text.html2text(flatten_dom_to_str(obs['dom_object']))
Might be good to make this conversion optional--I could see some agents wanting the raw HTML. Also, it's not necessarily an HTML URL! Maybe we add a `format` param which could be one of `html`, `text`, or `screenshot`?
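A sketch of what the suggested `format` parameter could look like, assuming the browser layer has already produced each representation. The `BrowseURLAction` fields here and the `rendered` dict are hypothetical, not the PR's actual classes; the lookup also gives a clear error when, for example, a non-HTML URL has no HTML form.

```python
from dataclasses import dataclass
from typing import Dict, Literal, Union

ContentFormat = Literal["html", "text", "screenshot"]


@dataclass
class BrowseURLAction:
    """Hypothetical action shape carrying the proposed `format` parameter."""

    url: str
    format: ContentFormat = "text"  # text stays the default, as in the current behaviour


def select_content(
    action: BrowseURLAction, rendered: Dict[str, Union[str, bytes]]
) -> Union[str, bytes]:
    """Return the representation the agent asked for.

    `rendered` is a hypothetical dict with 'html', 'text' and 'screenshot'
    entries; a non-HTML URL (e.g. a PDF) might simply lack the 'html' key.
    """
    if action.format not in ("html", "text", "screenshot"):
        raise ValueError(f"unsupported format: {action.format!r}")
    try:
        return rendered[action.format]
    except KeyError:
        raise ValueError(f"{action.url} has no {action.format!r} representation") from None
```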
The previous implementation of this action used the text content of the Playwright page as the content, so I am keeping this conversion as the default for now. I agree the observation should be configurable, but that should be up to the agent implementation. So I extended `BrowserObservation` into a more complete one that carries all kinds of observation types, in case the agent wants to use them.
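A sketch of an observation that carries several representations while keeping text as the default `content`, under stated assumptions: the field names are hypothetical (the PR's actual `BrowserObservation` fields are not shown here), and the small `html.parser`-based extractor is only a rough stand-in for the `html2text.html2text(flatten_dom_to_str(...))` call quoted above, not equivalent to it.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text nodes and skips <script>/<style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


@dataclass
class BrowserObservation:
    """Hypothetical observation bundling several representations of a page."""

    url: str
    html: str
    screenshot: str = ""  # e.g. a base64-encoded PNG (assumption)
    content: str = field(init=False)  # what text-only agents read

    def __post_init__(self) -> None:
        # Default content is plain text, matching the previous behaviour.
        self.content = html_to_text(self.html)
```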
Updated HTML and text handling. This PR defaults to using text as the content, to keep the observation the same as in previous versions.
Codecov Report
Attention: Patch coverage is

| Coverage diff | main | #1452 | +/- |
|---|---|---|---|
| Coverage | ? | 60.88% | |
| Files | ? | 85 | |
| Lines | ? | 3710 | |
| Branches | ? | 0 | |
| Hits | ? | 2259 | |
| Misses | ? | 1451 | |
| Partials | ? | 0 | |
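As a sanity check on the report's numbers: Codecov computes coverage as hits / (hits + misses + partials), and, as far as we know, its default settings round down to two decimals. The helper below is our own illustration, not a Codecov API.

```python
def codecov_percent(hits: int, misses: int, partials: int = 0, precision: int = 2) -> str:
    """Coverage ratio hits / (hits + misses + partials), rounded down (assumed default)."""
    total = hits + misses + partials
    scale = 10 ** precision
    # Integer floor division avoids floating-point rounding surprises.
    scaled = hits * 100 * scale // total
    whole, frac = divmod(scaled, scale)
    return f"{whole}.{frac:0{precision}d}%"


if __name__ == "__main__":
    print(codecov_percent(hits=2259, misses=1451, partials=0))
```

Ordinary rounding would give 60.89% for 2259 / 3710, so the reported 60.88% is consistent with rounding down; the lines also add up, since 2259 + 1451 + 0 = 3710.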
It seems like it's failing on macOS but passing on Linux, and interestingly it fails at the sandbox connection step. Any ideas why this might be? cc @xingyaoww
LGTM! Amazing work!
I've tested on both Linux and my local macOS environment and can confirm it works as intended.
@@ -22,7 +22,8 @@ uvicorn = "*"
types-toml = "*"
numpy = "*"
json-repair = "*"
playwright = "*"
browsergym = "*" # integrate browsergym as the browsing interface
As far as I understand, you only need a dependency on `browsergym-core`. `browsergym` is a meta-package that includes additional benchmarks with extra dependencies: `browsergym-webarena`, `browsergym-workarena` and `browsergym-miniwob`.
Hi @gasse, thanks for the comment! We are also interested in evaluating agents on the OpenDevin platform against web agent benchmarks such as WebArena, WorkArena and MiniWoB++, so we might just leave it as-is for now!
Currently the browser is implemented with raw Playwright. Ideally we can reuse existing work, BrowserGym, to enable more interactions with the browser than just going to a URL. This first PR aims to replace the current browser action with a BrowserGym integration. In the future we can easily add more actions supported by BrowserGym to let the agent interact with the web.
In the proposed setup, we need to start a new backend server, `browser_api_server.py`, as an interface for all browser-related interactions. In the future, all browser-related actions will just be HTTP API calls to this server. I would love suggestions from core maintainers on how to best handle this server's lifecycle with the main backend server.
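A minimal sketch of the "every browser action is an HTTP call" design, using only the standard library instead of Flask. The `/step` endpoint, the JSON shape, and `call_browser_api` are hypothetical; the handler just echoes the action where `browser_api_server.py` would drive BrowserGym.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class BrowserAPIHandler(BaseHTTPRequestHandler):
    """Hypothetical /step endpoint; a real server would step the browser here."""

    def do_POST(self):
        if self.path != "/step":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Stub: echo the action back as the "observation".
        body = json.dumps({"observation": f"executed {payload['action']}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def call_browser_api(base_url: str, action: str) -> dict:
    """Client side: one browser action becomes one JSON POST request."""
    req = urllib.request.Request(
        f"{base_url}/step",
        data=json.dumps({"action": action}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), BrowserAPIHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        print(call_browser_api(f"http://127.0.0.1:{server.server_port}", "goto('https://example.com')"))
    finally:
        server.shutdown()
        server.server_close()
```

The cost raised in review is visible even here: a second server has to be started, given a port, and shut down alongside the backend.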
Right now, to run it, run `poetry run python opendevin/browser_server/browser_api_server.py` alongside the main backend server. Maybe we could add this line to the Makefile? cc @neubig @xingyaoww
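One possible way to tie the browser server's lifecycle to the main backend, sketched under assumptions: `managed_server` is a hypothetical helper, and the backend would enter it at startup so the child process is stopped however the backend exits (short of a hard kill).

```python
import contextlib
import subprocess
import sys
from typing import Iterator, Sequence


@contextlib.contextmanager
def managed_server(cmd: Sequence[str], grace: float = 5.0) -> Iterator[subprocess.Popen]:
    """Start a helper server as a child process and always stop it on exit.

    Hypothetical helper: the backend would enter this before serving requests.
    """
    proc = subprocess.Popen(list(cmd))
    try:
        yield proc
    finally:
        if proc.poll() is None:   # still running
            proc.terminate()      # ask politely first (SIGTERM on POSIX)
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()       # escalate if it ignores the request
                proc.wait()


if __name__ == "__main__":
    # Stand-in child; the real command would launch browser_api_server.py.
    with managed_server([sys.executable, "-c", "import time; time.sleep(30)"]) as proc:
        print("helper running:", proc.poll() is None)
    print("helper exit code:", proc.returncode)
```

In the backend, the server startup could be wrapped in `with managed_server(["poetry", "run", "python", "opendevin/browser_server/browser_api_server.py"]):`, which removes the need for a separate Makefile step.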
Fixes #1384