Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Integrate BrowserGym #1452

Merged
merged 35 commits into from May 2, 2024
Merged

Conversation

frankxu2004
Copy link
Collaborator

@frankxu2004 frankxu2004 commented Apr 29, 2024

Currently the browser is implemented with raw playwright. Ideally we can reuse existing work BrowserGym to enable more interactions with a browser other than just goto url. This first PR aims at replacing the current browser action with BrowserGym integration. In the future we can easily add more actions supported by BrowserGym to enable agent interaction with the web.

In the proposed setup, we need to start a new backend server browser_api_server.py as an interface for all browser-related interactions. In the future, all browser-related actions will just be HTTP API calls to this server.

I would love suggestions from core maintainers on how to best handle this server's lifecycle with the main backend server.

Right now, to run it, run poetry run python opendevin/browser_server/browser_api_server.py along side with the main backend server. Maybe we could add this line to Makefile?

cc @neubig @xingyaoww

Fixes #1384

@neubig neubig requested review from neubig and xingyaoww April 29, 2024 15:24
@neubig
Copy link
Contributor

neubig commented Apr 29, 2024

Awesome thanks a bunch @frankxu2004 ! We'll take a look.

@frankxu2004 frankxu2004 marked this pull request as draft April 29, 2024 15:31
@frankxu2004
Copy link
Collaborator Author

An example interaction:

==============
STEP 0

12:01:15 - PLAN
browse www.google.com
12:01:17 - ACTION
BrowseURLAction(url='www.google.com', action='browse')
12:01:18 - OBSERVATION
Error: Protocol error (Page.navigate): Cannot navigate to invalid URL
=========================== logs ===========================
navigating to "/projects/ogma3/fangzhex/OpenDevinwww.google.com", waiting until "load"
============================================================


==============
STEP 1

12:01:18 - PLAN
browse www.google.com
12:01:20 - ACTION
AgentThinkAction(thought='I need to double check the URL format before browsing again.', action='think')


==============
STEP 2

12:01:20 - PLAN
browse www.google.com
12:01:21 - ACTION
BrowseURLAction(url='https://www.google.com', action='browse')
12:01:24 - OBSERVATION
[ About ](https://about.google/?fg=1&utm_source=google-
US&utm_medium=referral&utm_campaign=hp-header) [ Store
](https://store.google.com/US?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-
US)

[ Gmail ](https://mail.google.com/mail/&ogbl)

[ Images ](https://www.google.com/imghp?hl=en&ogbl)

[ ](https://www.google.com/intl/en/about/products)

[ Sign in
](https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/&ec=GAZAmgQ)

![Google](/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png)

Choose what you’re giving feedback on

* * * *

See more

Delete

* * Delete 

* Report inappropriate predictions 

I'm Feeling Curious

I'm Feeling Hungry

I'm Feeling Adventurous

I'm Feeling Playful

I'm Feeling Stellar

I'm Feeling Doodley

I'm Feeling Trendy

I'm Feeling Artistic

I'm Feeling Funny

[ Advertising ](https://www.google.com/intl/en_us/ads/?subid=ww-ww-et-g-awa-a-
g_hpafoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpafooter&fg=1)
[ Business ](https://www.google.com/services/?subid=ww-ww-et-g-awa-a-
g_hpbfoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpbfooter&fg=1)
[ How Search works ](https://google.com/search/howsearchworks/?fg=1)

[
![]()
Our third decade of climate action: join us
](https://sustainability.google/?utm_source=googlehpfooter&utm_medium=housepromos&utm_campaign=bottom-
footer&utm_content=)

[ Privacy ](https://policies.google.com/privacy?hl=en&fg=1) [ Terms
](https://policies.google.com/terms?hl=en&fg=1)

Settings

[ Search settings ](https://www.google.com/preferences?hl=en&fg=1)

[ Advanced search ](/advanced_search?hl=en&fg=1)

[ Your data in Search
](/history/privacyadvisor/search/unauth?utm_source=googlemenu&fg=1&cctld=com)

[ Search history ](/history/optout?hl=en&fg=1)

[ Search help
](https://support.google.com/websearch/?p=ws_results_help&hl=en&fg=1)

Send feedback

Dark theme: Off

Google apps




==============
STEP 3

12:01:24 - PLAN
browse www.google.com
12:01:27 - ACTION
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:226
INFO
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:107 - Task finished by agent

@frankxu2004 frankxu2004 marked this pull request as ready for review April 29, 2024 16:15
Copy link
Contributor

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @frankxu2004 ! First a few high-level questions:

  1. how is browser_server run? Does the user need to run it before starting OpenDevin? I didn't see any place where the server is started.
  2. and actually do we need to stand up a flask server or is there an alternative way to communicate with BrowserGym? The reason why I ask is because having a separate server seems to add some complexity, and it'd be nice if we didn't have to make things too complex.

@rbren
Copy link
Collaborator

rbren commented Apr 29, 2024

Agreed--I'm worried about the extra overhead here to run a separate server. Also curious what's going on under the hood with gymnasium--is it running a docker container?

@frankxu2004
Copy link
Collaborator Author

Thanks for the comments!

@neubig

  1. Right now, the server would be run along side with the main backend on host system by poetry run python opendevin/browser_server/browser_api_server.py. It can be moved inside sandbox docker for better security but we need to figure out how to communicate between main backend and the sandboxed server.
  2. The main issue with this is that, in BrowserGym, Playwright is used with sync API, and as a result cannot be used inside an asyncio loop. The current main backend fastapi server is based on asyncio so putting browsergym instantiation code in the backend will result in playwright._impl._api_types.Error: It looks like you are using Playwright Sync API inside the asyncio loop.

Potential solution is to refactor BrowserGym using async playwright API, but as BrowserGym is quite complex this route might be requiring a lot of work. Another option is to explore ways of having a sync thread for BrowserGym, but I might need a bit more help.

@rbren

  1. going on under the hood with gymnasium--is it running a docker container?
    Underlying code for BrowserGym is just creating a Playwright sync_api context with Chromium, and maintain that context and expose high-level APIs for both useful observations for agents to ingest, and provide an execution platform where generated actions could execute on.

Hope it helps! Right now I use a single-threaded Flask server just for this sync vs async incompatibility.

@neubig
Copy link
Contributor

neubig commented Apr 30, 2024

Hey @frankxu2004 , could you try this? https://pypi.org/project/nest-asyncio/

@frankxu2004
Copy link
Collaborator Author

frankxu2004 commented Apr 30, 2024

Hey @frankxu2004 , could you try this? https://pypi.org/project/nest-asyncio/

Thanks for the suggestion. I tried applying this patch in the main backend but it seems like the asyncio server we used uvicorn cannot be patched

ValueError: Can't patch loop of type <class 'uvloop.Loop'>

From the package readme: Only event loops from asyncio can be patched; Loops from other projects, such as uvloop or quamash, generally can’t be patched.

Right now I am investigating replacing the separate Flask server with multiprocessing.Process and multiprocessing.Queue, hopefully eliminating the need for HTTP server. This way we can more easily handle the BrowserEnv's life cycle with AgentController

@xingyaoww
Copy link
Collaborator

xingyaoww commented Apr 30, 2024

I take a look and it seems it is challenging to make async & sync live well with each other. Starting the browser in a separate thread/process and communicate with it via Queue could work so that we don't need to start this as a separate server. Look forward to the implementation though - pls let me know if you need any help!!

@frankxu2004
Copy link
Collaborator Author

Updated the implementation cc @neubig @rbren @xingyaoww

No flask server needed. A process is created when AgentController is initialized. Question: how to handle graceful shutdown of the Process created? I am not super familiar with the codebase, but it seems that atexit is not used for AgentController.

Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is great efforts and i can see it could saves us tremendous time from implementing each web action & setting up evaluation for web browsing.

opendevin/action/browse.py Outdated Show resolved Hide resolved
opendevin/action/browse.py Show resolved Hide resolved
opendevin/browser_server/browser_process.py Outdated Show resolved Hide resolved
opendevin/browser_server/browser_process.py Outdated Show resolved Hide resolved
Copy link
Contributor

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is exciting, thanks @frankxu2004 !

I'll open up follow-up issues for:

  1. Expanding our browser action space
  2. Implementing a BrowserAgent that only does browsing
  3. Evaluate it against WebArena using BrowserGym to make sure it works

opendevin/action/browse.py Outdated Show resolved Hide resolved
opendevin/browser_server/browser_process.py Outdated Show resolved Hide resolved
opendevin/browser_server/browser_process.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@rbren rbren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming along great!

)

text_content = html2text.html2text(flatten_dom_to_str(obs['dom_object']))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to make this conversion optional--I could see some agents wanting the raw HTML. Also, it's not necessarily an HTML URL! Maybe we add a format param which could be one of html, text, or screenshot?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous impl. of this action is using the text content of the playwright page as content, so now I am keeping this conversion as default. I agree we should add configurable observation here, but up to the agent implementation. As a result, I modified the BrowserObservation towards a more complete one, having all kinds of observation types if the agent wants to use them.

@frankxu2004
Copy link
Collaborator Author

Updated html and text handling. Default to using text as content for this PR, to maintain the same observation as previous versions

@frankxu2004 frankxu2004 requested a review from rbren May 1, 2024 13:17
@codecov-commenter
Copy link

codecov-commenter commented May 1, 2024

Codecov Report

Attention: Patch coverage is 53.06122% with 46 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (main@0d77f49). Click here to learn what that means.

Files Patch % Lines
opendevin/browser/browser_env.py 43.42% 43 Missing ⚠️
opendevin/action/browse.py 0.00% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1452   +/-   ##
=======================================
  Coverage        ?   60.88%           
=======================================
  Files           ?       85           
  Lines           ?     3710           
  Branches        ?        0           
=======================================
  Hits            ?     2259           
  Misses          ?     1451           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@frankxu2004
Copy link
Collaborator Author

frankxu2004 commented May 1, 2024

It seems like it's failing on MacOS but passing on Linux, but interestingly failing at the sandbox connection part. Any ideas why this might be? cc @xingyaoww
https://github.com/OpenDevin/OpenDevin/actions/runs/8909840159/job/24468062273

@xingyaoww
Copy link
Collaborator

xingyaoww commented May 1, 2024

Got this weird error while running it on the frontend (make run) with MonologueAgent - any idea what might cause it?

image

Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Amazing work!

I've tested on both Linux and my local MacOS environment and can confirm it work as intended.

@xingyaoww xingyaoww merged commit 836864f into OpenDevin:main May 2, 2024
22 checks passed
@frankxu2004 frankxu2004 deleted the integrate_browsergym branch May 3, 2024 02:43
@@ -22,7 +22,8 @@ uvicorn = "*"
types-toml = "*"
numpy = "*"
json-repair = "*"
playwright = "*"
browsergym = "*" # integrate browsergym as the browsing interface
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand you only need a dependency on browsergym-core.

browsergym is a meta-package that includes additional benchmarks with extra dependencies, browsergym-webarena, browsergym-workarena and browsergym-miniwob.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gasse Thanks for the comment! We are also interested in evaluating agents in the OpenDevin platform on web agent benchmarks such as WebArena, WorkArena and Miniwob++, so we might just leave it as-is for now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Evaluation] Integrate BrowserGym for agent web browsing evaluation
6 participants