What problem or use case are you trying to solve?
Since SWE-Bench is going to be added soon, I think it'd be good to move our evaluation codebase forward by including BrowserGym for all web-browsing tasks!
They already support:
More work is going to be added to that benchmark soon!
cc @neubig @frankxu2004
Describe the UX of the solution you'd like
Do you have thoughts on the technical implementation?
BrowserGym's agent implementation, as suggested by @frankxu2004.
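To make the idea concrete, here is a rough sketch of what an evaluation loop over a Gymnasium-style browser environment might look like. BrowserGym exposes its tasks through the Gymnasium `reset`/`step` interface; everything else below (`StubBrowserEnv`, `run_episode`, `evaluate`, and the action strings) is a hypothetical stand-in so the example runs without a browser. It is not BrowserGym's API or our codebase's.

```python
class StubBrowserEnv:
    """Hypothetical stand-in for a browser task with a Gymnasium-style reset/step shape."""

    def __init__(self, goal: str, max_steps: int = 5):
        self.goal = goal            # name of the element the agent should click
        self.max_steps = max_steps  # step budget before the episode is truncated
        self.steps = 0

    def reset(self):
        self.steps = 0
        obs = {"goal": self.goal, "last_action": None}
        return obs, {}  # (observation, info)

    def step(self, action: str):
        self.steps += 1
        success = action == f"click('{self.goal}')"
        reward = 1.0 if success else 0.0
        terminated = success
        truncated = (not success) and self.steps >= self.max_steps
        obs = {"goal": self.goal, "last_action": action}
        return obs, reward, terminated, truncated, {}


def run_episode(env, policy) -> float:
    """Run one episode and return the accumulated reward."""
    obs, _ = env.reset()
    total = 0.0
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            return total


def evaluate(envs, policy) -> float:
    """Average episode reward across a list of task environments."""
    scores = [run_episode(env, policy) for env in envs]
    return sum(scores) / len(scores)
```

With real BrowserGym tasks, the stub would presumably be replaced by environments created through Gymnasium, and `policy` by the agent under evaluation; the loop itself should stay roughly the same.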
Describe alternatives you've considered
Additional context
Re-open since the evaluation is not added yet.
We have a separate issue for evaluation (#1471), so maybe this one can be closed?
ohhh good point!