Benchmarks/evals #63

Open

ErikBjare opened this issue Jan 20, 2024 · 0 comments

ErikBjare commented Jan 20, 2024

I ran some small benchmarks (more like tests, really) and would like to continue this effort to evaluate capabilities and weak spots.

It would also be interesting to compare against gpt-engineer (see #62) on codegen tasks, such as the gpt-engineer suite and SWE-bench.

ErikBjare changed the title ~~Benchmarks~~ Benchmarks/evals

ErikBjare added the evals label
