Issues: THUDM/AgentBench
Will llama3, wizardlm2, and other recent models be tested and published on the leaderboard? (Request to add test results for llama3, wizardlm, and other large models released in April-May 2024)
enhancement
New feature or request
#136
opened May 11, 2024 by
dercaft
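[Feature] How is the score for each task calculated? For example, the OS task only yields an accuracy, but Table 3 in the paper reports a score for each task. I could not find the mapping between the two in the paper; could you give a hint?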
enhancement
New feature or request
#135
opened May 10, 2024 by
lonerFarea
Is testing with OpenAI's tool_call interface supported?
enhancement
New feature or request
#132
opened Apr 9, 2024 by
Maybewuss
Excellent job! No offense, but in essence this seems more like LLM-Bench than AgentBench.
enhancement
New feature or request
#130
opened Mar 26, 2024 by
Konisberg
[Bug/Assistance] What is going on with "unknown" in mind2web?
bug
Something isn't working
help wanted
Extra attention is needed
#129
opened Mar 24, 2024 by
Tangent-90C
OS std test set results
bug
Something isn't working
help wanted
Extra attention is needed
#128
opened Mar 18, 2024 by
webdxq
[Bug/Assistance] - Reproducing Results on Alfworld (HH) (vs. ReAct paper)
bug
Something isn't working
help wanted
Extra attention is needed
#127
opened Mar 9, 2024 by
ai-nikolai
Benchmark for mistral models
enhancement
New feature or request
#122
opened Mar 1, 2024 by
mingxuan-he
The Card_Game task fails to run
bug
Something isn't working
help wanted
Extra attention is needed
#121
opened Feb 29, 2024 by
yupeijei1997
[Bug/Assistance] When testing the kg-std task, every status in the output file is task limit reached
bug
Something isn't working
help wanted
Extra attention is needed
#115
opened Feb 5, 2024 by
13416157913
ltp fails to start
bug
Something isn't working
help wanted
Extra attention is needed
#110
opened Jan 31, 2024 by
Fu-Dayuan
[Bug/Assistance]
bug
Something isn't working
help wanted
Extra attention is needed
#109
opened Jan 27, 2024 by
ibingzhaoi
dbbench-std: Task Output Seems Correct But MD5 Mismatches
bug
Something isn't working
help wanted
Extra attention is needed
#108
opened Jan 24, 2024 by
wchen-github
Can agentbench run on the training set?
bug
Something isn't working
help wanted
Extra attention is needed
#107
opened Jan 24, 2024 by
Fu-Dayuan
[Bug/Assistance] DBBench Unknown database
bug
Something isn't working
help wanted
Extra attention is needed
#106
opened Jan 18, 2024 by
LittleWhite0208
[Bug/Assistance] One data item in os-std fails with Worker not responding
bug
Something isn't working
help wanted
Extra attention is needed
#105
opened Jan 15, 2024 by
Xccanxin
[Assistance] Need some example running logs
bug
Something isn't working
help wanted
Extra attention is needed
#103
opened Jan 11, 2024 by
ROCKYWWWW
Both cg and kg run into Worker not responding
bug
Something isn't working
help wanted
Extra attention is needed
#97
opened Jan 3, 2024 by
WarBean
Game task fails to start [Assistance]
bug
Something isn't working
help wanted
Extra attention is needed
#96
opened Jan 3, 2024 by
smartliuhw
[Bug/Assistance] DBbench evaluation results are inconsistent with the leaderboard
bug
Something isn't working
help wanted
Extra attention is needed
#89
opened Dec 22, 2023 by
SummerXIATIAN
Not a single cg task ran successfully, and the task server received no messages
bug
Something isn't working
help wanted
Extra attention is needed
#87
opened Dec 18, 2023 by
Jianzhao-Huang
[Bug/Assistance] The option link fails to jump
bug
Something isn't working
help wanted
Extra attention is needed
#85
opened Dec 13, 2023 by
zhimin-z
[Assistance] Number of problems in the OS dataset
bug
Something isn't working
help wanted
Extra attention is needed
#72
opened Nov 8, 2023 by
deema-A