Larger model, worse performance? #30

Open
WSPeng opened this issue Mar 26, 2024 · 5 comments

@WSPeng
WSPeng commented Mar 26, 2024

Hi, the leaderboard shows that larger models perform worse. Is that because of inference time? Smaller models can act at a higher frequency. If so, the benchmark is not very useful.

I think the game could be changed so that it can pause; then we could compare models without a bias from inference latency.

@StanGirard
Member

The goal here is to evaluate an LLM in real time. We give the models the ability to make 3-5 moves ahead of time. Large LLMs can generate more moves, but yes, they take longer.

The inference latency is intended to be part of the evaluation, but we could add an option to remove it with a parameter for some games.

Please feel free to open a PR to put this in place, but as an option and not by default ;)
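
To make the trade-off concrete, here is a minimal simulation sketch of how such an option could change the comparison. Everything in it is an assumption for illustration: the `pause_during_inference` flag, the `SimulatedAgent` class, and the latency and move-duration numbers are hypothetical and are not taken from the project's code or from measured results.

```python
from dataclasses import dataclass


@dataclass
class SimulatedAgent:
    name: str
    latency_s: float     # hypothetical seconds spent waiting for one LLM call
    moves_per_call: int  # moves returned by each call


def count_moves(agent: SimulatedAgent,
                game_seconds: float,
                pause_during_inference: bool,
                seconds_per_move: float = 0.5) -> int:
    """Count the moves an agent executes within game_seconds of game time."""
    if agent.moves_per_call < 1:
        raise ValueError("moves_per_call must be at least 1")
    clock = 0.0
    moves = 0
    while True:
        # In real-time mode the game clock keeps running while the model thinks.
        # With the (hypothetical) pause option, inference costs no game time.
        if not pause_during_inference:
            clock += agent.latency_s
        for _ in range(agent.moves_per_call):
            if clock + seconds_per_move > game_seconds:
                return moves
            clock += seconds_per_move
            moves += 1


# Illustrative numbers only, not measurements.
small = SimulatedAgent("small", latency_s=0.5, moves_per_call=3)
large = SimulatedAgent("large", latency_s=4.0, moves_per_call=3)

for paused in (False, True):
    for agent in (small, large):
        n = count_moves(agent, game_seconds=30.0, pause_during_inference=paused)
        print(f"paused={paused} {agent.name}: {n} moves")
```

With these made-up numbers the slower agent executes a third as many moves in real-time mode, while both execute the same number when the clock is paused during inference. This only models action frequency, not move quality.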

@taozhiyuai

In my experience, yes. A small model has a high tokens-per-second rate and is always generating actions, while a big model has to wait for its tokens before it knows how to react. @_@

@taozhiyuai

The record shows that the small model can generate more actions thanks to its higher tokens-per-second rate.

0.5b wins 3 rounds!

Player 1 using: ollama:qwen:14b-chat-v1.5-fp16
Player 2 using: ollama:qwen:0.5b-chat-v1.5-fp16

Round 1

🏟️ (0647) (0)Starting game
🏟️ (0647) (0)Waiting for fight to start
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-30 12:20:26.448 | WARNING | agent.robot:get_moves_from_llm:317 - Many invalid moves: ['Evaluate Opponent', 'Assess Distance for Effective Attacks']
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: super attack 2
Player 2 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
Player 1 move: megapunch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
Player 2 move: medium kick
Player 2 move: high kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: high kick
Player 2 move: low kick
Player 2 move: super attack 2
Player 2 move: super attack 3
Player 2 move: super attack 4
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: fireball
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: fireball
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 2
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
Player 2 move: jump away
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump away
Player 1 move: megapunch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high kick
Player 2 move: low punch
Player 2 move: high punch
Player 2 move: low kick
Player 2 move: low punch
Player 2 move: low punch
2024-03-30 12:21:41.329 | WARNING | agent.robot:get_moves_from_llm:317 - Many invalid moves: ['Mid Punch', 'Mid Punch']
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
🏟️ (0647) (0)Round won by P2
(0)Moving to next round
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megapunch
Player 1 move: hurricane
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low punch
Player 2 move: medium punch
Player 2 move: high punch
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
Player2 ollama:qwen:0.5b-chat-v1.5-fp16 Daddy won!

—————————

Round 2

🏟️ (2b8a) (0)Starting game
🏟️ (2b8a) (0)Waiting for fight to start
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high kick
Player 2 move: low kick
Player 2 move: low punch
Player 2 move: medium punch
Player 2 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high punch
Player 2 move: low kick
Player 2 move: medium punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: jump closer
Player 2 move: jump away
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
Player 1 move: jump closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump away
Player 1 move: super attack 2
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megafireball
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
Player 1 move: megapunch
Player 1 move: low punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megafireball
Player 1 move: super attack 2
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high kick
Player 2 move: super attack 2
Player 2 move: super attack 3
Player 2 move: super attack 4
Player 2 move: low punch
Player 2 move: medium punch
Player 2 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high punch
Player 1 move: jump closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low punch
Player 2 move: medium punch
Player 2 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high punch
Player 2 move: high punch
Player 2 move: high punch
Player 2 move: megapunch
Player 2 move: low punch
Player 2 move: low punch
Player 2 move: low kick
🏟️ (2b8a) (0)Round won by P2
(0)Moving to next round
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: high kick
Player 1 move: megapunch
Player2 ollama:qwen:0.5b-chat-v1.5-fp16 Daddy won!

———————

Round 3

🏟️ (b34c) (0)Starting game
🏟️ (b34c) (0)Waiting for fight to start
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: move away
Player 2 move: medium punch
Player 2 move: super attack 2
Player 2 move: high punch
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
Player 2 move: jump closer
Player 2 move: jump away
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-30 12:28:29.109 | WARNING | agent.robot:get_moves_from_llm:317 - Many invalid moves: ['Move Closer to get into better attacking range', 'Megafireball or Super attack 2 as a powerful offensive option while closing in']
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: megapunch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megafireball
Player 1 move: high punch
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: megafireball
Player 2 move: super attack 2
Player 2 move: super attack 3
Player 2 move: super attack 4
Player 2 move: low punch
Player 2 move: medium punch
Player 2 move: high punch
Player 2 move: jump closer
Player 2 move: jump away
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: move away
Player 2 move: high punch
Player 2 move: low kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megafireball
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: high kick
Player 2 move: fireball
Player 2 move: high kick
Player 2 move: fireball
Player 2 move: high kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
Player 1 move: megafireball
Player 1 move: medium punch
Player 1 move: fireball
2024-03-30 12:28:58.413 | WARNING | agent.robot:get_moves_from_llm:317 - Many invalid moves: ['Assess the distance to the opponent', 'If close', 'If far', 'Move Clo']
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: high kick
Player 2 move: low kick
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
Player 2 move: low kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: megafireball
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
Player 2 move: medium kick
Player 2 move: high kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 2
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: jump closer
Player 1 move: megafireball
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: megapunch
Player 2 move: hurricane
Player 2 move: megafireball
Player 2 move: super attack 2
Player 2 move: super attack 3
Player 2 move: super attack 4
Player 2 move: low punch
Player 2 move: medium punch
🏟️ (b34c) (0)Round won by P2
(0)Moving to next round
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: move closer
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: move away
Player 2 move: super attack 3
Player 2 move: low kick
Player 2 move: high kick
Player 2 move: jump closer
Player 2 move: jump away
Player2 ollama:qwen:0.5b-chat-v1.5-fp16 Daddy won!
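
The difference in action frequency in logs like the ones above can be quantified by counting the `Player N move:` lines per player. Below is a small sketch of that idea; the function name `count_moves_per_player` is made up for illustration, and the sample is a short excerpt from Round 1, not the whole log.

```python
import re
from collections import Counter

# Matches lines such as "Player 2 move: low kick".
MOVE_RE = re.compile(r"^Player (\d+) move: (.+)$")


def count_moves_per_player(log_text: str) -> Counter:
    """Return a Counter mapping player number to executed move count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MOVE_RE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Short excerpt from the Round 1 log above.
sample_log = """\
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: low kick
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 1 move: super attack 3
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Player 2 move: fireball
Player 2 move: super attack 2
Player 2 move: super attack 3
"""

move_counts = count_moves_per_player(sample_log)
print(dict(move_counts))
```

Running the same function over a full round would give the moves executed by each player. Since the `httpx` request lines in the log do not say which player issued them, this sketch does not try to attribute latency to a player.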

@taozhiyuai
taozhiyuai commented Mar 30, 2024

WechatIMG83

win rate 44% after 50 rounds

@oulianov
Contributor

Very interesting results!
