GitHub - jianzhnie/deep-marl-toolkit: MARLToolkit: The Multi-Agent Reinforcement Learning Toolkit. Includes implementations of MAPPO, MADDPG, QMIX, VDN, COMA, IPPO, QTRAN, MAT...

MARLToolkit: The Multi-Agent Reinforcement Learning Toolkit

MARLToolkit is a Multi-Agent Reinforcement Learning toolkit based on PyTorch. It provides the MARL research community with a unified platform for developing and evaluating new ideas across a variety of multi-agent environments. MARLToolkit has four core features:

It collects most of the existing MARL algorithms widely acknowledged by the community and unifies them under one framework.
It provides a common interface through which different multi-agent environments interact with the agents.
It is designed to be efficient in both training and sampling.
It provides trained results, including learning curves and pretrained models for each task-algorithm combination, with fine-tuned hyperparameters to ensure credibility.

Overview

We collected most of the existing multi-agent environments and multi-agent reinforcement learning algorithms and unified them under one framework based on PyTorch to boost MARL research.

The implemented MARL baselines cover independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).

Popular environments like SMAC, MaMujoco, and Google Research Football are provided with a unified interface.

The algorithm code and environment code are fully separated. Changing the environment needs no modification on the algorithm side and vice versa.
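The separation described above can be sketched as follows. This is a minimal illustration, not MARLToolkit's actual API: the names `MultiAgentEnv`, `MatchingGame`, `TargetGame`, `RandomAgents`, and `run_episode` are hypothetical and exist only for this example. The point is that the agent-side code touches nothing but the shared interface, so swapping the environment requires no change to it.

```python
import random
from abc import ABC, abstractmethod


class MultiAgentEnv(ABC):
    """Hypothetical common interface; NOT MARLToolkit's actual API."""

    n_agents: int
    n_actions: int

    @abstractmethod
    def reset(self):
        """Return a list with one observation per agent."""

    @abstractmethod
    def step(self, actions):
        """Apply one action per agent; return (observations, rewards, done)."""


class MatchingGame(MultiAgentEnv):
    """Toy cooperative task: all agents are rewarded when their actions match."""

    def __init__(self, n_agents=2, n_actions=3, horizon=5):
        self.n_agents, self.n_actions, self.horizon = n_agents, n_actions, horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.t] * self.n_agents

    def step(self, actions):
        self.t += 1
        reward = 1.0 if len(set(actions)) == 1 else 0.0
        return [self.t] * self.n_agents, [reward] * self.n_agents, self.t >= self.horizon


class TargetGame(MultiAgentEnv):
    """Toy independent task: each agent is rewarded for choosing action 0."""

    def __init__(self, n_agents=3, n_actions=2, horizon=4):
        self.n_agents, self.n_actions, self.horizon = n_agents, n_actions, horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.t] * self.n_agents

    def step(self, actions):
        self.t += 1
        rewards = [1.0 if a == 0 else 0.0 for a in actions]
        return [self.t] * self.n_agents, rewards, self.t >= self.horizon


class RandomAgents:
    """Algorithm side: relies only on the MultiAgentEnv interface."""

    def __init__(self, env, seed=0):
        self.n_actions = env.n_actions
        self.rng = random.Random(seed)

    def act(self, observations):
        return [self.rng.randrange(self.n_actions) for _ in observations]


def run_episode(env, agents):
    """Roll out one episode; return the step count and per-agent returns."""
    observations, done, steps = env.reset(), False, 0
    totals = [0.0] * env.n_agents
    while not done:
        actions = agents.act(observations)
        observations, rewards, done = env.step(actions)
        totals = [t + r for t, r in zip(totals, rewards)]
        steps += 1
    return steps, totals
```

Calling `run_episode(env, RandomAgents(env))` works unchanged for either toy environment, which is the property the unified interface is meant to provide.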

Benchmark	Learning Mode	Available Env	Algorithm Type	Algorithm Number	Continuous Control	Asynchronous Interaction	Distributed Training	Framework
PyMARL	CP	1	VD	5				*
PyMARL2	CP	1	VD	12				PyMARL
off-policy	CP	4	IL+VD+CC	4				off-policy
on-policy	CP	4	IL+VD+CC	1				on-policy
MARL-Algorithms	CP	1	VD+Comm	9				*
EPyMARL	CP	4	IL+VD+CC	10				PyMARL
Marlbenchmark	CP+CL	4	VD+CC	5	✔️			pytorch-a2c-ppo-acktr-gail
MAlib	SP	8	SP	9	✔️			*
MARLlib	CP+CL+CM+MI	10	IL+VD+CC	18	✔️	✔️	✔️	Ray/RLlib

CP, CL, CM, and MI denote the cooperative, collaborative, competitive, and mixed task learning modes. IL, VD, and CC denote the independent learning, value decomposition, and centralized critic algorithm categories. SP denotes self-play, and Comm denotes communication-based learning. An asterisk denotes that the benchmark uses its own framework.

Environment

Supported Multi-agent Environments / Tasks

Most of the popular environments in MARL research have been incorporated in this benchmark:

Env Name	Learning Mode	Observability	Action Space	Observations
LBF	Mixed	Both	Discrete	Discrete
RWARE	Collaborative	Partial	Discrete	Discrete
MPE	Mixed	Both	Both	Continuous
SMAC	Cooperative	Partial	Discrete	Continuous
MetaDrive	Collaborative	Partial	Continuous	Continuous
MAgent	Mixed	Partial	Discrete	Discrete
Pommerman	Mixed	Both	Discrete	Discrete
MaMujoco	Cooperative	Partial	Continuous	Continuous
GRF	Collaborative	Full	Discrete	Continuous
Hanabi	Cooperative	Partial	Discrete	Discrete

Each environment has its own readme file that serves as the instructions for that task, covering environment settings, installation, and important notes.

Algorithm

We provide three types of MARL algorithms as baselines:

Independent Learning: IQL DDPG PG A2C TRPO PPO

Centralized Critic: COMA MADDPG MAA2C MAPPO MATRPO HATRPO HAPPO

Value Decomposition: VDN QMIX FACMAC VDAC VDPPO

Here is a chart describing the characteristics of each algorithm:

Algorithm	Support Task Mode	Need Global State	Action	Learning Mode	Type
IQL	Mixed	No	Discrete	Independent Learning	Off Policy
PG	Mixed	No	Both	Independent Learning	On Policy
A2C	Mixed	No	Both	Independent Learning	On Policy
DDPG	Mixed	No	Continuous	Independent Learning	Off Policy
TRPO	Mixed	No	Both	Independent Learning	On Policy
PPO	Mixed	No	Both	Independent Learning	On Policy
COMA	Mixed	Yes	Both	Centralized Critic	On Policy
MADDPG	Mixed	Yes	Continuous	Centralized Critic	Off Policy
MAA2C	Mixed	Yes	Both	Centralized Critic	On Policy
MATRPO	Mixed	Yes	Both	Centralized Critic	On Policy
MAPPO	Mixed	Yes	Both	Centralized Critic	On Policy
HATRPO	Cooperative	Yes	Both	Centralized Critic	On Policy
HAPPO	Cooperative	Yes	Both	Centralized Critic	On Policy
VDN	Cooperative	No	Discrete	Value Decomposition	Off Policy
QMIX	Cooperative	Yes	Discrete	Value Decomposition	Off Policy
FACMAC	Cooperative	Yes	Continuous	Value Decomposition	Off Policy
VDAC	Cooperative	Yes	Both	Value Decomposition	On Policy
VDPPO*	Cooperative	Yes	Both	Value Decomposition	On Policy
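
One way to use the chart above is to treat each row as a record and filter by the properties of the task at hand. The sketch below copies a few rows from the chart into a small table; `AlgoSpec`, `ALGORITHMS`, and `candidates` are hypothetical names for illustration and are not part of MARLToolkit. It also assumes that algorithms marked "Mixed" can be applied to cooperative tasks as well, while "Cooperative" ones are limited to cooperative tasks.

```python
from typing import NamedTuple


class AlgoSpec(NamedTuple):
    task_mode: str            # "Mixed" or "Cooperative"
    needs_global_state: bool  # "Need Global State" column
    action: str               # "Discrete", "Continuous", or "Both"
    on_policy: bool           # "Type" column


# A subset of rows copied from the chart above.
ALGORITHMS = {
    "IQL": AlgoSpec("Mixed", False, "Discrete", False),
    "PPO": AlgoSpec("Mixed", False, "Both", True),
    "MADDPG": AlgoSpec("Mixed", True, "Continuous", False),
    "MAPPO": AlgoSpec("Mixed", True, "Both", True),
    "HAPPO": AlgoSpec("Cooperative", True, "Both", True),
    "VDN": AlgoSpec("Cooperative", False, "Discrete", False),
    "QMIX": AlgoSpec("Cooperative", True, "Discrete", False),
    "FACMAC": AlgoSpec("Cooperative", True, "Continuous", False),
}


def candidates(action_space, has_global_state, cooperative_task):
    """Return the names of algorithms whose chart row fits the task."""
    names = []
    for name, spec in ALGORITHMS.items():
        # The algorithm must support the environment's action space.
        if spec.action not in ("Both", action_space):
            continue
        # Algorithms that need a global state require the env to expose one.
        if spec.needs_global_state and not has_global_state:
            continue
        # Assumption: "Cooperative" algorithms only fit cooperative tasks.
        if spec.task_mode == "Cooperative" and not cooperative_task:
            continue
        names.append(name)
    return names
```

For example, `candidates("Discrete", has_global_state=False, cooperative_task=False)` keeps only the independent learners in this subset that support discrete actions.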

IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized-critic versions of A2C and TRPO. VDPPO is the value decomposition version of PPO.
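
To make the independent learning idea concrete, here is a minimal tabular sketch of independent Q-learning: every agent owns a separate Q-table and applies the standard Q-learning update to its own experience, treating the other agents as part of the environment. This is an illustration under simplifying assumptions (tabular values, a single-state two-agent toy game), not the toolkit's implementation; `IndependentQLearner` and `train` are hypothetical names.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """One tabular Q-learner; each agent owns a separate instance."""

    def __init__(self, n_actions, lr=0.1, gamma=0.9, epsilon=0.2, seed=0):
        self.n_actions = n_actions
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        # Q-table: observation -> list of action values, initialised to zero.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs):
        # Epsilon-greedy action selection on this agent's own Q-table.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs, done):
        # Standard Q-learning target; other agents are not modelled at all.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.lr * (target - self.q[obs][action])


def train(n_steps=5000):
    """Toy one-step game: both agents get reward 1 only if both pick action 1."""
    agents = [IndependentQLearner(n_actions=2, seed=i) for i in range(2)]
    obs = 0  # single dummy observation shared by both agents
    for _ in range(n_steps):
        actions = [agent.act(obs) for agent in agents]
        reward = 1.0 if actions == [1, 1] else 0.0
        for agent, action in zip(agents, actions):
            agent.update(obs, action, reward, obs, done=True)
    return agents
```

Because each learner sees only its own action and the shared reward, the environment looks non-stationary from its point of view, which is one motivation for the centralized critic and value decomposition families listed above.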

