Peer-reviewed article

Self-play reinforcement learning with comprehensive critic in computer games

2021; Elsevier BV; Volume: 449; Language: English

DOI

10.1016/j.neucom.2021.04.006

ISSN

1872-8286

Authors

Shanqi Liu, Junjie Cao, Yujie Wang, Wenzhou Chen, Yong Liu

Topic(s)

Artificial Intelligence in Games

Abstract

Self-play reinforcement learning, where agents learn by playing against copies of themselves, has been successfully applied in many game scenarios. However, the training procedure for self-play reinforcement learning is unstable and less sample-efficient than general reinforcement learning, especially in imperfect-information games. To improve the self-play training process, we incorporate a comprehensive critic into the policy gradient method to form a self-play actor-critic (SPAC) method for training agents to play computer games. We evaluate our method in four different environments covering both competitive and cooperative tasks. The results show that agents trained with our SPAC method outperform those trained with the deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) algorithms under several evaluation protocols, which demonstrates the effectiveness of our comprehensive critic in the self-play training procedure.
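To illustrate the general idea of self-play actor-critic training described in the abstract, the sketch below runs a minimal policy-gradient loop with a critic baseline on a toy two-action game, where the opponent is a copy of the learning agent's own policy. This is a hedged illustration only: the toy payoff, the numpy-based tabular policy, and the simple scalar critic are assumptions for exposition, not the paper's SPAC algorithm or its comprehensive critic.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over action logits.
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
logits = np.zeros(2)   # shared policy over two actions (self-play: both players use it)
value = 0.0            # critic: scalar baseline estimate of expected return
lr_pi, lr_v = 0.1, 0.1

for step in range(2000):
    probs = softmax(logits)
    a1 = rng.choice(2, p=probs)      # learning agent samples its action
    a2 = rng.choice(2, p=probs)      # opponent is a copy of the same policy
    r = 1.0 if a1 == a2 else -1.0    # toy "matching" payoff for the learner
    adv = r - value                  # advantage = reward minus critic baseline
    grad = -probs
    grad[a1] += 1.0                  # grad of log pi(a1) for a softmax policy
    logits += lr_pi * adv * grad     # actor update (policy gradient)
    value += lr_v * (r - value)      # critic update (running TD-style average)

probs = softmax(logits)
```

The critic's baseline reduces the variance of the policy-gradient estimate, which is one motivation the abstract gives for adding a critic to the self-play procedure.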
