Reinforcement Learning for Adaptive Autonomous Agents in a Simulated Multi-Player Game

Keywords: Reinforcement Learning, autonomous agent, value function, policy, game

Summary

My Bachelor's thesis deals with the problem of letting an agent learn how to achieve certain goals with minimal initial knowledge about its environment. Under certain assumptions, this problem can be formulated as the control of a Markov Decision Process (MDP). For environments with reasonably sized state spaces, Reinforcement Learning algorithms such as Q-learning can be applied. During the learning process, the agent infers from feedback given by the environment how well a certain action in a certain situation serves its goals. This knowledge is represented by the numerical Q-function, which assigns a real value to every pair (s, a) of state and action; each value Q(s, a) is an estimate of the true value of taking action a in state s. The current estimate of the Q-function serves as the basis for the policy π, a function that selects an action for a given state of the environment.
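To illustrate the idea, here is a minimal sketch of tabular Q-learning with an ε-greedy policy. It is not the thesis's actual code; all identifiers (Q, choose_action, q_update, alpha, gamma, epsilon) are illustrative assumptions, with alpha the learning rate, gamma the discount factor, and epsilon the exploration rate.

```python
import random
from collections import defaultdict

# Q maps (state, action) pairs to estimated values; unseen pairs default to 0.
Q = defaultdict(float)

def choose_action(state, actions, epsilon=0.1):
    """Epsilon-greedy policy derived from the current Q estimates:
    explore with probability epsilon, otherwise pick the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
```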

In an extension of the initial game, a second learning agent, whose goals oppose those of the first, is introduced. Both agents can be trained to act optimally through a modified version of Q-learning called Minimax Q-learning. As in other Minimax methods, the maximizing agent acts to maximize the minimal value of the successor state, while the minimizing agent acts to minimize its maximal value.
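Below is a simplified, pure-strategy sketch of this idea (the general Minimax Q-learning algorithm of Littman also admits mixed strategies, which requires solving a small linear program at each step; that is omitted here). Again, all identifiers are illustrative assumptions rather than the thesis's code.

```python
from collections import defaultdict

# Q maps (state, my_action, opponent_action) triples to estimated values.
Q = defaultdict(float)

def minimax_value(state, my_actions, opp_actions):
    """Value of a state for the maximizing agent: the best own action
    against the opponent's worst-case (value-minimizing) reply."""
    return max(min(Q[(state, a, o)] for o in opp_actions)
               for a in my_actions)

def minimax_q_update(state, my_action, opp_action, reward, next_state,
                     my_actions, opp_actions, alpha=0.1, gamma=0.9):
    """One update step: move Q(s, a, o) toward r + gamma * V(s'),
    where V(s') is the max-min value of the successor state."""
    target = reward + gamma * minimax_value(next_state,
                                            my_actions, opp_actions)
    Q[(state, my_action, opp_action)] += alpha * (
        target - Q[(state, my_action, opp_action)])
```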

From a theoretical point of view, my thesis integrates central notions from the Adaptive Autonomous Agents framework (cf. Pattie Maes), Reinforcement Learning, and Game Theory.

Here you can:

Download the Thesis (pdf format)

Download the RLPac Program (zip archive)