My Bachelor thesis deals with
the problem of letting an agent learn how to achieve certain goals with
minimal initial knowledge about its environment. Under certain assumptions,
this problem can be formulated as the control of an MDP (Markov Decision Process).
For environments with reasonably sized state spaces, Reinforcement Learning
algorithms such as Q-learning can be applied. During the learning process,
the agent infers from environmental feedback how well a certain action
in a certain situation serves its goals. This knowledge is represented
by the numerical Q-function, which assigns to every state-action pair (s, a)
a real value estimating how valuable taking action a in state s is to
the agent. Estimates of the Q-function serve as a basis for the agent's policy.
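To make the update behind these estimates concrete, here is a minimal sketch of tabular Q-learning. It assumes a generic environment object with `reset`, `step`, and `actions` methods; these names are illustrative and not taken from the thesis or the RLPac program.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q maps (state, action) pairs to estimated values, initialized to 0.
    Q = defaultdict(float)

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy policy derived from the current Q estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Bootstrapped target: terminal states contribute no future value.
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```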
In an extension of the initial, single-agent game, a second learning agent, whose goals oppose the first agent's, is introduced into the game. Both agents can be trained to act optimally through a modified version of Q-learning called Minimax Q-learning. As in other Minimax methods, the maximizing agent acts to maximize the minimal value of the successor state, while the minimizing agent acts to minimize the maximal value of the successor state; a sketch of the corresponding update follows below. From a theoretical point of view, my thesis integrates central notions from the Adaptive Autonomous Agents framework (cf. Pattie Maes), Reinforcement Learning, and Game Theory.
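The following sketch shows the Minimax Q-learning update for the maximizing agent, restricted to pure strategies exactly as described above. Littman's full algorithm solves a small linear program to allow mixed strategies; that step is simplified here, and all names (`minimax_value`, `minimax_q_update`, the action sets) are illustrative rather than taken from the thesis.

```python
from collections import defaultdict

def minimax_value(Q, state, my_actions, opp_actions):
    # Value of a state for the maximizer: the action whose worst-case
    # outcome against any opponent action is best.
    return max(
        min(Q[(state, a, o)] for o in opp_actions)
        for a in my_actions
    )

def minimax_q_update(Q, s, a, o, reward, s_next, done,
                     my_actions, opp_actions, alpha=0.1, gamma=0.99):
    # Q is indexed by (state, own action, opponent action); the
    # bootstrapped target uses the minimax value of the successor state.
    target = reward if done else reward + gamma * minimax_value(
        Q, s_next, my_actions, opp_actions)
    Q[(s, a, o)] += alpha * (target - Q[(s, a, o)])

# Usage: initialize Q = defaultdict(float) and call minimax_q_update
# after each joint transition observed during play.
```

The minimizing agent's update is symmetric: it swaps max and min in the successor-state value.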

Here you can download the thesis (PDF format) and the RLPac program (ZIP archive).