Multiagent Learning: Dynamic Games & Applications
Ph.D. thesis abstract
Daniel Hennes
Promotor: Prof.dr. G. Weiss
Co-promotores: Dr. K.P. Tuyls; Prof.dr. K. Tumer, Oregon State University
Date of defense: May 16, 2013
More and more contemporary technological challenges require
decentralized solutions. Examples include the coordination of multiple
autonomous vehicles and automated control systems for air traffic. The
demand for multiagent systems is widespread and stems from a number
of significant advantages. Multiagent systems have built-in
redundancies that provide robustness: if one or a few components
fail, the system is still able to operate properly. In addition,
different tasks can be assigned to independent agents that pursue
them in parallel. Multiagent systems are modular by definition, and
components can be easily added or removed. While a
multiagent approach offers an elegant solution paradigm to issues like
fault tolerance and load balancing, it also introduces new challenges.
The presence of multiple agents results in a highly dynamic and
nondeterministic environment. Interactions between agents and with
the environment inevitably give rise to very complex systems that are
not trivial to control. Learning is therefore crucial: it facilitates
adaptation to changes in the environment and in the behavior of other
agents.
This dissertation investigates the dynamics of learning in multiagent
systems and is organized in two parts. The first part develops
evolutionary game theory as a formal framework to model the dynamics
of multiagent learning. The second part focuses on the application of
these models to real world problems. Part I commences with the formal
concepts and key results of multiagent learning. We define the
fundamental models of single- and multiagent learning and give an
introduction to two well-known reinforcement learning algorithms,
Q-learning and learning automata. Subsequently, we survey the most
important multiagent learning algorithms. We continue by explaining
the relation between evolutionary game theory and multiagent
reinforcement learning. In particular, we show the formal relation
between games of learning automata and the asymmetric continuous-time
replicator dynamics, as well as the link between Q-learning and the
selection-mutation dynamics. This dissertation makes two important
contributions to the methodology of modeling multiagent reinforcement
learning by evolutionary processes. First, we prove that the average
behavior of a large collective of learning agents can be approximated
by the symmetric continuous-time replicator dynamics. Second, we
extend the link
between multiagent reinforcement learning and evolutionary game theory
to multi-state games.
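To illustrate the first contribution, the following sketch (a toy, not
the thesis code) simulates a population of linear reward-inaction
learning automata in a symmetric two-action game and compares the
population's mean policy with an Euler integration of the symmetric
continuous-time replicator dynamics. The payoff matrix, population
size, and learning rate are illustrative assumptions.

    import numpy as np

    # Hypothetical symmetric 2x2 game; payoffs kept in [0, 1] so they can
    # serve directly as rewards for the reward-inaction update.
    A = np.array([[0.3, 1.0],
                  [0.5, 0.6]])

    rng = np.random.default_rng(0)
    N, lam, steps = 1000, 0.01, 4000

    p = np.full(N, 0.2)          # per-automaton probability of action 0
    for _ in range(steps):
        partners = rng.permutation(N)
        acts = (rng.random(N) > p).astype(int)   # sampled action per automaton
        rewards = A[acts, acts[partners]]        # payoff against a random partner
        chose0 = acts == 0
        # Linear reward-inaction: shift probability toward the played action.
        p[chose0] += lam * rewards[chose0] * (1 - p[chose0])
        p[~chose0] -= lam * rewards[~chose0] * p[~chose0]

    # Symmetric continuous-time replicator dynamics, Euler-integrated with a
    # step size matched to the learning rate.
    x = 0.2
    for _ in range(steps):
        f = A @ np.array([x, 1.0 - x])           # fitness of action 0 and 1
        x += lam * x * (f[0] - (x * f[0] + (1.0 - x) * f[1]))

    print(f"population mean policy: {p.mean():.3f}   replicator: {x:.3f}")

For a small learning rate the two final values should nearly coincide;
this closeness of population mean and replicator trajectory is the
kind of approximation the first contribution formalizes.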
Part II of this dissertation focuses on the dynamics of multiagent
learning in real-world domains. In particular, we investigate three
domains: complex negotiation scenarios, markets, and collision
avoidance. Complex negotiations involve strategic interactions in
settings with multiple agents and multiple actions as well as
interrelated goals, tasks, and resources. As the complexity of these
large-scale strategic interactions grows exponentially with the number
of agents, we introduce metastrategies as a method to significantly
reduce the strategy space. We show that the reduced setting preserves
many of the strategic characteristics of the original one, which makes
it possible to use population games to model the dynamics of such
interactions, as sketched below.
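The following toy illustrates the population-game view: a
single-population replicator dynamics over a handful of
metastrategies. The metastrategy labels and the payoff table are
hypothetical placeholders, not data from the thesis.

    import numpy as np

    # Hypothetical payoff table over three metastrategies; the labels and
    # numbers are illustrative placeholders, not data from the thesis.
    META = ["concede", "hold-firm", "tit-for-tat"]
    A = np.array([[2.0, 0.0, 1.0],
                  [3.0, 1.0, 0.0],
                  [1.0, 2.0, 2.0]])

    x = np.full(3, 1.0 / 3.0)        # population share per metastrategy
    dt = 0.01
    for _ in range(20_000):
        f = A @ x                    # expected payoff of each metastrategy
        x += dt * x * (f - x @ f)    # single-population replicator dynamics

    for name, share in zip(META, np.round(x, 3)):
        print(f"{name:12s} {share}")

The distribution the shares settle into (or the cycle they trace)
indicates which metastrategies survive strategic interaction in the
reduced game.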
The competitive advantage of price signal information for traders in
auctions is the subject of the second domain study.
More information does not guarantee higher performance: in particular,
traders with limited information perform below the market average and
are outperformed by random traders; only insiders beat the market. We
analyze the market dynamics with an evolutionary model of agents with
competing information levels. Results show that the highest
information level dominates if information comes for free. If
information is costly, less-informed traders may prevail, reflecting a
more realistic distribution over information levels. A stylized
version of this trade-off is sketched below.
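As a stylized illustration of this trade-off (not the thesis model,
where payoffs arise from simulated market interaction rather than
fixed constants), the sketch below runs replicator dynamics over three
information levels whose net fitness is a hypothetical gross trading
payoff minus an information cost. All numbers are assumptions.

    import numpy as np

    # Illustrative gross trading payoff per information level.
    GROSS = np.array([0.0, 0.4, 1.0])     # none / partial / insider

    def evolve(cost, steps=20_000, dt=0.01):
        x = np.full(3, 1.0 / 3.0)         # population share per level
        f = GROSS - cost                  # net fitness, constant in this toy
        for _ in range(steps):
            x += dt * x * (f - x @ f)     # replicator dynamics
        return np.round(x, 3)

    print(evolve(cost=np.zeros(3)))               # free info: insiders take over
    print(evolve(cost=np.array([0.0, 0.5, 1.2]))) # costly info: uninformed prevail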
In the third domain, we perform an evolutionary analysis of collision
avoidance in multiagent systems based on the velocity obstacle
paradigm. The velocity obstacle is a geometric representation of all
velocities that will eventually result in a collision, given that all
dynamic obstacles maintain their observed velocities. Three variants
of the velocity obstacle formulation are compared, and the most robust
technique under evolutionary pressure is identified.
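For concreteness, the helper below tests velocity obstacle membership
for two disc-shaped agents: agent A's current velocity lies inside the
velocity obstacle induced by agent B exactly if the ray of relative
motion intersects the disc of radius r_a + r_b centered at B's
position. The function and the numerical examples are illustrative,
not code from the thesis.

    import numpy as np

    def in_velocity_obstacle(p_a, v_a, r_a, p_b, v_b, r_b):
        # True iff the discs collide at some time t >= 0, assuming B keeps
        # its observed velocity; equivalently, v_a lies in B's velocity
        # obstacle. (Illustrative helper, not thesis code.)
        p = np.asarray(p_a, float) - np.asarray(p_b, float)  # relative position
        v = np.asarray(v_a, float) - np.asarray(v_b, float)  # relative velocity
        c = p @ p - (r_a + r_b) ** 2
        if c <= 0:                 # discs already overlap
            return True
        a = v @ v
        b = p @ v                  # negative iff the agents are approaching
        if a == 0.0 or b >= 0.0:
            return False
        return b * b - a * c >= 0  # the relative-motion ray hits the disc

    # Head-on course versus a diverging course.
    print(in_velocity_obstacle([0, 0], [1, 0], 0.5, [10, 0], [-1, 0], 0.5))  # True
    print(in_velocity_obstacle([0, 0], [0, 1], 0.5, [10, 0], [-1, 0], 0.5))  # False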
To conclude, this dissertation contributes to the understanding of
multiagent learning dynamics in several ways. First, we provide an
alternative interpretation of the single population replicator
dynamics. In particular, we derive that if fitness-proportionate
selection is used, the overall average of a population of learners
moves independently of the adaptation processes of the individual
learners; it follows the symmetric continuous-time replicator
dynamics. Second, we derive a dynamical model of multi-state
multiagent learning. The state-coupled replicator dynamics extend the
existing evolutionary framework of multiagent learning and make it
possible to predict learning dynamics in stochastic games. Finally, we
show how evolutionary models
of learning can be used to analyze complex scenarios and predict
behavior, thereby linking the theoretical concepts to everyday
problems.