Programme

 

09:00-09:15    Welcome
09:15-10:00    "End-to-end Hierarchical Reinforcement Learning" by Herke van Hoof
10:00-10:15    "A fast hybrid reinforcement learning framework with human corrective feedback" by Carlos Celemin
10:15-10:30    Flash talks (Posters 1-7)
10:30-11:00    Coffee break (and posters)


11:00-11:45    "A Pure Exploration perspective on Game Tree Search" by Wouter Koolen
11:45-12:00    "Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies" by Pieter Libin
12:00-12:15    Flash talks (Posters 8-14)
12:15-13:30    Lunch break (and posters)

- - -

13:30-14:15    "Learning from demonstrations" by Tim Salimans
14:15-14:30    "Attention Solves Your TSP, Approximately" by Wouter Kool
14:30-14:45    "Learning System-Efficient Equilibria in Route Choice Using Tolls" by Gabriel de Oliveira Ramos
14:45-15:30    Coffee break (and posters)


15:30-15:45    "TDRL Emotions" by Joost Broekens
15:45-16:00    "Stochastic Activation Actor-Critic Methods" by Wendy Shang
16:00-16:15    "Reinforcement Learning in Spiking Neural Networks" by Sander Bohte
16:15-16:30    "Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems" by Eugenio Bargiacchi
16:30-17:30    Discussion groups


17:30-18:30    Drinks

- - -

18:30-19:00    Walk to restaurant
19:00-21:30    Dinner in town (optional, at own expense)

Invited Talks

 

End-to-end Hierarchical Reinforcement Learning - Herke van Hoof, Universiteit van Amsterdam

Hierarchical reinforcement learning is based on the idea that a complex task can be split into simpler subtasks, each solved by a sub-policy. Learning hierarchical strategies promises temporal abstraction (allowing larger-scale learning) and modularity (allowing transfer between tasks). In the last two decades, much work has been done on hierarchical reinforcement learning, especially within the options framework. More recently, attention has focused on how to learn hierarchical policies end-to-end. End-to-end learning means that the low-level sub-behaviors and the high-level strategy that sequences them are learned at the same time, from the same data, and with a single objective. In this talk, I will discuss our proposed methods for end-to-end learning of hierarchical policies that can be trained on off-policy data.
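As a rough illustration of what "end-to-end" means here, the minimal sketch below optimises a single return-weighted objective over both the high-level policy (which picks a sub-policy in each state) and the low-level sub-policies themselves, using the same trajectory data. It is a plain on-policy REINFORCE toy in Python/NumPy with an assumed tabular chain environment; the names, environment, and update rule are illustrative only, not the off-policy methods presented in the talk.

    # Minimal end-to-end hierarchical policy sketch (illustrative assumptions
    # throughout; not the method from the talk).
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, n_options = 5, 2, 2

    # High-level policy picks an option per state; each option has its own
    # low-level sub-policy over primitive actions.
    theta_hi = np.zeros((n_states, n_options))
    theta_lo = np.zeros((n_options, n_states, n_actions))

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def rollout(env_step, s0, horizon=20):
        """Sample one trajectory with the two-level policy, recording both choices."""
        s, traj, ret = s0, [], 0.0
        for _ in range(horizon):
            o = rng.choice(n_options, p=softmax(theta_hi[s]))      # high-level choice
            a = rng.choice(n_actions, p=softmax(theta_lo[o, s]))   # low-level choice
            s_next, r = env_step(s, a)
            traj.append((s, o, a))
            ret += r
            s = s_next
        return traj, ret

    def update(traj, ret, lr=0.1):
        """One REINFORCE-style objective: the same return credits both levels at once."""
        for s, o, a in traj:
            g_hi = -softmax(theta_hi[s]); g_hi[o] += 1.0
            g_lo = -softmax(theta_lo[o, s]); g_lo[a] += 1.0
            theta_hi[s] += lr * ret * g_hi
            theta_lo[o, s] += lr * ret * g_lo

    # Toy chain environment: action 1 moves right, reaching the last state pays 1.
    def env_step(s, a):
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return s_next, float(s_next == n_states - 1)

    for _ in range(200):
        traj, ret = rollout(env_step, s0=0)
        update(traj, ret)

The essential point is that no separate sub-goal labels or pre-trained skills are needed: one gradient signal updates both levels from the same rollouts.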

 
A Pure Exploration perspective on Game Tree Search - Wouter Koolen, Centrum voor Wiskunde en Informatica

We take a look at the connections between Reinforcement Learning (RL) and Pure Exploration (PE) problems. In RL the goal is to maximize long-term reward in an unknown environment. In PE the goal is to maximize information about the environment in a specific sense, without attention to reward/cost. Obtaining high reward in RL requires targeted collection of the relevant information, and hence PE systems naturally serve as sub-modules in RL systems. We will sketch a spectrum of connections, from the simple tabular case to MCTS in AlphaGo. We then discuss ongoing research in PE methods for Game Tree Search, where we will see interesting and different methods emerge with promising future application in RL.
 

Learning from demonstrations - Tim Salimans, Google Brain

We propose a new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge. Instead of imitating human demonstrations, our approach is to maximize rewards directly. Our agent is trained using off-the-shelf reinforcement learning, but starts every episode by resetting to a state from a demonstration. By starting from such demonstration states, the agent requires much less exploration to learn the game than when it starts from the beginning of the game every episode. We analyze reinforcement learning for tasks with sparse rewards in a simple toy environment, where we show that the run-time of standard RL methods scales exponentially in the number of states between rewards. Our method reduces this to quadratic scaling, opening up many tasks that were previously infeasible. We then apply our method to Montezuma's Revenge, for which we present a trained agent achieving a high score of 74,500, better than any previously published result.
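The core mechanism described above, resetting every episode to a state taken from the demonstration, can be sketched as a thin environment wrapper. This is a hypothetical illustration assuming a Gym-style environment with a restore_state() hook for saved emulator states; it is not the authors' code, and the widen() curriculum step is an assumption rather than something stated in the abstract.

    # Hypothetical sketch of demonstration-state resets (assumed restore_state()
    # hook and saved demo states; names are illustrative, not the authors' code).
    class DemoResetEnv:
        def __init__(self, env, demo_states, start_offset=1):
            self.env = env                    # underlying environment
            self.demo_states = demo_states    # states saved along one demonstration
            self.start_offset = start_offset  # how far from the demo's end to start

        def reset(self):
            # Each episode begins from a demonstration state instead of the
            # environment's usual initial state.
            i = max(0, len(self.demo_states) - 1 - self.start_offset)
            return self.env.restore_state(self.demo_states[i])  # assumed hook

        def step(self, action):
            return self.env.step(action)

        def widen(self):
            # Assumed curriculum step: once the agent is reliable from the current
            # starting point, move the start earlier in the demonstration so it
            # only ever has to bridge a short gap to states it already masters.
            self.start_offset = min(self.start_offset + 1, len(self.demo_states) - 1)

The design intuition is that exploration only has to cover the short stretch between the chosen starting state and behaviour the agent can already execute, rather than the full run of unrewarded states from the true initial state.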

Posters

 

1. "Simultaneous Action Learning and Grounding through Reinforcement and Cross-Situational Learning" by Oliver Roesler
2. "Personalization of Health Interventions using Cluster-Based Reinforcement Learning" by Ali el Hassouni 
3. "Learning controllers for drones and mobile robots" by Javier Alonso Mora
4. "Stable, Practical and On-line Bootstrapped Conservative Policy Iteration" by Denis Steckelmacher
5. "Can we use brain-based feedback to identify the semantic concept on your mind using RL?" by Karen Dijkstra
6. "Intra-day Bidding Strategies for Storage Devices Using Deep Reinforcement Learning" by Ioannis Boukas
7. "Coordinating Human and Agent Behavior in Collective-Risk Scenarios" by Elias Fernández Domingos
8. "From Algorithmic Black Boxes to Adaptive White Boxes: Declarative Decision-Theoretic Ethical Programs as Codes of Ethics" by  Martijn van Otterlo
9. "Interactive Reinforcement learning to reduce the total solution space of an assembly task" by Joris De Winter
10. "Achieving scalable model-free demand response in charging an electric vehicle fleet with reinforcement learning" by Chris Develder
11. "Large-scale vehicle routing (uses no RL)" by Michal Cap
12. "Solution horizons in non-stationary MDPs" by Grigory Neustroev 
13. "Safe Reinforcement Learning in Factored MDPs" by Thiago Dias Simao  
14. "Monte Carlo Tree Search for Asymmetric Trees" by Thomas Moerland