CISC 474/3.0 Reinforcement Learning
Original Author: Farhana Zulkernine
Last Revised: 2019-03-20
Formal and heuristic approaches to problem-solving, planning, knowledge representation and reasoning, Markov decision processes, dynamic programming, temporal-difference learning, Monte Carlo learning, function approximation, integration of learning and planning. Implementing simple examples of logical reasoning, clustering or classification.
CISC 352/3.0; programming expertise
Learning hours: 120 (36L; 12G; 72P)
This course satisfies part of the requirements for the
focus of the COMP degree plan.
Reinforcement Learning: An Introduction.
Richard S. Sutton and Andrew G. Barto.
Second Edition, in progress.
MIT Press, Cambridge, MA, 2017.
- Basic concepts; the multi-armed bandit problem (2 weeks)
- Markov Decision Processes (1 week): Goals, rewards, policies and values.
- Dynamic Programming (1 week): Policy improvement, policy iteration.
- Monte Carlo Methods (1 week): Monte Carlo prediction, estimation of action values, discounting.
- Temporal-Difference Learning (1 week): Predictions and methods.
- Q-Learning (1 week): Example use case scenarios.
- Planning and Learning (2 weeks): Models and planning; integrated planning, acting and learning; bootstrapping; planning at decision time; real-time dynamic programming; heuristic search; Monte Carlo tree search.
- Prediction and Approximation (1 week): Value function approximation, prediction objective, stochastic gradient and linear methods.
- Eligibility Traces and Cognitive Science Aspects (1 week): Eligibility traces, policy gradient, psychology and neuroscience aspects of RL.
- Use Cases and Deep RL (1 week): Introduction to artificial neural networks (ANNs) and deep learning.