CISC 474/3.0 Reinforcement Learning

Original Author: Farhana Zulkernine
Last Revised: 2019-03-20

Calendar Description

Formal and heuristic approaches to problem-solving, planning, knowledge representation and reasoning, Markov decision processes, dynamic programming, temporal-difference learning, Monte Carlo learning, function approximation, integration of learning and planning. Implementing simple examples of logical reasoning, clustering or classification.

Prerequisites: CISC 352/3.0; programming expertise

Exclusions: CISC 453*/3.0

Learning hours: 120 (36L; 12G; 72P)

Degree Planning

  • This course satisfies part of the requirements for the Artificial Intelligence focus of the COMP degree plan.

Possible Texts

  • Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. Second Edition. MIT Press, Cambridge, MA, 2018.

Suggested Topics


  • Basic concepts; the multi-armed bandit problem (2 weeks)
  • Markov Decision Processes (1 week): Goals, rewards, policies and values.
  • Dynamic Programming (1 week): Policy improvement, policy iteration.
  • Monte Carlo Methods (1 week): Monte Carlo prediction, estimation of action values, discounting.
  • Temporal-Difference Learning (1 week): Predictions and methods.
  • Q-Learning (1 week): Example use case scenarios.
  • Planning and Learning (2 weeks): Models and planning; integrated planning, acting, and learning; bootstrapping; planning at decision time; real-time dynamic programming; heuristic search; Monte Carlo tree search.
  • Prediction and Approximation (1 week): Value function approximation, prediction objective, stochastic gradient and linear methods.
  • Eligibility Traces and Cognitive Science Aspects (1 week): Eligibility traces, policy gradient, psychology and neuroscience aspects of RL.
  • Use Cases and Deep RL (1 week): Introduction to artificial neural networks (ANNs) and deep learning.