Queen's School of Computing

CISC 333/3.0 Introduction to Data Mining

Original Author: David Skillicorn
Last Revised: November 06, 2006

Calendar Description

Supervised and unsupervised learning, neural networks, support-vector machines, decision trees, metric-based clustering, distribution-based clustering, rule-based techniques, genetic algorithms. Applications to information retrieval, web mining, customer-relationship management, recommender systems, science and engineering.
Learning Hours: 120 (36L; 84P)

Prerequisites: CISC 121/3.0, CISC 203/3.0; a statistics course; a first-year course in linear algebra.

Objectives

The main objective of this course is to ensure that students know enough about the algorithms, strengths, and limitations of mainstream data-mining techniques to use data-mining software appropriately and to understand the results it produces. In particular, they should be able to model a real-world problem, choose appropriate algorithms, analyse the results, and explain their implications for the original problem. A secondary objective is to make students aware that not all problems in computing have a single, cut-and-dried, correct solution.

Topics
  • Introduction to data mining, algorithmic complexity and information theory. Sample applications.

  • Multivariate data, supervised vs unsupervised learning.

  • Decision trees.

  • Supervised neural networks, introduction to back-propagation.

  • Support vector machines.

  • Metric-based clustering, nearest-neighbour techniques.

  • Distribution-based clustering, Expectation-Maximisation, AutoClass.

  • Unsupervised neural networks, self-organising maps.

  • Rule-based techniques, association rules.

  • Genetic algorithms.

  • Dealing with temporal and spatial data.

  • Applications: information retrieval (latent semantic indexing), web mining (PageRank and HITS algorithms), customer-relationship management, recommender systems, scientific datasets (e.g., astrophysics), engineering datasets (e.g., fluid dynamics).

Possible Texts
  • Dunham, Data Mining: Introductory and Advanced Topics, Prentice-Hall, 2003.

  • Hand, Mannila, and Smyth, Principles of Data Mining, MIT Press, 2001.

  • Han and Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.