CISC 333/3.0 Introduction to Data Mining
Original Author: David Skillicorn
Last Revised: November 06, 2006
Supervised and unsupervised learning, neural networks, support-vector machines, decision trees, metric-based clustering, distribution-based clustering, rule-based techniques, genetic algorithms. Applications to information retrieval, web mining, customer-relationship management, recommender systems, science and engineering.
Learning Hours: 120 (36L;84P)
Prerequisites: CISC 121/3.0, CISC 203/3.0; a statistics course; a first-year course in linear algebra.
The main objective of this course is to ensure that students know enough about
the algorithms, strengths and limitations of mainstream data-mining
techniques that they can use data-mining software appropriately, and
can understand the results that are produced. In particular, they should
be able to see how to model a real-world problem, choose appropriate
algorithms, analyse the results, and explain their implications for
the original problem. A smaller objective is to make students aware
that not all problems in computing have a single cut-and-dried, correct answer.
- Introduction to data mining, algorithmic complexity and information theory. Sample applications.
- Multivariate data, supervised vs unsupervised learning.
- Decision trees.
- Supervised neural networks, introduction to back-propagation.
- Support vector machines.
- Metric-based clustering, nearest-neighbour techniques.
- Distribution-based clustering, Expectation-Maximisation, Autoclass.
- Unsupervised neural networks, self-organising maps.
- Rule-based techniques, association rules.
- Genetic algorithms.
- Dealing with temporal and spatial data.
- Applications: information retrieval (latent semantic indexing), web mining (PageRank and HITS algorithms),
customer-relationship management, recommender systems, scientific datasets (e.g., astrophysics),
engineering datasets (e.g., fluid dynamics).
- Dunham, Data Mining: Introductory and Advanced Topics, Prentice-Hall, 2003.
- Hand, Mannila, and Smyth, Principles of Data Mining, MIT Press, 2001.
- Han and Kamber, Data Mining: Concepts and Techniques, Morgan