Queen's School of Computing

CISC251/3.0 Data Analytics

Calendar description:

Introduction to data analytics; data preparation; assessing performance; prediction methods such as decision trees, random forests, support vector machines, neural networks and rules; ensemble methods such as bagging and boosting; clustering techniques such as expectation-maximization, matrix decompositions, and biclustering; attribute selection.
Recommended: Prior exposure to problem solving in any discipline.
Learning Hours: 120 (36L; 24Lab; 60P)
Exclusion: CISC/CMPE 333

Course Outline:

Preliminaries (2 weeks)

  • Data acquisition and preparation
  • Inductive modelling as an epistemology
  • Assessing model performance
Prediction (4 weeks)
  • Simple predictors: decision trees, k-nearest neighbour, Naïve Bayes prediction
  • Stronger predictors: random forests, support vector machines, neural networks
  • Ensemble techniques (bagging, boosting)
Clustering (4 weeks)
  • Similarity measures: distance-based, distribution-based, density-based
  • Algorithms: k-means, expectation-maximization, DBSCAN
  • Matrix decompositions (such as singular value decomposition) and projections
  • Attribute selection techniques
Applications (2 weeks)
  • Applications selected from a variety of domains (natural language, bioinformatics, business)

Learning outcomes:

Upon successful completion of this course, a student will be able to:

  1. Design inductive model-building algorithms appropriate for datasets of moderate size and complexity
  2. Evaluate the modelling performance of such algorithms, and the implications for the real-world system that the data describes

Possible Textbooks:

  • Zaki and Meira, Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press.