CISC 372/3.0 Advanced Data Analytics
Original Author: David Skillicorn
Last Revised: 2019-03-20
Inductive modelling of data, especially counting models; ensemble approaches to modelling; maximum likelihood and density-based approaches to clustering; visualization. Applications to non-numeric datasets such as natural language, social networks, Internet search, recommender systems. Introduction to deep learning. Ethics of data analytics.
Prerequisites: CISC 371/3.0 and 3.0 units of (STAT or STAT_Options)
Learning hours: 120 (36L; 84P)
This course is required for the
focus of the COMP degree plan.
This course is a direct prerequisite to:
- CISC 451/3.0 (Topics in Data Analytics)
Textbook: Zaki and Meira. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press.
- Data analytics as an epistemology, review of optimisation-based prediction and clustering, ethical issues in data analytics (1 week)
- Assessing model quality (accuracy, F1 score, precision, recall, ROC, AUC) (1 week)
- Predictors based on counting: decision trees, rule systems (2 weeks)
- Bias, variance, ensemble techniques (bagging, boosting, random forests, XGBoost) (2 weeks)
- Visualization (1 week)
- Data analytics for graph data (Internet search, recommender systems) (2 weeks)
- Social network analysis (1 week)
- Natural language analytics (1 week)
- Introduction to deep learning (1 week)