Queen's School of Computing

CISC 432/3.0 Advanced Data Management Systems

Calendar description:

Storage and representation of “big data”, which are large, complex, structured or unstructured data sets. Provenance, curation, integration, indexing and querying of data.
Prerequisites: C- in CISC 235 and CISC 332
Learning Hours: 120 (36L;84P)

Course Outline:

Introduction (2 weeks)

  • How "big data" is different from conventional databases
  • Relation to cloud computing, data science, and Internet of Things
  • Overview of representational and organizational issues
Big Data Storage Systems (3 weeks)
  • Current technologies such as HDFS, SQL, NoSQL, graph, and hybrid storage systems
  • Management and querying of structured, semi-structured, unstructured, and hybrid data
Processing Frameworks (4 weeks)
  • Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), and benchmarking
  • Distributed data processing, e.g. MapReduce, Hadoop
  • In-memory processing, e.g. Spark
  • Cloud-based information systems
Curation, Workflows, and Provenance (3 weeks)
  • Representation of the meaning of data and metadata
  • Creating, scheduling, executing, and managing distributed data processing workflows
  • Origin and history of data sets

Learning outcomes:

Upon successful completion of this course, a student will be able to:

  1. Create distributed storage structures for complex datasets
  2. Organize, integrate and process data from distributed storage systems
  3. Create metadata for complex datasets
  4. Articulate issues in data provenance and curation
  5. Build workflows and query the results

Possible Textbooks:

As this course is meant to remain current with the latest developments, it will be based on course notes and online resources.