ITCS 6163/8163: Data Warehousing (Spring 2012)

Meetings
When: 5-6:15pm, Monday & Wednesday
Where: 135 Woodward Hall

Staff
Instructor: Wensheng Wu
Office: Woodward Hall 430E
Phone: (704)687-7022
Email: w.wu AT uncc DOT edu
Office hours: 4-4:45pm, Monday & Wednesday

TA: Tingting Zhong
Email: tzhong AT uncc DOT edu
Office hours: 2-3pm, Tuesday & Thursday
Office: Woodward Hall 402 (KDD Lab)

Announcements
4/30: Reminder: final exam on 5/7, 5-7:30pm, same room, closed-book.
4/30: Solution to homework #5 posted.
4/18: Please send your preferences for the project presentation & demo to the TA by Apr. 20.
4/18: Homework #5 deadline postponed to Apr. 25.
Homework #5 posted, due on Apr. 23.

3/25: Solutions to homework #3 and #4 posted.
3/19: Reminder: midterm 2 on Mar. 28, closed-notes, closed-book, calculator allowed. Covers lectures 11 through this Wednesday's (classification).
3/19: The deadline for Phase 3 of the project is extended to Apr. 2.
Homework #4 posted (to be officially announced on Monday), due Friday, Mar. 23, to the TA (1-3pm, KDD Lab).

3/12: Deadline for Homework #3 extended to Mar. 19.
2/29: Homework #3 posted, due on Mar. 14.
2/16: Reminder: midterm 1 on 2/20, in class, closed-notes, closed-book, calculator allowed.
2/16: Solution to homework #2 posted.
2/3: Homework #2 posted, due on Feb. 15.
posted.

1/23: Homework #1 posted, due on Feb. 1.
1/17: Project details posted on Moodle.
1/9: Welcome to ITCS 6163/8163.

Synopsis


This course studies the concepts, foundations, implementation, and applications of data warehousing. It consists of three parts. (1) Warehousing: topics include data cleaning, integration, transformation, and reduction; the multidimensional data model; online analytical processing (OLAP); warehouse architecture; relational and multidimensional OLAP servers; efficient computation of data cubes; and best practices in building data warehouses. (2) Mining: unsupervised and supervised learning algorithms for discovering knowledge in warehouses, including association rule mining, decision trees, naive Bayes classifiers, k-means clustering, and hierarchical clustering. (3) Advanced topics: column stores, virtual data integration, entity resolution, and other emerging topics. Students will gain hands-on experience with warehousing products such as Microsoft SQL Server and are expected to complete a warehousing project.

Texts
Software
Prerequisite
ITCS 6160 (Database Systems) or equivalent

Course structure
Homework assignments: 20%
Midterms: 30% (two, 15% each)
Project: 15%
Final: 30%
Participation: 5%

Project
Details have been posted to Moodle.


Grading
A: 90-100% of total points
B or better: 80-89% of total points
C or better: 70-79% of total points
D or better: 60-69% of total points
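The component weights under Course structure sum to 100%, so a course total is a weighted average of component percentages. Below is a minimal, hypothetical Python sketch (not an official grading tool) that computes that total and maps it to the minimum letter grade implied by the bands above. The component names, the assumption that every component is scored out of 100, the treatment of the cutoffs as inclusive lower bounds with no rounding, and the "F" label below 60% are all illustrative assumptions, not stated course policy.

```python
from __future__ import annotations

# Hypothetical component names; weights taken from "Course structure"
# (two midterms at 15% each make up the 30% midterm share).
WEIGHTS = {
    "homework": 0.20,
    "midterm1": 0.15,
    "midterm2": 0.15,
    "project": 0.15,
    "final": 0.30,
    "participation": 0.05,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Return the weighted course total (0-100).

    Assumes each score is already a percentage (0-100) for its component.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def minimum_letter_grade(total: float) -> str:
    """Map a total to the minimum grade its band guarantees ("B or better", ...).

    Assumption: cutoffs 90/80/70/60 are inclusive lower bounds, with no
    rounding; "F" below 60 is a placeholder the syllabus does not state.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "homework": 90,
        "midterm1": 80,
        "midterm2": 70,
        "project": 100,
        "final": 85,
        "participation": 100,
    }
    total = weighted_total(example)
    print(f"{total:.1f} -> {minimum_letter_grade(total)} or better")
```

With these made-up scores the total is 18 + 12 + 10.5 + 15 + 25.5 + 5 = 86.0, which falls in the "B or better" band. The function is named for the *minimum* grade because the syllabus phrases the bands as "or better".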

Policy
Please submit your homework and project deliverables on time; late submissions will NOT be accepted. All grades are considered FINAL two weeks after being posted. Homework must be completed independently, whereas the project is expected to be a team effort. Please observe the code of academic integrity as stated in the University policy and the student handbook.

Schedule (tentative; may be revised as the class progresses)
Date | Lecture | Topic | Recommended Reading | Slide | Assignment | Project | Note
1/9 | 1 | Introduction | HK Chapter 1 | lecture0102-intro.pdf
1/11 | 2 | Introduction | HK Chapter 1 | lecture0102-intro.pdf
1/16 | - | Martin Luther King Day, no class
1/18 | 3 | Introduction, Tutorial | HK Chapter 1
1/23 | 4 | Data Analysis | HK Chapter 2 | lecture0405-data-analysis.pdf | Homework #1 out
1/25 | 5 | Data Analysis | HK Chapter 2 | lecture0405-data-analysis.pdf | | Phase 0: Group & Domain Information Due
1/30 | 6 | Object Similarity | HK Chapter 2 | lecture06-object-similarity.pdf
2/1 | 7 | ETL | HK Chapter 3 | lecture0708-etl.pdf | Homework #1 in
2/6 | 8 | Data Cleaning & Data Reduction | HK Chapter 3 | lecture0708-etl.pdf
2/8 | 9 | OLAP & Multidimensional Data Model | HK Chapter 4 | lecture0910-olap1.pdf
2/13 | 10 | Ranking & Windowing Functions | HK Chapter 4 | lecture0910-olap1.pdf
2/15 | 11 | OLAP Query Processing | HK Chapter 4 | lecture1112-olap2.pdf | Homework #2 in | Phase 1: Warehouse Design Due
2/20 | - | Midterm 1
2/22 | 12 | Index Structures & Materialized Views | HK Chapter 4 | lecture1112-olap2.pdf
2/27 | - | Midterm 1 review
2/29 | 13 | Cube Computation | HK Chapter 5 | lecture1314-cube.pdf | Homework #3 out
3/5 | - | Spring recess, no class
3/7 | - | Spring recess, no class
3/12 | 14 | Cube Computation | HK Chapter 5 | lecture1314-cube.pdf | | Phase 2: ETL Due
3/14 | 15 | Cube Computation | HK Chapter 5
3/19 | 16 | Classification | HK Chapter 8; TSK Chapter 4 & Section 5.3 | lecture1617-class.pdf | Homework #3 in; Homework #4 out
3/21 | 17 | Classification | HK Chapter 8; TSK Chapter 4 & Section 5.3 | lecture1617-class.pdf | Homework #4 in (on 3/23)
3/26 | 18 | Clustering | HK Chapter 10; TSK Chapter 8 | lecture1819-cluster.pdf
3/28 | - | Midterm 2
4/2 | 19 | Clustering | HK Chapter 10; TSK Chapter 8 | lecture1819-cluster.pdf | | Phase 3: OLAP Due
4/4 | - | Midterm 2 review
4/9 | 20 | Hierarchical Clustering | HK Chapter 10; TSK Chapter 8 | lecture1819-cluster.pdf; lecture1819-cluster.ppt
4/11 | 21 | Association Rule Mining | HK Chapter 6; TSK Chapter 6 | lecture21-assoc1.pdf | Homework #5 out
4/16 | 22 | Association Rule Mining | HK Chapter 6; TSK Chapter 6 | lecture22-assoc2.pdf | | Phase 4: Mining Due
4/18 | 23 | Association Rule Mining | HK Chapter 6; TSK Chapter 6 | lecture22-assoc2.pdf
4/23 | 24 | Data Integration | GUW Sections 21.1-21.4 | lecture24-di.pdf
4/25 | - | Project Presentation & Demo | - | | Homework #5 in | Phase 5: Demo & Final Report Due
4/30 | - | Project Presentation & Demo | -
5/2 | - | Reading day, no class
5/7 | - | Final exam, 5-7:30pm, same room