Master Degree in Artificial Intelligence and Computer Science - UNICAL

Data Warehouse and Visualization

Prof. Giorgio Terracina

A.A. 2023/2024

(last update: 15/06/2023)

 

(Only for students enrolled before 2022-2023): To finalize the exam you must also pass the Machine Learning module. Once you have completed both modules, contact Prof. Terracina to proceed with the registration.

Exam: Project and discussion.  

The project is individual; group work is not allowed. The aim of the project is to develop a small data warehouse with visual analytics, using the tools presented during the course (Pentaho and Tableau) or tools of the student's choice.

Requirements for the development of the project:

 

STEPS

Source selection (Analysis and reconciliation, Rough workload definition)

-          Minimum requirements: the source must yield at least three dimensions, at least two hierarchies, and at least two measures

Data Cleaning Tasks (implement some forms of cleaning on the data)

Reconciled level design

Design of at least 1 DFM Fact schema

Warehouse level design

ETL – refresh

ETL – update (optional step)

Design of at least 4 Analysis Sheets

Design of at least 1 Dashboard
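
As a rough illustration of the minimum requirements and of the cleaning, warehouse design and ETL refresh steps listed above, the following is a minimal sketch in Python. It is not the course's reference solution: the sales data, the table and column names, and the helper functions are all hypothetical, and the standard-library sqlite3 module stands in for whatever database and ETL tool (e.g. Pentaho) the project actually uses. The toy star schema has three dimensions (date, product, store), hierarchies day → month → year, product → category and store → city, and two measures (quantity, revenue).

```python
import sqlite3

# Hypothetical toy source: one row per sale (all names and values are invented).
# Columns: date, product, category, store, city, quantity, revenue
RAW_SALES = [
    ("2023-01-15", "Espresso", "Coffee", "Store A", "Cosenza", 3, 4.50),
    ("2023-01-15", "Green tea", "Tea", "Store A", "Cosenza", 1, 2.00),
    ("2023-02-03", "Espresso", "Coffee", "Store B", "Rende", 2, 3.00),
    (" 2023-02-03", "espresso ", "Coffee", "Store B", "Rende", 1, 1.50),  # dirty row
]

# Star schema: three dimension tables and one fact table with two measures.
SCHEMA = """
DROP TABLE IF EXISTS fact_sales;
DROP TABLE IF EXISTS dim_date;
DROP TABLE IF EXISTS dim_product;
DROP TABLE IF EXISTS dim_store;
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT UNIQUE, month TEXT, year TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT UNIQUE, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store TEXT UNIQUE, city TEXT);
CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER,
    revenue REAL
);
"""


def clean(row):
    """A simple cleaning task: trim whitespace and normalise product capitalisation."""
    day, product, category, store, city, qty, revenue = row
    return (day.strip(), product.strip().capitalize(), category.strip(),
            store.strip(), city.strip(), qty, revenue)


def key_for(conn, table, id_col, natural_col, values):
    """Insert the dimension row if it is missing and return its surrogate key."""
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    conn.execute(f"INSERT OR IGNORE INTO {table} ({cols}) VALUES ({marks})",
                 tuple(values.values()))
    return conn.execute(f"SELECT {id_col} FROM {table} WHERE {natural_col} = ?",
                        (values[natural_col],)).fetchone()[0]


def refresh(conn, rows):
    """ETL refresh: rebuild the whole warehouse from the (cleaned) source rows."""
    conn.executescript(SCHEMA)
    for day, product, category, store, city, qty, revenue in map(clean, rows):
        d = key_for(conn, "dim_date", "date_id", "day",
                    {"day": day, "month": day[:7], "year": day[:4]})
        p = key_for(conn, "dim_product", "product_id", "product",
                    {"product": product, "category": category})
        s = key_for(conn, "dim_store", "store_id", "store",
                    {"store": store, "city": city})
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     (d, p, s, qty, revenue))
    conn.commit()


def revenue_by_month_and_category(conn):
    """Roll up both measures along the date and product hierarchies."""
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.quantity), SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_date d ON d.date_id = f.date_id
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    refresh(conn, RAW_SALES)
    for line in revenue_by_month_and_category(conn):
        print(line)
```

The roll-up query is the kind of aggregation an analysis sheet would be built on. An incremental ETL update (the optional step) would instead append only new source rows rather than rebuilding every table.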

Here you can find a detailed checklist for developing the project.

CHECKPOINTS (MANDATORY)

You need to go through the following checkpoints with the professor before the final discussion (some checkpoint dates may be fixed during the semester):

-          Checkpoint 1. Source selection and workload definition: must be validated with the professor before going to the next steps

-          Checkpoint 2. Fact schema(s) design. Fact schemas (DFM) must be validated before designing the analysis sheets and dashboards.

-          Checkpoint 3 (at the final discussion). You need to provide a brief essay (around 1 page) summarizing the steps carried out (e.g. the sources used, the operations carried out on them, the conceptual design of the DFM, etc.) and the design choices.

Failing to go through these checkpoints may invalidate the subsequent steps, even if they have already been implemented.

Office hours: send an e-mail to the professor.

Syllabus

 

Additional material will be provided on Teams during the course

 

Additional topics

 

Download the latest version of Pentaho at: https://sourceforge.net/projects/pentaho/files/

Data set for data preparation (cleaning.xls)

Important data for Tableau Software and Licence – licence valid for the duration of the course – Licence Key updated for A.A. 2022-2023 (here)

At the end of the course you can get a free one-year Tableau licence for students at: www.tableau.com/academic/students

Datasets for the first steps on multidimensional data (bugreport, students)

Exercises for Conceptual and Logical Design of Fact schemas (here)

Datasets for Tableau exercises (here)

An exercise with Tableau (here)

Halloween exercise for Visual Analytics (here)

A list of starting pointers for online data sources (here)

Dataset from Florence Consulting Group (here)

 

 

Textbooks:

Italian Version: M. Golfarelli, S. Rizzi, “Data Warehouse – Teoria e pratica della progettazione”, McGraw Hill, second edition

English Version: M. Golfarelli, S. Rizzi, “Data Warehouse Design: Modern Principles and Methodologies”, McGraw Hill

Additional Material distributed during the course

 

Suggested Textbook

H. Garcia-Molina, J. D. Ullman, J. Widom, “Database Systems: The Complete Book”, Prentice Hall, 2002