Master Degree in Artificial Intelligence and Computer Science - UNICAL
Data Warehouse and Visualization
Prof. Giorgio Terracina
Academic Year 2023/2024
(last update: 15/06/2023)
(Only for students enrolled before 2022-2023): Recall that, in order to finalize the exam, you also need to pass the Machine Learning module. To proceed with the registration once you have completed both modules, you must contact Prof. Terracina.
Exam: Project and discussion.
The Project is individual. No group work is allowed. The aim of the project is to develop a small-sized data warehouse and visual analytics, using either the tools presented during the course (Pentaho and Tableau) or tools of the student's choice.
Requirements for the development of the project:
STEPS
- Source selection (analysis and reconciliation, rough workload definition). Minimum requirements: the source must yield at least three dimensions, at least two hierarchies, and at least two measures
- Data cleaning tasks (implement some form of cleaning on the data)
- Reconciled level design
- Design of at least 1 DFM fact schema
- Warehouse level design
- ETL – refresh
- ETL – update (optional step)
- Design of at least 4 analysis sheets
- Design of at least 1 dashboard
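To make the minimum requirements on the source more concrete, here is a minimal sketch of a warehouse-level star schema built with Python's standard `sqlite3` module. It is only an illustration, not a project template: the retail domain and every table, column, and function name (`dim_date`, `dim_store`, `dim_product`, `fact_sales`, `build_toy_warehouse`, `rollup_by_month`) are assumptions and do not come from the course material. It shows three dimensions, two hierarchies (day → month → year and store → city → country), and two measures (quantity and revenue).

```python
import sqlite3


def build_toy_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema. All names and data are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension 1, with hierarchy day -> month -> year
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
        -- Dimension 2, with hierarchy store -> city -> country
        CREATE TABLE dim_store (
            store_id INTEGER PRIMARY KEY, store TEXT, city TEXT, country TEXT);
        -- Dimension 3, flat (no hierarchy)
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, product TEXT);
        -- Fact table with two measures: quantity and revenue
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            store_id INTEGER REFERENCES dim_store(store_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER,
            revenue REAL);
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (1, "2023-01-15", "2023-01", 2023),
        (2, "2023-01-20", "2023-01", 2023),
        (3, "2023-02-03", "2023-02", 2023),
    ])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)", [
        (1, "S1", "Cosenza", "Italy"),
        (2, "S2", "Rende", "Italy"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [
        (1, "pen"),
        (2, "notebook"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 10, 15.0),
        (2, 1, 2, 2, 8.0),
        (2, 2, 1, 5, 7.5),
        (3, 2, 2, 1, 4.0),
    ])
    conn.commit()
    return conn


def rollup_by_month(conn: sqlite3.Connection) -> list:
    """Aggregate both measures from the day level up to the month level."""
    return conn.execute("""
        SELECT d.month, SUM(f.quantity), SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_id = d.date_id
        GROUP BY d.month
        ORDER BY d.month
    """).fetchall()


if __name__ == "__main__":
    for row in rollup_by_month(build_toy_warehouse()):
        print(row)
```

A roll-up along the other hierarchy (store → city → country) would follow the same pattern, joining `fact_sales` with `dim_store` and grouping by `city` or `country`.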
Here you can find a detailed checklist for developing the project.
CHECKPOINTS (MANDATORY)
You need to go through the following checkpoints with the professor before the final discussion (some checkpoint dates may be fixed during the semester):
- Checkpoint 1. Source selection and workload definition: these must be validated with the professor before moving on to the next steps.
- Checkpoint 2. Fact schema(s) design: fact schemas (DFM) must be validated before designing the analysis sheets and dashboards.
- Checkpoint 3 (at the final discussion). You need to provide a brief essay (about one page) summarizing the steps carried out (e.g. which sources were used, which operations were performed on them, the conceptual design of the DFM, etc.) and the design choices made.
Failing to go through these checkpoints may invalidate the subsequent steps, even if they have already been implemented.
Office hours: send an e-mail to the professor to arrange a meeting.
Syllabus
Additional material will be provided on Teams during the course.
Additional topics
Download the latest version of Pentaho at: https://sourceforge.
Dataset for data preparation (cleaning.xls)
Important information on the Tableau software and licence: the licence is valid for the duration of the course; licence key updated for academic year 2022-2023 (here)
At the end of the course you can get a free one-year Tableau licence for students at: www.tableau.com/academic/students
Datasets for the first steps on multidimensional data (bugreport, students)
Exercises for Conceptual and Logical Design of Fact schemas (here)
Datasets for Tableau exercises (here)
An exercise with Tableau (here)
Halloween exercise for Visual Analytics (here)
A list of starting pointers for online data sources (here)
Dataset from Florence Consulting Group (here)
Textbooks:
Italian version: M. Golfarelli, S. Rizzi, “Data Warehouse – Teoria e pratica della progettazione”, McGraw Hill, second edition
English version: M. Golfarelli, S. Rizzi, “Data Warehouse Design: Modern Principles and Methodologies”, McGraw Hill
Additional Material distributed during the course
Suggested textbook:
H. Garcia-Molina, J. D. Ullman, J. Widom, “Database Systems: The Complete Book”, Prentice Hall, 2002