The goal of the graph data processing (GDP) course is to provide a broad introduction to effective graph algorithms in data science and network analysis.
The course features two modules:
The network science module. Here we look as the analysis of networks from the perspective of network science. The lectures will follow chapters 2, 3, 4, 5, and 10 from Barabasi’s Network Science (NS) book. Some questions we will answer are:
- How are networks formed?
- Can we model them effectively and what network properties can be predicted accurately?
- What is the role of hubs in information spreading and why are epidemics so difficult to contain?
The learning on graphs module. A major effort will be given to show that existing data analysis techniques can be extended (and enhanced) on graphs. To do so, we will present strong mathematical tools based on linear and non-linear graph spectral harmonic analysis. Some key points covered are:
- How can graphs combat the curse of dimensionality?
- Why is the graph spectrum important?
- How to solve graph problems related to unsupervised and supervised learning, recommendation, visualization?
The course is given in the University of Rennes in the fall of 2017. It builds on the EPFL course ‘A Network Tour of Data Science‘ given by Prof. Pascal Frossard and Prof. Pierre Vandergheynst in 2017 (and earlier by Xavier Bresson), as well as on Prof. Barabási’s class on Network Science.
- keywords: learning on graphs, graph-structured data, data science
- prerequisites: Graph theory, signal processing, convex optimization, basic probability theory
- instructor: Dr. Andreas Loukas
- advisor: Prof. Pierre Vandergheynst
Course schedule & syllabus
- 05/12/2017: Introduction (read chapter 2 of the NS book)
- 07/12/2017: Random networks (read chapter 3 of the NS book)
- 12/12/2017: Scale-free networks (read chapter 4 of the NS book)
- 14/12/2017: Network formation models (read chapter 5 of the NS book)
- 19/12/2017: Spreading phenomena and network epidemics (read chapter 10 of the NS book, skip 10.D)
- — Christmas break —
- 09/01/2018: Elements of spectral graph theory (chapters 1 (up to and including 1.4) and 2 (up to and including 2.3) from Fan Chung’s lectures.)
- 11/01/2018: Unsupervised learning with graphs: spectral clustering (read von Luxburg’s tutorial.)
- 16/01/2018: Unsupervised learning with graphs: dimensionality reduction (papers on ISOMAP, Laplacian Eigenmaps, and LLE )
- 18/01/2018: Project presentation
- 23/01/2018: Graph signal processing
- 25/01/2018: Written exam
The course takes place in room E110, building B02B, Beaulieu, Rennes.
Final grade = 0.5 project grade + 0.5 written exam grade.
If you are motivated to learn more, in addition to the project papers, you may consider the following references:
- The “Spectral Graph Theory” chapter from David Spielman’s book.
(to be updated)
In a nutshell: the objective is to grasp, implement, and present the essence of a state-of-the-art paper in the field.
- Group of 2 persons. Each team selects one paper (see list below).
- Understand main concepts and proofs.
- Evaluate proposed algorithm (or provided prediction) with real data.
- Give a 10 minute presentation that illustrates your findings (understanding and experiments)
- An in depth treatment of the core idea is preferred, rather than an exhaustive account – use your judgement (if in doubt, ask)
- Feel free to use available libraries. If a pre-existing algorithm implementation is used, the grading will focus on other aspects of the effort (e.g., depth of proof comprehension, extensive evaluation with multiple datasets, ideas for extension, presentation of related work).
- Be critical!
- Present your findings in a blog style notebook including an easily accessible description of the ideas, proofs, and performed experiments (using Jupyter notebook is suggested).
- Send the code at my email.
List of Papers
Papers will be assigned in a first come first served basis! Leave a comment below to claim yours.
Related to module 1:
- Finding and evaluating community structure in networks (community detection)
- Epidemic Spreading in Real Networks (a more rigorous analysis of epidemic spreading in networks, check also paper 2)
- Maximizing the Spread of Influence through a Social Network (which nodes to target in order to maximize ones influence on the entire network)
- A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs (how to coarsen a graph and application to partitioning large graphs)
Related to module 2:
- ISOMAP: A Global Geometric Framework for Nonlinear Dimensionality Reduction
- LLE: Nonlinear Dimensionality Reduction by Locally Linear Embedding
- Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
Advanced topics (intended for PhD students or the adventurous):
- Graph sparsification by effective resistances (how to make a graph sparser without affecting its spectrum)
- Visualizing Data using t-SNE (how to embed high-dimensional data to 2 or 3 dimensions)
- Sparse inverse covariance estimation with the graphical lasso (how to learn a graph from data)
- Jupyter Notebook HOWTO
- Network data repositories: network repository, pajek, SNAP, Mark Newman’s page, Science on a sphere, UCI machine learning repository, Konect