You are viewing the archived site for GLB 2021. To learn more about the latest edition of the workshop, click here.
Overview
Inspired by the conference tracks in the computer vision and natural language processing communities that are dedicated to establishing new benchmark datasets and tasks, we call for contributions that introduce novel ML tasks or novel graph-structured data which have the potential to (i) help understand the performance and limitations of graph representation models on diverse sets of problems and (ii) support benchmark evaluations for various models.
Our previous call for papers can be found here.
Schedule
All times listed below are in Ljubljana time (Central European Summer Time, UTC+2). The workshop starts on Apr 16, 2021 at 3:00pm CEST.
| Time (UTC+2) | Agenda |
|---|---|
| 3:00-3:10pm | Opening remarks |
| 3:10-3:30pm | Invited talk by Leman Akoglu (20 min): On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights |
| 3:30-4:00pm | Contributed talks (12-min talk + 3-min Q&A each): (1) Reproducible Evaluations of Network Representation Learning Models Using EvalNE; (2) Catastrophic Forgetting in Deep Graph Networks: an Introductory Benchmark for Graph Classification |
| 4:00-4:05pm | Break (5 min) |
| 4:05-4:40pm | Spotlight talks (11 x 3 min) |
| 4:40-5:30pm | Interactive poster session & break (50 min) |
| 5:30-6:25pm | Panel discussion (55 min): Stephan Günnemann, Yizhou Sun, Jie Tang |
| 6:25-6:30pm | Break (5 min) |
| 6:30-7:10pm | Keynote by Jure Leskovec (40 min): Open Graph Benchmark Large-Scale Challenge |
| 7:10-7:20pm | Closing remarks |
Invited Speakers
Jure Leskovec
Stanford University
Open Graph Benchmark Large-Scale Challenge
We first present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are larger than existing graph benchmarks, encompass multiple important graph ML tasks, and cover a diverse range of domains. We then present OGB’s new initiative on a Large-Scale Challenge (OGB-LSC) at the KDD Cup 2021. OGB-LSC provides datasets that represent modern industrial-scale large graphs. We provide dedicated baseline experiments, scaling up expressive graph ML models to the massive datasets. We show that the expressive models significantly outperform simple scalable baselines, indicating an opportunity for dedicated efforts to further improve graph ML at scale.
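For readers who want to try these benchmarks directly, the sketch below shows the typical way an OGB node-property-prediction dataset and its official evaluator are loaded with the `ogb` Python package; the dataset name, storage path, and the PyTorch Geometric backend are illustrative choices rather than details from the talk.

```python
# Minimal sketch: loading an OGB dataset and its official evaluator.
# Assumes the `ogb` and `torch_geometric` packages are installed; the dataset
# name "ogbn-arxiv" and the root directory are illustrative choices.
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="dataset/")
split_idx = dataset.get_idx_split()   # standardized train/valid/test node indices
data = dataset[0]                     # a single torch_geometric Data object

evaluator = Evaluator(name="ogbn-arxiv")
# After training a model, metrics are computed with the dataset-specific evaluator:
# result = evaluator.eval({"y_true": y_true, "y_pred": y_pred})  # accuracy for ogbn-arxiv
```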
Leman Akoglu
Carnegie Mellon University
On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights
Abstract: It is common practice in the outlier mining community to repurpose classification datasets for evaluating various detection models. To that end, a binary classification dataset is often used, where samples from (typically) the larger class are designated as the 'inlier' samples, and the other class is substantially down-sampled to create the (ground-truth) 'outlier' samples. In this study, we identify an intriguing issue with repurposing graph classification datasets for graph outlier detection in this manner. Surprisingly, the detection performance of outlier models depends significantly on which class is down-sampled; put differently, accuracy often "flips" from high to low depending on which of the classes is down-sampled to represent the outlier samples. The problem is notably exacerbated for a certain family of propagation-based outlier detection models. Through careful analysis, we show that this issue mainly stems from disparate within-class sample similarity – which is amplified by various propagation-based models – that impacts key characteristics of inlier/outlier distributions and, indirectly, the difficulty of the outlier detection task and hence performance outcomes. With this study, we aim to draw attention to this (to our knowledge) previously unnoticed issue, as it has implications for fair and effective evaluation of detection models, and we hope that it will motivate the design of better evaluation benchmarks for outlier detection. Finally, we discuss the possibly overarching implications of using propagation-based models on datasets with disparate within-class sample similarity beyond outlier detection, specifically for graph classification and graph-level clustering tasks.
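As a concrete illustration of the repurposing protocol discussed in the talk, the sketch below down-samples one class of a binary graph classification dataset to form ground-truth outliers; the function name and the 5% contamination ratio are illustrative assumptions, not details from the talk.

```python
import numpy as np

def repurpose_for_outlier_detection(graphs, y, outlier_class, outlier_ratio=0.05, seed=0):
    """Keep one class as inliers and down-sample the other class to form outliers.
    `graphs` is a list of graphs, `y` a NumPy array of binary class labels;
    names and the contamination ratio are illustrative."""
    rng = np.random.default_rng(seed)
    inlier_idx = np.flatnonzero(y != outlier_class)
    outlier_idx = np.flatnonzero(y == outlier_class)
    n_out = max(1, int(outlier_ratio * len(inlier_idx)))
    sampled_out = rng.choice(outlier_idx, size=n_out, replace=False)
    keep = np.concatenate([inlier_idx, sampled_out])
    labels = np.zeros(len(keep), dtype=int)   # 0 = inlier, 1 = outlier
    labels[len(inlier_idx):] = 1
    return [graphs[i] for i in keep], labels

# The "flip" highlighted in the talk: build the benchmark twice, once with each
# class down-sampled, and compare detector performance on the two versions.
```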
Panelists
Stephan Günnemann
Technical University of Munich
Yizhou Sun
University of California, Los Angeles
Jie Tang
Tsinghua University
Accepted Papers
- Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings
Benedek A Rozemberczki (The University of Edinburgh); Rik Sarkar (The University of Edinburgh)
Abstract: Proximity-preserving and structural role-based node embeddings have become a prime workhorse of applied graph mining. Novel node embedding techniques are repeatedly tested on the same benchmark datasets, which has led to a range of methods with questionable performance gains. In this paper, we propose Twitch Gamers, a new social network dataset with multiple potential target attributes. Our descriptive analysis of the social network and our node classification experiments illustrate that Twitch Gamers is suitable for assessing the predictive performance of novel proximity-preserving and structural role-based node embedding algorithms.
PDF | Code & Datasets
- CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer
Xianbin Ye (Jinan University); Ziliang Li (Central University of Finance and Economics); Pengyong Li (Tsinghua University); Jun Wang (Ping An Technology (Shenzhen) Co. Ltd.); Fei Ma (Chinese Academy of Medical Sciences); Zongbi Yi (Chinese Academy of Medical Sciences); Peng Gao (Pingan Healthcare Technology); Guotong Xie (Pingan Healthcare Technology)
Abstract: Anti-cancer drug discoveries have been serendipitous. We present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery. The CandidateDrug4Cancer dataset encompasses 29 of the most-mentioned cancer targets and covers 54,869 cancer-related drug molecules, ranging from pre-clinical and clinical to FDA-approved. Besides building the dataset, we also perform benchmark experiments with effective Drug-Target Interaction (DTI) prediction baselines using descriptors and expressive graph neural networks. Experimental results suggest that CandidateDrug4Cancer presents significant challenges for learning molecular graphs and targets in practical applications, indicating opportunities for future research on developing candidate drugs for treating cancers.
PDF | Code & Datasets
- Chickenpox Cases in Hungary: a Benchmark Dataset for Spatiotemporal Signal Processing with Graph Neural Networks
Benedek A Rozemberczki (The University of Edinburgh); Paul M Scherer (University of Cambridge); Oliver Kiss (Central European University); Rik Sarkar (The University of Edinburgh); Tamas Ferenci (Obuda University)
Abstract: Recurrent graph convolutional neural networks are highly effective machine learning techniques for spatiotemporal signal processing. Newly proposed graph neural network architectures are repeatedly evaluated on standard tasks such as traffic or weather forecasting. In this paper, we propose the Chickenpox Cases in Hungary dataset as a new dataset for comparing graph neural network architectures. Our time series analysis and forecasting experiments demonstrate that the Chickenpox Cases in Hungary dataset is adequate for comparing the predictive performance and forecasting capabilities of novel recurrent graph neural network architectures.
PDF | Code & Datasets
- Reproducible Evaluations of Network Representation Learning Models Using EvalNE
Alexandru C. Mara (Ghent University); Jefrey Lijffijt (Ghent University); Tijl De Bie (Ghent University)
Abstract: In this paper we introduce EvalNE, a Python toolbox for the evaluation of network representation learning methods. The main goal of EvalNE is to help researchers perform consistent and reproducible evaluations of new representation learning methods, compare them with the state of the art, or conduct benchmark studies. The toolbox can evaluate models and assess the quality of representations through data visualization and a variety of downstream prediction tasks, including sign and link prediction, network reconstruction, and node multi-label classification. EvalNE streamlines evaluation by providing automation and abstraction for tasks such as hyper-parameter tuning and model validation, node and edge sampling, node-pair embedding computation, and performance reporting. The framework can also evaluate approaches independently of the programming language and with minimal user interaction. As a command-line tool, configuration files describe the evaluation setup and guarantee consistency and reproducibility. As an API, EvalNE provides the building blocks to design any evaluation setup while ensuring that common issues, such as data leakage, are ruled out. Finally, to showcase the capabilities of our tool, we present the results of a recent benchmark on representation learning for link prediction conducted using EvalNE.
PDF | Code
- Heterogeneous Graph Dataset with Feature Set Intersection through Game Provenance
Sidney Araujo Melo (Institute of Computing / Universidade Federal Fluminense); Esteban Clua (Universidade Federal Fluminense); Aline Paes (Institute of Computing / Universidade Federal Fluminense)
Abstract: Provenance graphs have been adapted for digital games and game analytics and have proved to be a powerful tool for capturing game session data for complex games. Due to the nature of games, which involve a large amount and variety of data, game provenance graphs are highly heterogeneous in terms of node and edge types and their associated feature sets. Furthermore, game provenance graphs are rich in intersections across feature sets from distinct node types. However, most existing heterogeneous graph neural network solutions rely on simple approaches to deal with varying node types, such as projecting each type of node to the same n-dimensional space, implicitly assuming that node types, and consequently their composing features, are independently distributed. In this work, we present the Smoke Squadron Dataset, a game provenance graph dataset containing game session graphs whose node types share several common feature subsets. To address this challenging heterogeneity, we propose a feature-set-based solution that projects distinct node types while leveraging feature set intersections.
PDF | Code & Datasets
- Synthetic Graph Generation to Benchmark Graph Learning
Anton Tsitsulin (University of Bonn); Benedek A Rozemberczki (The University of Edinburgh); John Palowitch (Google Research); Bryan Perozzi (Google Research)
Abstract: Graph learning algorithms have attained state-of-the-art performance on many graph analysis tasks such as node classification, link prediction, and clustering. It has, however, become hard to track the field's burgeoning progress. One reason is the very small number of datasets used in practice to benchmark the performance of graph learning algorithms. This shockingly small sample size (~10) allows for only limited scientific insight into the problem. In this work, we aim to address this deficiency. We propose to generate synthetic graphs and to study the behaviour of graph learning algorithms in a controlled scenario. We develop a fully-featured synthetic graph generator that allows deep inspection of different models. We argue that synthetic graph generation allows for thorough investigation of algorithms and provides more insight than overfitting on three citation datasets. In our case study, we show how our framework provides insight into unsupervised and supervised graph neural network models.
PDF | Code & Datasets
- A Simple Yet Effective Method Improving Graph Fingerprints for Graph-Level Prediction
Jiaxin Ying (University of Michigan); Jiaqi Ma (University of Michigan); Qiaozhu Mei (University of Michigan)
Abstract: Graph fingerprints form an important group of graph representation methods and have been shown to be effective in graph machine learning tasks. However, such methods usually first compute feature descriptors of nodes in a graph, and then average or sum over them to obtain a graph-level feature descriptor. These simple pooling methods tend to cause significant information loss in the graph-level representation. Further, the computation of graph fingerprints is mostly not parameterized and cannot be tailored to a supervised learning task. In this paper, we test a simple fuzzy histogram approach that is applicable to a wide range of graph fingerprints (a generic sketch of fuzzy histogram pooling follows this list). Through extensive benchmark evaluation, we demonstrate that this simple method significantly improves supervised learning performance with graph fingerprints and achieves performance similar to popular graph neural networks for graph classification, while remaining computationally efficient. We suggest that graph fingerprints enhanced with the histogram approach should be considered strong baselines for graph-level prediction tasks.
PDF | Code
- Catastrophic Forgetting in Deep Graph Networks: an Introductory Benchmark for Graph Classification
Antonio Carta (Università di Pisa); Andrea Cossu (University of Pisa); Federico Errica (University of Pisa); Davide Bacciu (University of Pisa)
Abstract: In this work, we study the phenomenon of catastrophic forgetting in the graph representation learning scenario. The primary objective of the analysis is to understand whether classical continual learning techniques for flat and sequential data have a tangible impact on performance when applied to graph data. To do so, we experiment with a structure-agnostic model and a deep graph network in a robust and controlled environment on three different datasets. The benchmark is complemented by an investigation of the effect of structure-preserving regularization techniques on catastrophic forgetting. We find that replay is the most effective strategy so far, and that it also benefits the most from the use of regularization. Our findings suggest interesting future research at the intersection of the continual and graph representation learning fields. Finally, we provide researchers with a flexible software framework to reproduce our results and carry out further experiments.
PDF | Code
- Embedding alignment methods in dynamic networks
Piotr Bielak (Wroclaw University of Science and Technology); Kamil Tagowski (Wrocław University of Science and Technology); Tomasz Kajdanowicz (Wroclaw University of Science and Technology)
Abstract: In recent years, dynamic graph embedding has attracted a lot of attention due to its usefulness in real-world scenarios. In this paper, we consider discrete-time dynamic graph representation learning, where embeddings are computed for each time window and then aggregated to represent the dynamics of the graph. However, embeddings computed independently in consecutive windows suffer from the stochastic nature of representation learning algorithms and are algebraically incomparable (differing by affine transformations). We underline the need for an embedding alignment process (see the alignment sketch after this list) and evaluate nine alignment techniques on real-world datasets in link prediction and graph reconstruction tasks. Our experiments show that embedding alignment improves the performance of downstream tasks by up to 11 percentage points compared to the unaligned scenario.
PDF | Code
- New Benchmarks for Learning on Non-Homophilous Graphs
Derek Lim (Cornell University); Xiuyu Li (Cornell University); Felix M Hohne (Cornell University); Ser-Nam Lim (Facebook AI)
Abstract: Much data with graph structure satisfies the principle of homophily, meaning that connected nodes tend to be similar with respect to a specific attribute (a minimal homophily sketch follows this list). As such, ubiquitous datasets for graph machine learning tasks have generally been highly homophilous, rewarding methods that leverage homophily as an inductive bias. Recent work has pointed out this particular focus, as new non-homophilous datasets have been introduced and graph representation learning models better suited for low-homophily settings have been developed. However, these datasets are small and poorly suited to truly testing the effectiveness of new methods in non-homophilous settings. We present a series of improved graph datasets with node label relationships that do not satisfy the homophily principle. Along with this, we introduce a new measure of the presence or absence of homophily that is better suited than existing measures in different regimes. We benchmark a range of simple methods and graph neural networks across our proposed datasets, drawing new insights for further research. Data and code can be found at https://github.com/CUAI/Non-Homophily-Benchmarks.
PDF | Code & Datasets
- HyFER: A Framework for Making Hypergraph Learning Easy, Scalable and Benchmarkable
Hyunjin Hwang (KAIST); Seungwoo Lee (Korea Advanced Institute of Science and Technology); Kijung Shin (KAIST)
Abstract: Interest in hypergraphs, which are a generalization of graphs, has emerged due to their expressiveness. This expressiveness causes some difficulties in applying deep learning techniques to hypergraphs, and a number of hypergraph neural networks (hyperGNNs) have overcome or bypassed such difficulties in their own ways. Up until now, there has been no standard way of doing so, and as a consequence, it takes much effort to directly compare different hyperGNNs even with their open-sourced implementations. To address this issue, we propose HyFER, an easy-to-use and efficient framework for implementing and evaluating hyperGNNs. Using HyFER, which is well modularized for easy adaptation to new datasets, models, and tasks, we directly compare three hyperGNNs on two tasks across four datasets under the same settings.
PDF | Code
- TRIGGER: TempoRal Interaction Graph GenEratoR
Yusuf Ozkaya (Georgia Institute of Technology); Ali Pinar (Sandia National Laboratories); Ümit V. Çatalyürek (Georgia Institute of Technology)
Abstract: Efforts on temporal graph generation have focused on generating instances from the same steady state (e.g., keeping a fixed-size window over a sequence of edges generated from the same model). Unfortunately, such generators cannot capture the underlying information richness of the temporal aspects of activity graphs. Depending on the underlying phenomena represented by the graph, the temporal properties of interest will vary. In addition to topological features, such as neighbor information, we are interested in frequencies of communication. Accordingly, our work splits into two natural steps: building models that can represent the temporal characteristics of the nodes of a graph, and generating temporal activity graphs that display the features of our model. We present TRIGGER (TempoRal Interaction Graph GenEratoR), a Markov-model-based activity generation approach that classifies nodes into profiles and generates a series of repeating interactions in continuous time. We then show how to estimate an input model that represents a real-world graph. We carried out extensive experiments to validate our approach using various real-world temporal datasets and metrics on the quality of the generated graphs. We show that our approach can generate realistic temporal activity graphs and match temporal metrics such as burstiness, spread, and persistence, as well as static metrics, at both the graph and node scales.
PDF | Code
- A New Benchmark of Graph Learning for PM2.5 Forecasting under Distribution Shift
Yachuan Liu (University of Michigan); Jiaqi Ma (University of Michigan); Paramveer Dhillon (University of Michigan); Qiaozhu Mei (University of Michigan)
Abstract: We present a new benchmark task for graph-based machine learning, aiming to predict future air quality (PM2.5 concentration) observed by a geographically distributed network of environmental sensors. While prior work has successfully applied Graph Neural Networks (GNNs) to a wide family of spatio-temporal prediction tasks, the new benchmark task introduced here brings a technical challenge that has been less studied in the context of graph-based spatio-temporal learning: distribution shift across a long period of time. An important goal of this paper is to understand the behaviour of spatio-temporal GNNs under distribution shift. To achieve this goal, we conduct a comprehensive comparative study of both graph-based and non-graph-based machine learning models on the proposed benchmark task. To single out the influence of distribution shift on model performance, we design two data split settings for control experiments (a minimal sketch of the two settings is given after this list). The first setting splits the data naturally by the order of time, while the second setting assigns all the time stamps randomly into training, validation, and test sets, which removes the effect of distribution shift. Our empirical results suggest that GNN models tend to suffer more from distribution shift than non-graph-based models, which calls for special attention when deploying spatio-temporal GNNs in practice.
PDF | Code & Datasets
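For the graph fingerprint paper above, the following sketch illustrates what a fuzzy (soft) histogram pooling of per-node descriptor values could look like, as an alternative to mean or sum pooling; the Gaussian membership function, bin centers, and all names are assumptions for illustration and not the authors' exact formulation.

```python
import numpy as np

def fuzzy_histogram_pool(node_values, centers, width=1.0):
    """Pool scalar per-node descriptor values into a graph-level vector using
    soft (Gaussian) bin memberships instead of a plain mean or sum.
    The kernel, bin centers, and width are illustrative choices."""
    diff = node_values[:, None] - centers[None, :]           # (n_nodes, n_bins)
    membership = np.exp(-0.5 * (diff / width) ** 2)
    membership /= membership.sum(axis=1, keepdims=True)      # each node distributes mass 1
    return membership.sum(axis=0)                            # (n_bins,) graph-level descriptor

# Example: pool node degrees of a small graph into an 8-bin soft histogram.
degrees = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
bins = np.linspace(0, 10, 8)
graph_vector = fuzzy_histogram_pool(degrees, bins)
```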
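For the embedding alignment paper, the sketch below shows one classic alignment technique, orthogonal Procrustes, which maps the embeddings of one time window into the space of the previous window; it is offered only as a generic example of alignment and is not claimed to be one of the nine techniques evaluated in the paper.

```python
import numpy as np

def procrustes_align(emb_prev, emb_curr):
    """Rotate/reflect `emb_curr` so it best matches `emb_prev` (same node order
    and embedding dimension), i.e. find the orthogonal R minimizing
    ||emb_curr @ R - emb_prev||_F."""
    m = emb_curr.T @ emb_prev
    u, _, vt = np.linalg.svd(m)
    return emb_curr @ (u @ vt)
```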
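For the non-homophilous benchmarks paper, one widely used existing measure of homophily, the edge-homophily ratio (the fraction of edges whose endpoints share a label), can be computed as below; this is a standard baseline measure, not the new measure the authors propose.

```python
import numpy as np

def edge_homophily(edge_index, labels):
    """Fraction of edges connecting nodes with the same label.
    `edge_index` is a (2, num_edges) integer array, `labels` a (num_nodes,) array."""
    src, dst = edge_index
    return float(np.mean(labels[src] == labels[dst]))
```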
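Finally, for the PM2.5 benchmark, the two data-split settings described in the abstract (a chronological split that preserves distribution shift, and a random split over time stamps that removes it) can be sketched as follows; the split proportions and function names are illustrative assumptions.

```python
import numpy as np

def chronological_split(num_steps, train_frac=0.6, valid_frac=0.2):
    """Setting 1: split time stamps in temporal order, keeping distribution shift."""
    idx = np.arange(num_steps)
    n_tr = int(train_frac * num_steps)
    n_va = int(valid_frac * num_steps)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def random_split(num_steps, train_frac=0.6, valid_frac=0.2, seed=0):
    """Setting 2: assign time stamps to splits at random, removing distribution shift."""
    idx = np.random.default_rng(seed).permutation(num_steps)
    n_tr = int(train_frac * num_steps)
    n_va = int(valid_frac * num_steps)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```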
Organizers
- Jiaqi Ma (University of Michigan)
- Jiong Zhu (University of Michigan)
- Yuxiao Dong (Facebook AI)
- Danai Koutra (University of Michigan)
- Qiaozhu Mei (University of Michigan)
Program Committee
- Aleksandar Bojchevski (Technical University of Munich)
- Andreas Loukas (EPFL)
- Anton Tsitsulin (University of Bonn)
- Christopher Morris (TU Dortmund University)
- Daniel Zügner (Technical University of Munich)
- Davide Belli (University of Amsterdam)
- Davide Mottin (Aarhus University)
- Johannes Klicpera (Technical University of Munich)
- Marinka Zitnik (Harvard University)
- Mark Heimann (Lawrence Livermore National Laboratory)
- Matthias Fey (TU Dortmund University)
- Michael Schaub (RWTH Aachen University)
- Neil Shah (Snap Inc.)
- Tara Safavi (University of Michigan)
- Thomas Kipf (Google Brain)
- Wei Ai (University of Maryland)
- Wenzheng Feng (Tsinghua University)
- Yewen Wang (UCLA)
- Yujun Yan (University of Michigan)