You are viewing the archived site for GLB 2022. To learn more about the latest edition of the workshop, click here.
Overview
GLB 2022 is the second edition of the Workshop on Graph Learning Benchmarks, encouraged by the success of GLB 2021. Inspired by the conference tracks in the computer vision and natural language processing communities that are dedicated to establishing new benchmark datasets and tasks, we call for contributions that establish novel ML tasks on novel graph-structured data and that have the potential to (i) identify systematic failure modes of existing GNNs and provide new technical challenges that highlight diverse directions for future model development, (ii) draw the community's attention to the synergies of graph learning, and (iii) crowdsource benchmark datasets for various graph ML tasks. GLB 2022 will be a virtual and non-archival workshop.
Our previous call for papers can be found here.
Registration
Please register for The Web Conference 2022 to join our workshop. A “Monday & Tuesday only” pass (100 €) is sufficient to join us, including for authors who are presenting. Authors are not required to register with the “Author” pass (250 € / 300 €).
Schedule
All times listed below are local to Lyon, France (Central European Summer Time, UTC+2) and use the 24-hour clock. The workshop starts at 10:45 CEST on April 26, 2022.
| Time (UTC+2) | Agenda |
| --- | --- |
| 10:45-10:50 | Opening remarks |
| 10:50-11:40 | Keynote by Michael Bronstein (50 min): Graph Neural Networks: Trends and Open Problems |
| 11:40-12:40 | Paper Presentation - Session 1 (60 min): Paper Assignments |
| 12:40-14:00 | Lunch Break (80 min) |
| 14:00-14:50 | Keynote by Stephan Günnemann (50 min): Graph Neural Networks for Molecular Systems - Methods and Benchmarks |
| 14:50-15:30 | Paper Presentation - Session 2 (40 min): Paper Assignments |
| 15:30-15:45 | Break (15 min) |
| 15:45-16:35 | Keynote by Tina Eliassi-Rad (50 min): The Why, How, and When of Representations for Complex Systems |
| 16:35-17:15 | Paper Presentation - Session 3 (40 min): Paper Assignments |
| 17:15-18:15 | Break (Plenary Talk at TheWebConf) (60 min) |
| 18:15-19:15 | Panel Discussion (60 min) |
| 19:15-19:30 | Closing Remarks |
Keynote Speakers
Michael Bronstein
University of Oxford & Twitter
Graph Neural Networks: Trends and Open Problems
Stephan Günnemann
Technical University of Munich
Graph Neural Networks for Molecular Systems - Methods and Benchmarks
Tina Eliassi-Rad
Northeastern University
The Why, How, and When of Representations for Complex Systems
Panelists
Xin Luna Dong
Meta
Petar Veličković
DeepMind & University of Cambridge
Minjie Wang
Amazon
Rose Yu
University of California, San Diego
Accepted Papers
- TeleGraph: A Benchmark Dataset for Hierarchical Link Prediction
Min Zhou (Huawei Technologies Co., Ltd.); Bisheng Li (Fudan University); Menglin Yang (The Chinese University of Hong Kong); Lujia Pan (Huawei Noah’s Ark Lab)
Abstract: Link prediction aims to predict whether two nodes in a network are likely to have a link via the partially available topology and/or node attributes, attracting considerable research effort owing to its diverse applications. According to the techniques involved, current link prediction methods can be categorized into three classes: heuristic methods, embedding methods, and GNN-based methods. However, existing link prediction algorithms mainly focus on regular complex networks and are overly dependent on either the closed triangular structure of networks or the so-called preferential attachment phenomenon. The performance of these algorithms on highly sparse or tree-like networks has not been well studied. To bridge this gap, we present a new benchmark dataset for link prediction on tree-like networks, aiming to evaluate how link prediction methods behave on such data. Our empirical results suggest that most of the algorithms fail on the tree-like dataset except the subgraph-based GNN models, which calls for special attention when deploying link prediction on tree-like datasets in practice.
PDF | Code & Datasets
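The abstract above notes that common heuristics lean on closed triangles, which tree-like graphs lack by construction. The following sketch (illustrative only, not code from the paper; the balanced tree and scoring loop are assumptions) shows how the common-neighbors heuristic collapses on a tree:

```python
import itertools
import networkx as nx

# A perfectly tree-like graph: a balanced binary tree of height 5 (63 nodes).
G = nx.balanced_tree(2, 5)

# Common-neighbors score for every non-adjacent node pair.
scores = {
    (u, v): len(list(nx.common_neighbors(G, u, v)))
    for u, v in itertools.combinations(G.nodes, 2)
    if not G.has_edge(u, v)
}

# In a tree, two nodes share at most one neighbor and never close a triangle,
# so the heuristic assigns 0 to almost every candidate pair and cannot
# discriminate between plausible and implausible missing links.
zero_fraction = sum(s == 0 for s in scores.values()) / len(scores)
print(f"non-adjacent pairs with common-neighbors score 0: {zero_fraction:.1%}")
```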
- Traffic Accident Prediction using Graph Neural Networks: New Datasets and the TRAVEL Model
Baixiang Huang (National University of Singapore); Bryan Hooi (National University of Singapore)
Abstract: Traffic accident prediction is crucial for reducing and mitigating road traffic accidents. Many existing machine learning approaches predict the number of traffic accidents in each cell of a discretized grid without considering the underlying graph structure of road networks. To allow us to incorporate road network information, graph-based approaches such as Graph Neural Networks (GNNs) are a natural choice. However, applying GNNs to the accident prediction problem is made challenging by a lack of suitable graph-structured traffic accident prediction datasets. To overcome this problem, we first construct one thousand real-world graph-based traffic accident datasets, along with two benchmark tasks (accident occurrence prediction and accident severity prediction). We then comprehensively evaluate eleven state-of-the-art GNN variants using the created datasets. Moreover, we propose a novel Traffic Accident Vulnerability Estimation via Linkage (TRAVEL) model, which is designed to capture angular and directional information from road networks. We demonstrate that the TRAVEL model consistently outperforms the GNN baselines. The datasets and code are available at https://github.com/baixianghuang/travel.
PDF | Code & Datasets
- An Open Challenge for Inductive Link Prediction on Knowledge Graphs
Mikhail Galkin (Mila, McGill University); Max Berrendorf (Ludwig-Maximilians-Universität München); Charles T Hoyt (Harvard Medical School)
Abstract: An emerging trend in representation learning over knowledge graphs (KGs) moves beyond transductive link prediction tasks over a fixed set of known entities in favor of inductive tasks that imply training on one graph and performing inference over a new graph with unseen entities. In inductive setups, node features are often not available and training shallow entity embedding matrices is meaningless as they cannot be used at inference time with unseen entities. Despite the growing interest, there are not enough benchmarks for evaluating inductive representation learning methods. In this work, we introduce ILPC 2022, a novel open challenge on KG inductive link prediction. To this end, we constructed two new datasets based on Wikidata with various sizes of training and inference graphs that are much larger than existing inductive benchmarks. We also provide two strong baselines leveraging recently proposed inductive methods. We hope this challenge helps to streamline community efforts in the inductive graph representation learning area. ILPC 2022 follows best practices on evaluation fairness and reproducibility, and is available at https://github.com/pykeen/ilpc2022.
PDF | Code & Datasets
- What’s Wrong with Deep Learning in Tree Search for Combinatorial Optimization
Maximilian Böther (Hasso Plattner Institute, University of Potsdam); Otto Kißig (Hasso Plattner Institute, University of Potsdam); Martin Taraz (Hasso Plattner Institute); Sarel Cohen (The Academic College of Tel Aviv-Yaffo); Karen Seidel (Department of Mathematics, University of Potsdam); Tobias Friedrich (Hasso Plattner Institute)
Abstract: Combinatorial optimization lies at the core of many real-world problems. Especially since the rise of graph neural networks (GNNs), the deep learning community has been developing solvers that derive solutions to NP-hard problems by learning the problem-specific solution structure. However, reproducing the results of these publications proves to be difficult. We make three contributions. First, we present an open-source benchmark suite for the NP-hard Maximum Independent Set problem, in both its weighted and unweighted variants. The suite offers a unified interface to various state-of-the-art traditional and machine learning-based solvers. Second, using our benchmark suite, we conduct an in-depth analysis of the popular guided tree search algorithm by Li et al. [NeurIPS 2018], testing various configurations on small and large synthetic and real-world graphs. By re-implementing their algorithm with a focus on code quality and extensibility, we show that the graph convolution network used in the tree search does not learn a meaningful representation of the solution structure, and can in fact be replaced by random values. Instead, the tree search relies on algorithmic techniques like graph kernelization to find good solutions. Thus, the results from the original publication are not reproducible. Third, we extend the analysis to compare the tree search implementations to other solvers, showing that the classical algorithmic solvers often are faster, while providing solutions of similar quality. Additionally, we analyze a recent solver based on reinforcement learning and observe that for this solver, the GNN is responsible for the competitive solution quality.
PDF | Code
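As a point of reference for the abstract's claim that classical algorithmic solvers remain competitive, here is a minimal sketch (not the paper's benchmark suite; the graph, sizes, and solvers are illustrative assumptions) comparing a minimum-degree greedy heuristic for Maximum Independent Set against NetworkX's randomized maximal independent set:

```python
import networkx as nx

def greedy_mis(G):
    """Minimum-degree greedy heuristic for the (unweighted) Maximum Independent Set."""
    H = G.copy()
    independent = set()
    while H.number_of_nodes() > 0:
        v = min(H.nodes, key=H.degree)                    # pick a minimum-degree node
        independent.add(v)
        H.remove_nodes_from(list(H.neighbors(v)) + [v])   # delete its closed neighborhood
    return independent

G = nx.erdos_renyi_graph(200, 0.05, seed=0)
mis_greedy = greedy_mis(G)
mis_random = nx.maximal_independent_set(G, seed=0)

# Both sets are independent; the greedy heuristic is typically at least as large.
assert all(not G.has_edge(u, v) for u in mis_greedy for v in mis_greedy if u != v)
print(len(mis_greedy), len(mis_random))
```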
- Benchmarking Large-Scale Graph Training Over Effectiveness And Efficiency
Keyu Duan (Rice University); Zirui Liu (Rice University); Wenqing Zheng (University of Texas at Austin); Peihao Wang (University of Texas at Austin); Kaixiong Zhou (Rice University); Tianlong Chen (University of Texas at Austin); Zhangyang Wang (University of Texas at Austin); Xia Hu (Rice University)
Abstract: Large-scale graph learning is a notoriously challenging problem in the community of network analytics and graph neural networks (GNNs). Because the evolving graph structure (a sparse matrix) must be involved in the training process, vanilla message-passing-based GNNs often fail to scale up, limited by training speed and memory occupation. Up to now, many state-of-the-art scalable GNNs have been proposed. However, we still lack a systematic study and fair benchmark of this reservoir to find the rationale for designing scalable GNNs. To this end, we conduct a meticulous and thorough study on large-scale graph learning from the perspective of effectiveness and efficiency. Firstly, we uniformly formulate the representative methods of large-scale graph training and further establish a fair and consistent benchmark regarding effectiveness for them by unifying the hyperparameter configuration. Secondly, benchmarking over efficiency, we theoretically and empirically evaluate the time and space complexity of representative paradigms for large-scale graph training. To the best of our knowledge, we are the first to provide a comprehensive investigation of the efficiency of scalable GNNs, which is a key factor for the success of large-scale graph learning. Our code is available at https://github.com/VITA-Group/Large_Scale_GCN_Benchmarking.
PDF | Code
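One representative scalable-training paradigm covered by benchmarks like the one above is neighbor sampling, which bounds per-batch computation instead of propagating over the full sparse adjacency. A minimal sketch with PyTorch Geometric follows (illustrative only, not this benchmark's unified interface; the Cora stand-in dataset and hyper-parameters are assumptions):

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GraphSAGE

# A small stand-in dataset; the benchmark targets much larger graphs.
data = Planetoid(root="data/Planetoid", name="Cora")[0]

# Sample a bounded fan-out per message-passing hop, bounding memory per mini-batch.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],        # 2 hops, at most 10 sampled neighbors each
    batch_size=256,
    input_nodes=data.train_mask,
)

model = GraphSAGE(in_channels=data.num_features, hidden_channels=64,
                  num_layers=2, out_channels=7)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for batch in loader:
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    # Only the seed nodes at the front of each mini-batch carry the supervised loss.
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()
```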
- An Explainable AI Library for Benchmarking Graph Explainers
Chirag Agarwal (Adobe); Owen Queen (University of Tennessee, Knoxville); Himabindu Lakkaraju (Harvard University); Marinka Zitnik (Harvard University)
Abstract: With Graph Neural Network (GNN) explainability methods increasingly used to understand GNN predictions in critical real-world applications, it is essential to reliably evaluate the correctness of generated explanations. However, assessing the quality of GNN explanations is challenging as existing evaluation strategies depend on specific datasets with no or unreliable ground-truth explanations and GNN models. Here, we introduce G-XAI Bench, an open-source graph explainability library providing a systematic framework in PyTorch and PyTorch Geometric to compare and evaluate the reliability of GNN explanations. G-XAI Bench provides comprehensive programmatic functionality in the form of data processing functions, GNN model implementations, collections of synthetic and real-world graph datasets, GNN explainers, and performance metrics to benchmark any GNN explainability method. We introduce G-XAI Bench to support the development of novel methods and to build the foundations for understanding which GNN explainers are most suitable for specific applications and why.
PDF
- A Heterogeneous Graph Benchmark for Misinformation on Twitter
Dan S Nielsen (University of Bristol); Ryan McConville (University of Bristol)
Abstract: Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl) to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-scores being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.
PDF | Code & Datasets
- A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs
Charles T Hoyt (Harvard Medical School); Max Berrendorf (Ludwig-Maximilians-Universität München); Mikhail Galkin (Mila, McGill University); Volker Tresp (Siemens AG and Ludwig Maximilian University of Munich); Benjamin Gyori (Harvard Medical School)
Abstract: The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics that address the lack of interpretability of existing metrics and their lack of comparability across datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics through which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models. In this case study, the new metrics reveal that on small graphs, most models' results are not significantly different from random, despite appearing convincing when measured with existing metrics.
PDF | Code
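For readers less familiar with the metrics being revisited here, the standard rank-based quantities (mean rank, mean reciprocal rank, and Hits@k) are simple functions of the rank each true triple receives among its corrupted candidates. A plain NumPy sketch with illustrative ranks (not the paper's proposed adjusted metrics):

```python
import numpy as np

# Rank of the true entity among all candidates, one value per test triple
# (illustrative numbers, not taken from any real benchmark).
ranks = np.array([1, 3, 2, 50, 7, 1, 120, 4])

mean_rank = ranks.mean()                     # grows with the candidate set size
mrr = (1.0 / ranks).mean()                   # mean reciprocal rank, in (0, 1]
hits_at = {k: float((ranks <= k).mean()) for k in (1, 3, 10)}

print(f"MR={mean_rank:.2f}  MRR={mrr:.3f}  Hits@k={hits_at}")
```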
- Robust Synthetic GNN Benchmarks with GraphWorld
John Palowitch (Google); Anton Tsitsulin (Google); Brandon Mayer (Google Research); Bryan Perozzi (Google Research)
Abstract: Despite advances in GNNs, only a small number of datasets are used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners who are likely to have datasets which are very different from academic benchmarks. In this work we introduce GraphWorld, a novel methodology for benchmarking GNN models on an arbitrarily-diverse population of synthetic graphs for any GNN task. We present insights from GraphWorld experiments regarding the performance characteristics of eleven GNN models over millions of benchmark datasets. Using GraphWorld, we are also able to study in detail the relationship between graph properties and task performance metrics, which is nearly impossible with the classic collection of real-world benchmarks.
PDF
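The core idea of sweeping a graph generator's parameters to obtain a population of benchmark graphs can be imitated with a two-block stochastic block model in NetworkX, as in the sketch below (the generator, parameter grid, and recorded statistic are illustrative assumptions, not GraphWorld's actual sampler):

```python
import itertools
import networkx as nx

def sbm_population(sizes=(50, 50), p_in_grid=(0.05, 0.1, 0.2),
                   p_out_grid=(0.01, 0.05), seeds=range(3)):
    """Yield (params, graph) pairs over a grid of stochastic-block-model settings."""
    for p_in, p_out, seed in itertools.product(p_in_grid, p_out_grid, seeds):
        probs = [[p_in, p_out], [p_out, p_in]]
        G = nx.stochastic_block_model(list(sizes), probs, seed=seed)
        yield {"p_in": p_in, "p_out": p_out, "seed": seed}, G

for params, G in sbm_population():
    # A full benchmark would train and score GNNs here; we just record a
    # simple property of each sampled "world".
    print(params, f"density={nx.density(G):.3f}")
```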
- EXPERT: Public Benchmarks for Dynamic Heterogeneous Academic Graphs
Sameera Horawalavithana (Pacific Northwest National Laboratory); Ellyn Ayton (Pacific Northwest National Laboratory); Anastasiya Usenko (Pacific Northwest National Laboratory); Robin Cosbey (Pacific Northwest National Laboratory); Shivam Sharma (Pacific Northwest National Laboratory); Jasmine Eshun (Pacific Northwest National Laboratory); Maria Glenski (Pacific Northwest National Laboratory); Svitlana Volkova (Pacific Northwest National Laboratory)
Abstract: Machine learning models that learn from dynamic graphs face nontrivial challenges in learning and inference as both nodes and edges change over time. The existing large-scale graph benchmark datasets that are widely used by the community primarily focus on homogeneous node and edge attributes and are static. In this work, we present a variety of large-scale, dynamic, heterogeneous academic graphs to test the effectiveness of models developed for multi-step graph forecasting tasks. Our novel datasets cover both context and content information extracted from scientific publications across two communities - Artificial Intelligence (AI) and Nuclear Nonproliferation (NN). In addition, we propose a systematic approach to improve the existing evaluation procedures used in graph forecasting models.
PDF | Code & Datasets
- A Content-First Benchmark for Self-Supervised Graph Representation Learning
Puja Trivedi (University of Michigan); Mark Heimann (Lawrence Livermore National Laboratory); Ekdeep Lubana (University of Michigan); Danai Koutra (University of Michigan); Jayaraman Thiagarajan (Lawrence Livermore National Laboratory)
Abstract: Current advances in unsupervised representation learning (URL) have primarily been driven by novel contrastive learning and reconstruction-based paradigms. Recent work finds the following properties to be critical for visual URL's success: invariance to task-irrelevant attributes, recoverability of labels from augmented samples, and separability of classes in some latent space. However, these properties are hard to measure or sometimes unsupported when using commonly adopted graph augmentations and benchmarks, making it difficult to evaluate the merits of different URL paradigms or augmentation strategies. For example, on several benchmark datasets, we find that popularly used, generic graph augmentations (GGA) do not induce task-relevant invariance. Moreover, GGA's recoverability cannot be directly evaluated as it is unclear how graph semantics, potentially altered by augmentation, are related to the task. Through this work, we introduce a synthetic data generation process that allows us to control the amount of task-irrelevant (style) and task-relevant (content) information in graph datasets. This construction enables us to define oracle augmentations that induce task-relevant invariances and are recoverable by design. The class separability, i.e., hardness of a task, can also be altered by controlling the degree of irrelevant information. Our proposed process allows us to evaluate how varying levels of style affect the performance of graph URL algorithms and augmentation strategies. Overall, this data generation process is valuable to the community for better understanding limitations of proposed graph URL paradigms that are otherwise not apparent through standard benchmark evaluation.
PDF
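To make the style/content split concrete, here is a toy node-feature generator (entirely illustrative, not the paper's generation process) in which labels depend only on a "content" block of features, while a "style" block adds task-irrelevant variation whose scale controls class separability:

```python
import numpy as np

def make_node_features(n_nodes=200, content_dim=4, style_dim=4,
                       style_scale=1.0, seed=0):
    """Toy node features: labels are recoverable from the content block by design."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_nodes)
    # Content: class-dependent means, so the task signal lives here.
    content = rng.normal(loc=labels[:, None] * 2.0, scale=0.5,
                         size=(n_nodes, content_dim))
    # Style: label-independent noise; increasing style_scale injects more
    # task-irrelevant information that augmentations should be invariant to.
    style = rng.normal(scale=style_scale, size=(n_nodes, style_dim))
    return np.hstack([content, style]), labels

X_easy, y = make_node_features(style_scale=0.1)   # classes well separated
X_hard, _ = make_node_features(style_scale=5.0)   # style dominates the features
```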
- KGTuner: Efficient Hyper-parameter Search for Knowledge Graph Learning
Yongqi Zhang (4Paradigm Inc.); Zhanke Zhou (Hong Kong Baptist University); Quanming YAO (Tsinghua University); Yong Li (Tsinghua University)
Abstract: While hyper-parameters (HPs) are important for knowledge graph (KG) learning, existing methods fail to search them efficiently. To solve this problem, we first analyze the properties of different HPs and measure their transfer ability from a small subgraph to the full graph. Based on the analysis, we propose an efficient two-stage search algorithm, KGTuner, which explores HP configurations on a small subgraph in the first stage and transfers the top-performing configurations for fine-tuning on the large full graph in the second stage. Experiments show that our method can consistently find better HPs than the baseline algorithms within the same time budget, achieving a 9.1% average relative improvement for four embedding models on the large-scale KGs in the Open Graph Benchmark. Our code is released at https://github.com/AutoML-Research/KGTuner.
PDF | Code
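The two-stage search described above (cheap exploration on a subgraph, then fine-tuning only the best configurations on the full graph) can be expressed generically as in the sketch below; the search space, evaluation stubs, and top-k cutoff are placeholders, not KGTuner's implementation:

```python
import itertools
import random

search_space = {"lr": [1e-3, 1e-2], "dim": [64, 128, 256], "neg_samples": [16, 64]}
configs = [dict(zip(search_space, values))
           for values in itertools.product(*search_space.values())]

def eval_on_subgraph(cfg):
    # Placeholder: train/evaluate a KG embedding model on a small sampled subgraph.
    return random.random()

def eval_on_full_graph(cfg):
    # Placeholder: expensive fine-tuning and evaluation on the full graph.
    return random.random()

# Stage 1: cheap screening of every configuration on the subgraph.
screened = sorted(configs, key=eval_on_subgraph, reverse=True)

# Stage 2: transfer only the top-performing configurations to the full graph.
finalists = screened[:3]
best = max(finalists, key=eval_on_full_graph)
print("best configuration:", best)
```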
Organizers
- Jiaqi Ma (University of Michigan)
- Jiong Zhu (University of Michigan)
- Anton Tsitsulin (Google Research)
- Marinka Zitnik (Harvard University)
Advisory Board
- Yuxiao Dong (Facebook AI)
- Danai Koutra (University of Michigan)
- Qiaozhu Mei (University of Michigan)
Program Committee
- Aleksandar Bojchevski (CISPA Helmholtz Center for Information Security)
- Alexandru C. Mara (Ghent University)
- Aline Paes (Institute of Computing / Universidade Federal Fluminense)
- Benedek A Rozemberczki (AstraZeneca)
- Christopher Morris (TU Dortmund University)
- Daniel Zügner (Microsoft Research)
- Davide Belli (Qualcomm AI Research)
- Davide Mottin (Aarhus University)
- Derek Lim (MIT)
- Donald Loveland (University of Michigan)
- Jiaxin Ying (University of Michigan)
- Johannes Gasteiger (Technical University of Munich)
- John Palowitch (Google)
- Jun Wang (Ping An Technology (Shenzhen) Co. Ltd.)
- Leonardo F. R. Ribeiro (TU Darmstadt)
- Mark Heimann (Lawrence Livermore National Laboratory)
- Michael T Schaub (RWTH Aachen University)
- Neil Shah (Snap Inc.)
- Oliver Kiss (Central European University)
- Puja Trivedi (University of Michigan)
- Tara L. Safavi (University of Michigan)
- Xiuyu Li (Cornell University)
- Yuan Fang (Singapore Management University)