This repository contains the code for my project at the Machine Learning Research Unit of the TU Wien Informatics faculty. I am supervised by Univ.Ass. Dipl.-Ing. Fabian Jogl. The goal of the project is to implement the graph transformations, find implementations of the corresponding higher-order GNNs, implement an MPNN, and then train and compare the MPNN, the MPNN combined with graph transformation(s), and the higher-order GNN(s).
Context: It is known that message passing graph neural networks (MPNNs) are limited in the kinds of functions they can express. Graph neural networks (GNNs) that can express strictly more functions than MPNNs are known as higher-order GNNs. It has been proven that many higher-order GNNs can be seen as a combination of a weaker MPNN and a graph transformation. However, there is not enough experimental evidence for this claim, which is the gap this project tries to close.
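To make the expressivity limitation concrete, here is a minimal sketch of 1-WL color refinement, the test that upper-bounds what standard MPNNs can distinguish. It is an illustration only and not code from this repository; the helper names `wl_histogram` and `cycle` are made up for this example. A hexagon and two disjoint triangles are both 2-regular graphs on six vertices, so every vertex receives the same color and the two graphs look identical.

```python
from collections import Counter


def wl_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the histogram of final colors.

    adj maps each vertex to the list of its neighbors.
    """
    colors = {v: 0 for v in adj}  # every vertex starts with the same color
    for _ in range(rounds):
        # New color = old color together with the sorted multiset of neighbor colors.
        # Colors are kept as nested tuples so they are comparable across graphs.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def cycle(n, offset=0):
    """Adjacency lists of a cycle on n vertices, labeled offset..offset+n-1."""
    return {
        offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
        for i in range(n)
    }


hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}

# Both graphs get identical color histograms, so 1-WL cannot tell them apart.
print(wl_histogram(hexagon) == wl_histogram(two_triangles))  # True
```

Graphs that differ in their degree sequence, such as a path and a triangle on three vertices, are separated after the first round.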
Most of this work builds upon the following papers:
- Expressivity-Preserving GNN Simulation, NeurIPS, 2023: paper, code
- Expectation-Complete Graph Representations with Homomorphisms, ICML, 2023: paper, code
- Weisfeiler and Leman Return with Graph Transformations, MLG@ECMLPKDD, 2022: paper, code
- Message Passing All The Way Up, GTRL workshop @ ICLR, 2022: paper
- Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, AAAI, 2019, paper
- How Powerful are Graph Neural Networks?, ICLR, 2019, paper
Install dependencies (Python version 3.10):
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 -c pytorch
conda install -c pyg pyg=2.2.0
pip install -r requirements.txt
Training and different experiments are tracked via wandb. If you want to make use of the tracking, you need a wandb account. The first time you train a model, you will be prompted to enter your wandb API key. If you want to disable tracking, you can do so in the config Configs/config.yaml.
To train a GNN $GNN once on a dataset $dataset, run
python Exp/run_model.py --model $GNN --dataset $dataset
For example:
python Exp/run_model.py --model GIN --dataset ZINC
This trains the GNN GIN on the ZINC dataset a single time. The result of the training is shown in the terminal. The hyperparameters of the GNN can be set via command-line parameters. For more details, call python Exp/run_model.py -h.
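If you want to train several model and dataset combinations in a row, a small driver script is one option. The following is a sketch and not part of this repository: the helpers `build_command` and `run_all` are hypothetical, and only the `--model` and `--dataset` flags shown above are used. By default it only prints the commands; pass `dry_run=False` from the repository root to actually run them.

```python
import subprocess
import sys


def build_command(model, dataset):
    """Build the argument list for a single training run."""
    return [sys.executable, "Exp/run_model.py", "--model", model, "--dataset", dataset]


def run_all(models, datasets, dry_run=True):
    """Print (or run) one training command per model/dataset pair."""
    commands = [build_command(m, d) for m in models for d in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(cmd, check=True)
    return commands


run_all(["GIN", "GCN"], ["ZINC"])
```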
The script Exp/run_experiment.py optimizes hyperparameters over a parameter grid and then evaluates the parameters with the best performance on the validation set multiple times. For example:
python Exp/run_experiment.py -grid Configs/Benchmark/GIN_grid.yaml -dataset ogbg-molesol --candidates 20 --repeats 10
This command tries 20 hyperparameter configurations defined in the GIN_grid.yaml config on the ogbg-molesol dataset and evaluates the best parameters 10 times. The results of these experiments are stored in the directory Results/ogbg-molesol_GIN_grid.yaml, and the averages for the best parameters are stored in final.json. If you have a dataset that requires cross-validation (e.g. CSL), then you need to set the number of folds (for example --folds 10).
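The general pattern described above (try a number of candidates from a grid, select by validation score, then re-evaluate the winner several times) can be sketched as follows. This is not the code of Exp/run_experiment.py. The function names, the toy training function, the grid keys `lr` and `hidden_dim`, and the assumption that lower scores are better are all made up for illustration.

```python
import itertools
import random
import statistics


def sample_candidates(grid, n_candidates, seed=0):
    """Enumerate the grid and draw at most n_candidates configurations from it."""
    keys = sorted(grid)
    configs = [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]
    if n_candidates >= len(configs):
        return configs
    return random.Random(seed).sample(configs, n_candidates)


def run_experiment(grid, train_fn, n_candidates, n_repeats):
    """train_fn(config, seed) returns (validation_score, test_score), lower is better."""
    candidates = sample_candidates(grid, n_candidates)
    # Model selection uses only the validation score.
    best = min(candidates, key=lambda cfg: train_fn(cfg, seed=0)[0])
    # The selected configuration is then trained again with different seeds.
    test_scores = [train_fn(best, seed=s)[1] for s in range(n_repeats)]
    return best, statistics.mean(test_scores), statistics.pstdev(test_scores)


def toy_train(config, seed):
    """Stand-in for a real training run: a deterministic score plus seeded noise."""
    base = abs(config["lr"] - 0.001) * 100 + 1.0 / config["hidden_dim"]
    noise = random.Random(seed).uniform(0, 0.01)
    return base + noise, base + noise


grid = {"lr": [0.01, 0.001, 0.0001], "hidden_dim": [32, 64]}
best, mean_score, std_score = run_experiment(grid, toy_train, n_candidates=20, n_repeats=10)
print(best)
```

Selecting on the validation score and reporting the mean over repeated runs keeps the test set out of the hyperparameter search.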
As Exp/run_model.py allows setting model hyperparameters from the command line, we can use WandB sweeps to optimize hyperparameters. Here is a short guide: you need to specify your parameters and the script to run in a config file (see Configs/WandB_grids/example_grid.yaml). The sweep can then be initialized with
wandb sweep Configs/WandB_Grids/example_grid.yaml
This command will print the command needed to join agents to the sweep. You can even join agents on different computers to the same sweep! Sweeps can also be initialized purely from scripts. More details on sweeps can be found in the wandb documentation.
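As a rough sketch of initializing a sweep from a script, the snippet below builds a sweep configuration as a Python dictionary and registers it with `wandb.sweep`. The contents of the dictionary are assumptions for illustration: the metric name `val_loss`, the `lr` parameter, and the project name are placeholders and are not taken from this repository's configs. Only `model` and `dataset` correspond to flags documented above.

```python
def make_sweep_config():
    """Build a wandb sweep configuration for Exp/run_model.py."""
    return {
        "program": "Exp/run_model.py",
        "method": "grid",
        # Hypothetical metric name; use whatever your runs actually log.
        "metric": {"name": "val_loss", "goal": "minimize"},
        "parameters": {
            "model": {"value": "GIN"},
            "dataset": {"value": "ZINC"},
            # Hypothetical hyperparameter; check run_model.py -h for real names.
            "lr": {"values": [0.01, 0.001]},
        },
    }


def create_sweep(project):
    """Register the sweep with wandb and return its id (requires a wandb login)."""
    import wandb  # imported here so the config can be inspected without wandb

    return wandb.sweep(sweep=make_sweep_config(), project=project)


# Example (not executed here because it needs network access and credentials):
# sweep_id = create_sweep("my-project")  # "my-project" is a placeholder name
```

Agents are then joined with the command that wandb prints for the returned sweep id.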
Models: GIN, GCN and MLP. MLP pools all vertex features and then passes the resulting vector through an MLP.
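To illustrate what the MLP baseline does, here is a dependency-free sketch. It is not the repository's model: sum pooling, the two-layer shape, and the weights are assumptions chosen so the arithmetic is easy to follow. The point is that the graph structure is ignored entirely, so the output depends only on the multiset of vertex features.

```python
def sum_pool(vertex_features):
    """Sum the feature vectors of all vertices into one graph-level vector."""
    return [sum(column) for column in zip(*vertex_features)]


def linear(x, weights, bias):
    """Apply one fully connected layer: weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def mlp_baseline(vertex_features, w1, b1, w2, b2):
    """Pool the vertex features, then apply a two-layer MLP (edges are never used)."""
    pooled = sum_pool(vertex_features)
    hidden = relu(linear(pooled, w1, b1))
    return linear(hidden, w2, b2)


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three vertices, two features each
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [0.5]

print(mlp_baseline(features, w1, b1, w2, b2))  # [2.5]
# Reordering the vertices does not change the result.
print(mlp_baseline(features[::-1], w1, b1, w2, b2))  # [2.5]
```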
Datasets:
- ZINC
- CSL: please use cross-validation for this dataset
- OGB datasets: ogbg-molhiv, ogbg-moltox21, ogbg-molesol, ogbg-molbace, ogbg-molclintox, ogbg-molbbbp, ogbg-molsider, ogbg-moltoxcast, ogbg-mollipo
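The cross-validation recommended for CSL can be sketched as a plain k-fold split over graph indices. This is an illustration and not the splitting code used by this repository; the function name `k_fold_indices` is hypothetical, and real pipelines often also stratify by class and hold out a validation part. CSL contains 150 graphs, so 10 folds give 15 test graphs per fold.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(n_samples=150, n_folds=10)
train, test = splits[0]
print(len(train), len(test))  # 135 15
```

Every graph appears in exactly one test fold, so averaging over the folds uses the whole dataset for evaluation.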
The integration tests can be executed with
python -m unittest
@inproceedings{neurips-bodnar2021b,
title={Weisfeiler and Lehman Go Cellular: CW Networks},
author={Bodnar, Cristian and Frasca, Fabrizio and Otter, Nina and Wang, Yu Guang and Li{\`o}, Pietro and Mont{\'u}far, Guido and Bronstein, Michael},
booktitle = {Advances in Neural Information Processing Systems},
year={2021}
}
@inproceedings{
xu2018how,
title={How Powerful are Graph Neural Networks?},
author={Keyulu Xu and Weihua Hu and Jure Leskovec and Stefanie Jegelka},
booktitle={International Conference on Learning Representations},
year={2019}
}
GCN
@inproceedings{GCN,
author = {Thomas N. Kipf and Max Welling},
title = {Semi-Supervised Classification with Graph Convolutional Networks},
year = {2017},
booktitle = {ICLR}
}
GIN
@inproceedings{
xu2018how,
title={How Powerful are Graph Neural Networks?},
author={Keyulu Xu and Weihua Hu and Jure Leskovec and Stefanie Jegelka},
booktitle={ICLR},
year={2019}
}
ZINC
@article{ZINC1,
author = {Gómez-Bombarelli, Rafael and Wei, Jennifer N. and Duvenaud, David and Hernández-Lobato, José Miguel and Sánchez-Lengeling, Benjamín and Sheberla, Dennis and Aguilera-Iparraguirre, Jorge and Hirzel, Timothy D. and Adams, Ryan P. and Aspuru-Guzik, Alán},
title = {Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules},
journal = {ACS Central Science},
year = {2018},
}
@article{ZINC2,
author = {Sterling, Teague and Irwin, John J.},
title = {ZINC 15 – Ligand Discovery for Everyone},
journal = {Journal of Chemical Information and Modeling},
year = {2015},
}
CSL
@inproceedings{relational_pooling,
title = {Relational {Pooling} for {Graph} {Representations}},
author = {Murphy, Ryan L and Srinivasan, Balasubramaniam and Rao, Vinayak and Ribeiro, Bruno},
year = {2019},
booktitle = {ICML}
}
@article{Benchmarking-GNNs,
title={Benchmarking Graph Neural Networks},
author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
journal={arXiv preprint arXiv:2003.00982},
year={2020}
}
OGB
@inproceedings{ogb,
author = {Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},
booktitle = {NeurIPS},
title = {{Open Graph Benchmark}: Datasets for Machine Learning on Graphs},
year = {2020}
}