- We can now pass arguments to `pydgn.training.callback.metric.Metric` objects to create arbitrary behavior
- Added possibility to skip unfinished experiments when calling `pydgn.evaluation.util.retrieve_experiments`. See documentation for more information.
- Replaced `torch.save` with PyDGN's `atomic_save` to handle cases where storing metrics data corrupted the file
- Minor fix in `filter_experiments`
- Experiment retrieval routines were not working as expected when loading the checkpoint of a specific configuration rather than the best one.
- Added function `instantiate_data_provider_from_config` that was somehow missing (see tutorials)
- Dropping tests on macOS 13, as they have been causing too many problems since recent changes.
- You can now store the metrics' trend across epochs using `Plotter`. Just pass the argument `store_on_disk=True` in the configuration file of the experiment (see the example below).
- Updated the TOML project file to comply with the latest releases of macOS and the `coverage` package
- `pydgn-train` and `pydgn-dataset` not being found in version 1.5.4
- Utilities to load model, dataset, data providers, and checkpoints from the experiments folder
- Tutorials in the README and in the documentation on how to use them.
- Training loss and score not showing on Tensorboard
- Implemented a convenient tqdm progress bar in debug mode to track speed of training and evaluation.
- Created a new splitter class, `SameInnerSplitSplitter`, which allows you to average the validation scores of the same model selection configuration over multiple runs without changing the inner data split. It cannot be combined with a double/nested CV approach, for which you should use the base `Splitter` class to generate different inner data splits.
- Trying out a helper mechanism to print to the terminal information about the experiment that broke (if any) when you are not in debug mode.
From now on, the default behavior of the training engine is to display the training loss/scores computed during the epoch. In the past, at the end of each epoch we always re-evaluated the trained model on the entire training set, but this is often not interesting, as early stopping typically acts on the validation set. The old behavior can be enabled again by specifying it in the config file, that is:

```yaml
engine:
  - class_name: pydgn.training.engine.TrainingEngine
    args:
      # whether to re-compute the epoch loss/scores after training or use those
      # obtained while the model is being trained with mini-batches
      eval_training:
        - True  # re-evaluates on the training set after each training epoch; might not change loss/score values much and causes overhead
```

The default value will be `False` from now on to save compute time.
This is the release that adheres to the changes requested by JOSS reviewers.
- The package setup information is now entirely contained in `pyproject.toml`
- PyDGN is now system independent, automatically tested on Windows, Linux and macOS systems using GitHub Actions
- Simplified installation instructions
- Removed as many dependencies as possible
- Introduced a PR template
- Removed loading of optimizer's state when evaluating the best checkpoint or the last epoch of a training run, since we only need to perform inference. This might cause problems in corner-case situations (e.g., at the end of a training where one dynamically changes the architecture at every epoch).
- Updated dependencies in toml file
- Removed as many dependencies as possible
- Refactored the toml file of the library to remove legacy files `setup.py` and `setup.cfg`
- Utility functions in `pydgn.evaluation.util` to retrieve configuration files in a model selection folder and filter them for post-hoc analyses.
- Updated tutorial in documentation accordingly
- Solved some deprecation warnings when using `torch.tensor` to clone an existing tensor in `DataProvider`
- Commented out the single temporal graph learning example to remove the dependency on the torch-geometric package. The code to support temporal learning stays.
- Removed the configuration sample for the temporal setting
- The PyG requirement is now <= 2.3.0, so we can remove all the other dependencies
- Modified the `setup/create_environment.sh` file to use Python 3.9 and PyG 2.3.0
- You can specify weights for the losses in `AdditiveLoss` by passing a dictionary of (loss name, loss weight) entries as an argument. See the documentation of `AdditiveLoss` for more info or the example in `examples/MODEL_CONFIGS/config_SupToyDGN.yml` (a sketch is also shown below).
- Better handling of `len()` in `TUDatasetInterface`
- Fixed a minor bug that triggered an assertion in the training engine when shuffle was set to false
- Package requirements now specify an upper bound on some packages, including Pytorch and PyG to ensure compatibility. Better be safe than sorry :)
[1.3.0] Support for Pytorch 1.13.0, CUDA 11.6, CUDA 11.7, PyG 2.1.0, Ray 2.1.0 + minor fixes
- Updates to tests to make the fake datasets compatible with PyG 2.1.0
- IDLE ray workers not deallocating GPUs
- Now we sort the data list returned by the training engine as if samples were not shuffled, meaning the returned data list is consistent with the original ordering of the dataset.
We tried to provide support for creating an environment with PyG 2.2.0, but importing the library seems to cause a segmentation fault in certain cases. Therefore, we will wait until the issue is fixed and then update the script.
- You can now specify a specific subset of GPUs to use in the configuration file. Just add the optional field `gpus_subset: 1,2,3` if you want to only use GPUs with index 1, 2 and 3 (see the example below).
- Ray 2.0.0 seems to have a problem with killing `IDLE` processes and releasing their resources, i.e., OOM on the GPU. We are reverting to a version that we were using before and that did not have this problem.
- Minor check in splitter
- Minor fix in link prediction splitter, one evaluation link was being left out
- Minor fix in early stopper: the `epoch_results` dict was overwritten after applying early stopping. Does not affect performances, since the field is re-initialized the subsequent epoch by the training engine.
- Removed setting the random seed for map-style datasets. It was not useful (see the Torch docs on reproducibility) and could cause transforms based on random sampling (e.g., negative sampling) to always behave in the same way
- Changed semantics of gradient clipper, as there are not many alternatives out there
At the moment, the entire graph must fit in CPU/GPU memory. `DataProvider` extensions to partition the graph using PyG should not be difficult.
- New splitter, `SingleGraphSplitter`, which randomly splits nodes in a single graph (with optional stratification)
- New provider, `SingleGraphDataProvider`, which adds mask fields to the single DataBatch object (representing the graph)
- Renamed method `get_graph_targets` of `Splitter` to `get_targets`, and modified it to make it more general
- Telegram bot support. Just specify a Telegram configuration in a YAML file and let the framework do the rest! Just remember not to push your Telegram config file to your repo!
- The change introduced in the splitter caused the seed to be reset when splits were loaded during each experiment. This has been fixed by setting the seed only when `split()` is called.
- Minor fix in `num_features` of `OGBGDatasetInterface`
- Simplified metrics to either take the mean score over batches or to compute epoch-wise scores (default behavior). In the former case, the result may be affected by batch size, especially in cases like micro-AP and similar scores. Use it only in case it is too expensive (RAM/GPU memory) to compute the scores in a single shot.
- Bug in splitter, the seed was not set properly and different executions led to different results. This is not a problem whenever the splits are publicly released after the experiments (which is always the case).
- Minor fix in data loader workers for iterable datasets
- Temporal learning routines (with documentation); works with single graph sequences
- Template to show how we can use PyDGN on a cluster (see `cluster_slurm_example.sh`); launch using `sbatch cluster_slurm_example.sh`. Disclaimer: you must have experience with Slurm; the script does not work out of the box, and settings must be adjusted to your system.
- Removed a method from `OGBGDatasetInterface` that broke the data split generation phase.
- Added `**kwargs` to all datasets
- Extended the behavior of `TrainingEngine` to allow for null target values and some temporal bookkeeping (allows a lot of code reuse).
- Now `batch_loss` and `batch_score` in the `State` object are initialized to `None` before training/evaluation of a new batch starts. This could have been a problem in the temporal setting, where we want to accumulate results for different snapshots.
We provide an implementation of iterable-style datasets, for cases where the dataset usually doesn't fit into main memory and is stored in different files on disk. If you don't override the `__iter__` function, we assume data splitting is performed at file level rather than at sample level. Each file can in fact contain a list of `Data` objects, which will be streamed sequentially. Variations are possible, depending on your application, but you can use this new dataset class as a good starting point. If you do, be careful to test it together with the iterable versions of the data provider, engine, and engine callback.
- Implemented an Iterable Dataset inspired by the WebDataset interface
- Similarly, added `DataProvider`, `Engine` and `EngineCallback` classes for the iterable-style datasets.
- Now we can pass additional arguments at runtime to the dataset
- (Needs simple test) Setting the `CUDA_VISIBLE_DEVICES` variable before CUDA is initialized, so that in `--debug` mode we can use the GPU with the least amount of used memory.
- Commented out a couple of lines which forced OMP_NUM_THREADS to 1 and Pytorch threads to 1 as well. It seems we don't need them anymore.
- To comply with `TUDataset`, we do not override the method `__len__` anymore
- `load_dataset` no longer assumes that a `processed` data folder exists, but it is backward-compatible with previous versions.
- Fixed an indexing bug on target data for node classification experiments (it caused the program to crash)
- Metric: renamed `_handle_reduction` to `_expand_reduction` and created a new helper routine `_update_num_samples` to allow a user to decide how to compute and average scores.
- use of fractions of GPUs for a single task
- changed signature in forward to allow a dictionary (for MultiScore) or a value (for basic metrics)
- added squeeze in MulticlassAccuracy when target tensor has shape (?, 1)
- Same bug as before but for `pydgn-dataset` =).
- Bug that prevented locating classes via dotted paths in external projects.
- A documentation (it was about time!!!)
- Possibility of specifying inner and outer validation ratio
- We can now use a specific data loader and specify its arguments in the configuration file
- We can now force a metric to compute node-based or graph-based metrics, rather than looking at the ground truth's shape.
- Possibility of evaluating on validation (and test) every `n` epochs (see the example below)
- Use entrypoints to simplify usage of the library
- All arguments must be now specified in the config file. There is a template one can use in the doc.
- Removed any backward compatibility with very old versions (<=0.6.2)
- Substituted Loss and Score classes with a single Metric class, to avoid code redundancy
- Pre-computed random outer validation splits (extracted from 10% of outer training set) for data splits from "A Fair Comparison on Graph Neural Networks for Graph Classification". Note that this does not impact the test splits.
- The setup installation files now work with Pytorch 1.10 and Pytorch Geometric 2.0.3, and the library assumes Python >= 3.8
- Fixed minor bug in experiment. The function create_unsupervised_model looked for supervised_config, rather than unsupervised_config, when looking for the readout
- Feature request: loss, score, and `AdditiveLoss` now take a parameter `use_nodes_batch_size` to force computation w.r.t. input nodes rather than the target dimension (the default)
- Minor refactoring of the engines to avoid redundant flow of information
- Fixed a bug in EventHandler. If one extends EventHandler with new events, which are triggered by a training engine, make sure that callbacks that implement the EventHandler interface do not break when the new events are triggered.
- Refactored Profiler to abstract from the EventHandler. This created problems when a callback implemented an interface that extends EventHandler. If the callback does not implement a particular method, nothing happens and the dispatcher moves on.
[0.7.0] - PyDGN temporal (with Alessio Gravina based on Pytorch Geometric Temporal) + minor fixes
- PyDGN Temporal: Support for `single graph sequence` tasks, the most common use case at the moment (tested for supervised experiments only)
- Minor fix in the `cgmm_incremental` experiment
- Loss/score now considers the case of reduction=mean (default) and sum when computing the epoch's loss/score
- When using checkpoints, we can now switch devices without getting a deserialization error
- Evaluator now stores and displays values for the loss used. Also, validation of final runs is kept.
- Heavy refactoring of the evaluator
- Removed old code in `TUDatasetInterface`
- Epochs' log starts from 1 :D
- Code for E-CGMM ("Modeling Edge Features with Deep Bayesian Graph Networks", IJCNN 2021)
- ConstantEdgeIfEmpty transform (used by e.g., E-CGMM)
- Changed name from `transforms` to `transform` for data preprocessing config files (backward compatible)
- Minor fix when handling edge data with incremental models like E-CGMM
- Fix in graph readout: forgot to pass arguments to super when inheriting from `GraphReadout`
- Data splits from "A fair comparison of graph neural networks for graph classification", ICLR 2020.
- Replaced strings with macros, to improve maintainability
- Support for replicability with seeds. Debug CPU mode and parallel CPU mode can reproduce the same results. With CUDA, things change due to the DataLoader implementation. Even by running the whole experiment again, some runs on GPU differ slightly.
- Added current_set to Score and MultiScore, to keep track of whether the set under consideration is TRAINING, VALIDATION or TEST
- Random Search support: specify a `num_samples` in the config file with the number of random trials, replace `grid` with `random`, and specify a sampling method for each hyper-parameter (see the sketch after this list). We provide different sampling methods:
  - choice --> pick at random from a list of arguments
  - uniform --> pick uniformly from min and max arguments
  - normal --> sample from a normal distribution with mean and std
  - randint --> pick at random from min and max
  - loguniform --> pick following the reciprocal distribution from log_min, log_max, with a specified base
- Implemented a 2-way training scheme for CGMM and variants, which allows one to first compute and store the graph embeddings, and then load them from disk to solve classification tasks. Very fast and lightweight, making it easy to try 1K configurations for each outer fold.
- Early stopping can now work with loss functions rather than scores (but a score must be provided nonetheless)
- Minor improvements in result files
- Debug mode now prints output to the console
- Refactored engine for Link Prediction, by subclassing the TrainingEngine class
- Added chance to mini-batch edges (but not nodes) in single graph link prediction, to reduce the computational burden
- Compute statistics in iocgmm.py: removed "1 -" from bottom computation, because it assigned 1 to nodes with degree 0
- ProgressManager can load the elapsed time of finished experiments, in case you stop and resume the entire process
- Fix for semi-supervised graph regression in `training/util.py` (added an `.unsqueeze(0)` on the `y` variable for graph prediction tasks)
- Last config of model selection was not checked in debug mode
- Moving the model to the device inside experiments, before the model is passed to the optimizer. This solves optimizer initialization problems, e.g., with Adagrad
- Minor fix in AdditiveLoss
- Minor fix in Progress Manager due to Ray upgrade in PyDGN 0.4.0
- Refactored both CGMM and its incremental training strategy
- Improved evaluator code to define remote ray functions just once
- Jupyter notebook for backward compatibility with our ICLR 2020 data splits
- Backward compatibility in Splitter that handles missing validation fields in old splits.
- Using the data root provided through the CLI rather than the value stored in dataset_kwargs.pt by data preprocessing operations. This is because the data location may have changed.
- Removed load_splitter from utils, which assumed a certain shape of the splits filename. Now we pass the filepath to the data provider.
- Minor fix in AdditiveLoss and MultiScore
- MultiScore now does not make any assumption on the underlying scores. Plus, it is easier to maintain.
- CGMM (and variants) mini-batch computation (with full-batch training) now produces the same loss/scores as full-batch computation
- Minor fix in evaluator.py
- Memory leak when not releasing output embeddings from the GPU in `engine.py`
- Releasing score output from the GPU in `score.py`
- Ray transparently replaces multiprocessing. It will help with future extensions to multi-GPU computing
- Support for parallel executions on potentially different GPUs (with Ray we can now allocate a predefined portion of a GPU for a task)
- Dataset and Splitter for Open Graph Benchmark graph classification datasets
- Modified dataset/utils.py, removing the need for Path objects
- Refactored evaluation: risk assessment and model selection logic is now greatly simplified. Code is more robust and maintainable.
- Print indented config
- Moved s2c to evaluation/utils.py
- Improved LaunchExperiment
- Config files should now specify the complete path to a specific experiment class
- Renamed files and folders to follow Python conventions
- Bug fix when extending the list of final run jobs: we need to add only the most recently scheduled jobs to the waiting variable.
- Bug fix in evaluator when using Ray
- Engine now saves (EngineCallback) and restores `stop_training` in checkpoint
- Implemented a Multi Loss to combine different loss functions while plotting the individual components as well
- Added standard deviation when multiple final runs are used
- Removed redundant files and refactored a bit to simplify folder structures
- Removed utils folder, moved the few methods to the other folders where they were needed
- Skipping those final runs that are already completed!
- For each config in k-fold model selection, skipping the experiment of a specific configuration that has produced a result on a specific fold
- K-Fold Assessor and K-Fold Selector can handle hold-out strategies as well. This simplifies maintenance
- The wrapper can now be customized by passing a specific engine_callback EventHandler class in the config file
- No need to specify `--debug` when using a GPU. Experiments that need to run on a GPU will automatically trigger sequential execution
- Model selection is now skipped when a single configuration is given
- Added the type of experiment to the experiments folder name
- `OMP_NUM_THREADS=1` is now dynamically set inside `Launch_Experiment.py` rather than by modifying the `.bashrc` file
- Simplified installation. README updated
- Final runs' outputs are now stored in different folders
- Improved splitter to allow for a simple train/val/test split with no shuffling. Useful to reuse holdout splits that have already been provided.
- Improved provider to use outer validation splits in run_test, if provided by the splitter (backward compatibility)
- Made Scheduler an abstract class. EpochScheduler now uses the epoch to call the step() method
- Removed ProgressManager refresh timer that caused non-termination of the program
- Continual experiments: `intermediate_results.csv` and `training_results.csv` are now deleted when restarting/resuming an experiment.
- Minor fix in engine about the `batch_input` field of a `State` variable
- Error when dumping a configuration file that contains a scheduler
- Error when validation was provided but early stopper was not. Removed the if that prevented validation and test scores from being computed
A series of improvements and bug fixes. We can now run link prediction experiments on a single graph. Data splits generated are incompatible with those of version 0.1.0.
- Link prediction data splitter (for single graph)
- Link prediction data provider (for single graph)
- Improvements on progress bar manager
- Minor improvements on plotter
- Nothing relevant
We use PyDGN on a daily basis for our internal projects. In this first release there are some major additions to previous and unreleased version that greatly improve the user experience.
- Progress bars show the status of the experiments (with average completion time) for outer and inner cross validation (CV).
- Tensorboard visualization is activated by default when using a Plotter.
- A new profiler keeps track of the time spent on each event (see Event engine).
- All experiments can be interrupted at any time and resumed gracefully (the engine looks for the last checkpoint).
- Removed all models that are not necessary to try and test the library.
- Various multiprocessing issues caused by using the `fork` method with Pytorch.
The library now creates new processes using the `spawn` method. Spawning rather than forking prevents Pytorch from complaining (see https://github.com/pytorch/pytorch/wiki/Autograd-and-Fork and https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). Also, there is a warning on a leaked semaphore which we will ignore for now. Finally, `spawn` will be useful to implement CUDA multiprocessing (https://pytorch.org/docs/stable/notes/multiprocessing.html#multiprocessing-cuda-note). However, the Pytorch DataLoader in a child process breaks if `num_workers > 0`. Waiting for Pytorch to address this issue.