Configuration Core Concepts

giovanni-stilo edited this page Jan 31, 2024 · 32 revisions

Classes Involved in Dataset generation

Figure 1: A high-level overview of a configuration.

A Configuration is a JSON/JSONC file that defines the main components GRETEL must use, as depicted in the figure above. These components are declared through the following mandatory sections: experiment, do-pairs, explainers, evaluation_metrics, and store_paths.

Note that the section names above are reserved keywords at the root level of the configuration; the sections themselves can appear in any order.

The sketch below gives a broad picture of a configuration. Note that objects (nested JSON nodes) are abbreviated as { ... } for readability.

{  
    "experiment" : {
        "scope": "examples_configs",
        "parameters" : {}
    },
    "do-pairs": [
        {"dataset" : { ... }, "oracle": { ... }},
        .
        .
        {"dataset" : { ... }, "oracle": { ... }},
    ],
    "explainers": [
        { ... },
        .
        .
        { ... } 
    ],
    "evaluation_metrics": [ 
        { ... },
        .
        .
        { ... }  
    ],
    "store_paths": [ 
        { ... },
        .
        .
        { ... } 
    ]
}

Note also that the logic of the evaluation_metrics and store_paths sections is unchanged from the previous version of GRETEL (v1); for this reason, these sections are slightly inconsistent, in naming and initialization mechanism, with the current logic of GRETEL v2.

Configuration of an object

The GRETEL v2 configuration mechanism allows Python objects to be configured and instantiated directly from a JSON snippet. Each object must have at least a class value and its named parameters, which can in turn contain other objects, as in the following case:

{
    "class": "src.dataset.dataset_base.Dataset",
    "parameters": {
        "generator": {
            "class": "src.dataset.generators.treecycles_rand.TreeCyclesRand",
            "parameters": { "num_instances": 128, "num_nodes_per_instance": 32, "ratio_nodes_in_cycles": 0.2 }
        }
    }
}

Note that the class value is the fully qualified class name, including all its packages.
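To make the class/parameters convention concrete, here is a minimal Python sketch of how such a node could be turned into an object. The helper name build is hypothetical, and passing the parameters as keyword arguments is an assumption made for illustration; GRETEL's own factory may work differently. Standard-library classes stand in for GRETEL classes so that the sketch runs anywhere.

```python
import importlib


def build(node):
    """Recursively instantiate a {"class": ..., "parameters": ...} node.

    Hypothetical helper for illustration only, not GRETEL's factory.
    """
    if isinstance(node, list):
        return [build(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (numbers, strings, ...) pass through unchanged
    if "class" not in node:
        return {key: build(value) for key, value in node.items()}
    # Split "package.module.ClassName" into the module path and the class name.
    module_name, _, class_name = node["class"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    # Nested objects inside "parameters" are built first,
    # then passed to the constructor as keyword arguments (an assumption).
    kwargs = {key: build(value) for key, value in node.get("parameters", {}).items()}
    return cls(**kwargs)


# Same shape as the dataset/generator snippet, with stand-in classes.
snippet = {
    "class": "types.SimpleNamespace",
    "parameters": {
        "generator": {
            "class": "types.SimpleNamespace",
            "parameters": {"num_instances": 128, "num_nodes_per_instance": 32},
        }
    },
}
dataset = build(snippet)
```

The fully qualified name is what makes the lookup possible: everything before the last dot is imported as a module, and the last component is resolved as an attribute of that module.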

experiment section

The experiment section declares global settings, such as the scope, and special mechanisms, such as propagate, which is discussed later. The scope is used to group the results of different runs and configurations: results that share the same scope are stored in the same directory under the declared output_store_path.

"experiment": {
        "scope": "examples_configs",
        "parameters": {
            "lock_release_tout":120,
            "propagate":[
                {"in_sections" : ["explainers"],"params": {"fold_id": 0}},
                {"in_sections" : ["do-pairs/oracle"],"params": {"fold_id": -1}},
                {"in_sections": ["do-pairs/dataset"],"params": { "compose_man": "config/snippets/datasets/centr_and_weights.json" }}
            ]
        }
    },

The propagate mechanism

TODO

do-pairs section

This section contains the list of dataset-oracle pairs (do-pairs for short). Each do-pair must contain two objects: a dataset and an oracle. Both must follow the normal object configuration structure (see Configuration of an object); as explained before, each object can be deeply configured locally, including nested objects that are not necessarily implemented in GRETEL, such as the Torch optimizer. Pairing the dataset with the oracle is a natural choice: even if two oracles share a similar architecture, their weights, and potentially their hyperparameters, differ from one dataset to another.

"do-pairs": [{
        "dataset": {
            "class": "src.dataset.dataset_base.Dataset",
            "parameters": {
                "generator": {
                    "class": "src.dataset.generators.treecycles_rand.TreeCyclesRand", 
                    "parameters": { "num_instances": 128, "num_nodes_per_instance": 32, "ratio_nodes_in_cycles": 0.2}
                }
            } 
        },
        "oracle": {
            "class": "src.oracle.nn.torch.OracleTorch",
            "parameters": {
                "epochs": 200,
                "batch_size": 32,
                "optimizer": {
                    "class": "torch.optim.RMSprop",
                    "parameters": {
                        "lr":0.01                
                    }
                  },
                "loss_fn": {
                    "class": "torch.nn.CrossEntropyLoss",
                    "parameters": {     
                      "reduction": "mean"
                    }
                  },
                "model": { 
                  "class": "src.oracle.nn.gcn.DownstreamGCN",
                  "parameters": {"num_conv_layers":3,"num_dense_layers":1,"conv_booster":2,"linear_decay":1.8}
              } 
            }   
        }
    }],
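As a rough illustration of the structure required above, the following Python sketch checks that every do-pair carries a dataset and an oracle object, each with a class value. The function check_do_pairs is hypothetical and not part of GRETEL; it also assumes that any compose_ objects have already been expanded.

```python
def check_do_pairs(config):
    """Return a list of human-readable problems found in the do-pairs section."""
    problems = []
    for index, pair in enumerate(config.get("do-pairs", [])):
        for role in ("dataset", "oracle"):
            node = pair.get(role)
            if not isinstance(node, dict):
                problems.append(f"do-pairs[{index}]: missing '{role}' object")
            elif "class" not in node:
                problems.append(f"do-pairs[{index}].{role}: missing 'class'")
    return problems


good = {"do-pairs": [{
    "dataset": {"class": "src.dataset.dataset_base.Dataset", "parameters": {}},
    "oracle": {"class": "src.oracle.nn.torch.OracleTorch", "parameters": {}},
}]}
# The oracle is missing from this pair.
bad = {"do-pairs": [{
    "dataset": {"class": "src.dataset.dataset_base.Dataset", "parameters": {}},
}]}

good_problems = check_do_pairs(good)
bad_problems = check_do_pairs(bad)
```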

explainers section

This section contains the list of explainers to be evaluated. Each explainer object is declared following the normal object configuration structure (see Configuration of an object). As previously stated, each object can be deeply configured locally, including nested objects that are not necessarily implemented in GRETEL, such as the Torch optimizer of a trainable explainer. For completeness, the corresponding snippet follows:

    "explainers": [
        {"class": "src.explainer.heuristic.obs.ObliviousBidirectionalSearchExplainer","parameters":{}},
        {"class": "src.explainer.search.i_rand.IRandExplainer", "parameters": {"p": 0.01, "t": 3}},
        {"class": "src.explainer.generative.cf2.CF2Explainer","parameters":{"epochs": 50, "batch_size_ratio": 0.2, "lr" : 0.02, "alpha" : 0.7, "lam" : 20, "gamma" : 0.9}},       
        {"class": "src.explainer.generative.clear.CLEARExplainer","parameters":{ "epochs": 100, "lr": 0.01, "lambda_cfe": 0.1, "alpha": 0.4, "batch_size_ratio": 0.15 }},
        {"class": "src.explainer.generative.rsgg.RSGG",
        "parameters": {"epochs": 150,
            "models": [{"class": "src.explainer.generative.gans.graph.model.GAN","parameters": {"discriminator": {"class": "src.explainer.generative.gans.graph.discriminators.TopKPoolingDiscriminator","parameters": {}}}}]
            }
        }        
    ]

evaluation_metrics section

This section contains the list of metrics used at evaluation time. As previously stated, this part of GRETEL is inherited from v1 and has not been updated yet. For this reason, it is not aligned with the current configuration philosophy, and its keywords, such as "runtime", are fixed.

Here is the full list of the currently implemented metrics as reported in the snippet available at config/snippets/default_metrics.json:

 {"evaluation_metrics": [ 
    {"name": "runtime", "parameters": {}},
    {"name": "graph_edit_distance", "parameters": {}},
    {"name": "oracle_calls", "parameters": {}},
    {"name": "correctness", "parameters": {}},
    {"name": "sparsity", "parameters": {}},
    {"name": "fidelity", "parameters": {}},
    {"name": "oracle_accuracy", "parameters": {}}
 ]}

store_paths section

This section declares the paths used by the framework. The declared names, such as dataset_store_path, are fixed, but their values can be customized. These are mainly the paths used by the cache mechanism, since each imported dataset can declare its own source folder. Note that log_store_path contains the logs produced at runtime (stdout), while output_store_path stores the results of the evaluation (take note of this folder, since it is needed to analyze the results). The default snippet, also available at config/snippets/default_store_paths.json, follows:

 "store_paths": [
        {"name": "dataset_store_path", "address": "./data/cache/datasets/"},        
        {"name": "oracle_store_path", "address": "./data/cache/oracles/"},
        {"name": "embedder_store_path", "address": "./data/cache/oracles/"},
        {"name": "explainer_store_path", "address": "./data/cache/explainers/"},        
        {"name": "log_store_path", "address": "./output/logs/"},
        {"name": "output_store_path", "address": "./output/results/"}
    ]
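Because the names are fixed and only the addresses vary, the list above behaves like a lookup table. The following Python sketch shows one way to read it as a dictionary; the helper store_paths_to_dict is hypothetical and only illustrates the name/address structure, not GRETEL's internal handling.

```python
def store_paths_to_dict(store_paths):
    """Map each fixed path name to its configured address."""
    return {entry["name"]: entry["address"] for entry in store_paths}


store_paths = [
    {"name": "dataset_store_path", "address": "./data/cache/datasets/"},
    {"name": "log_store_path", "address": "./output/logs/"},
    {"name": "output_store_path", "address": "./output/results/"},
]

paths = store_paths_to_dict(store_paths)
results_folder = paths["output_store_path"]
```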

Relationships among Sections

As you may have noticed, almost all the sections define a list of objects. Here we discuss the fundamental relationships among them. The two main ones are the relationship between the do-pairs and the explainers, and the relationship between the resulting (dataset, oracle, explainer) triplets and the metrics. We start by formally defining the objects involved:

Let $DO$ be the set of do-pairs, $E$ the set of explainers, and $M$ the set of evaluation metrics.

Then, the set of triplets $T = DO \times E$, the Cartesian product of the do-pairs and the explainers, is created in memory.

Finally, the evaluation is performed according to the following schema: $\forall t \in T,\ \forall m \in M:\ \mathrm{evaluate}(t, m)$.

Note that this built-in feature lets you run all the needed experiments in a compact way without giving up detailed control. If some do-pairs are incompatible with certain explainers (e.g., specialized datasets, domains, or explainers), we suggest splitting the configuration into two separate ones. Keep in mind that if the scope is the same, all the results will be available in the same folder. See also the compose mechanism for a more robust and compact organization of configurations.
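The evaluation schema can be sketched in a few lines of Python. The names below are placeholders: the strings stand in for instantiated datasets, oracles, and explainers, and evaluate is a hypothetical stub, not GRETEL's evaluator.

```python
from itertools import product

# Placeholder values standing in for instantiated objects.
do_pairs = [("dataset_A", "oracle_A"), ("dataset_B", "oracle_B")]
explainers = ["explainer_1", "explainer_2", "explainer_3"]
metrics = ["runtime", "correctness"]


def evaluate(triplet, metric):
    """Hypothetical stub: a real evaluator would run the explainer and score it."""
    return (triplet, metric)


# T = DO x E: every do-pair is combined with every explainer.
triplets = [
    (dataset, oracle, explainer)
    for (dataset, oracle), explainer in product(do_pairs, explainers)
]

# For all t in T and all m in M: evaluate(t, m).
results = [evaluate(triplet, metric) for triplet in triplets for metric in metrics]
```

With 2 do-pairs, 3 explainers, and 2 metrics, this produces 6 triplets and 12 evaluations, which shows how quickly the number of runs grows as items are added to each list.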

Compose mechanism

This mechanism includes a configuration snippet in place of the compose object. The key of any compose object must start with the prefix compose_, and its value must be the path where the configuration snippet can be found, as in the following example:

"compose_man": "config/snippets/datasets/centr_and_weights.json"

The included snippet must be a valid JSON object, such as the following:

{
    "manipulators": [
    { "class": "src.dataset.manipulators.centralities.NodeCentrality", "parameters": {} },
    { "class": "src.dataset.manipulators.weights.EdgeWeights", "parameters": {} }       
    ]    
}

Note that it is the user's responsibility not to declare two compose_ objects with the same identifier at the same level of the JSON.

Compose and propagate are the two most powerful helper mechanisms for reusing and configuring your experiments. Each can be used inside the other: a compose snippet can include a propagate object, and a propagate can use one or more compose objects in its properties. Lastly, the compose mechanism can also be nested: one snippet file can compose other snippets, and so on.
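The replacement described in this section can be sketched as a small recursive function. This is an illustration under stated assumptions rather than GRETEL's implementation: the name resolve_compose and the base_dir parameter are hypothetical, snippets are assumed to be plain JSON (no JSONC comments), and the keys of the snippet are assumed to be merged into the object that held the compose_ key.

```python
import json
import os
import tempfile


def resolve_compose(node, base_dir="."):
    """Replace every compose_* entry with the content of the snippet it points to."""
    if isinstance(node, list):
        return [resolve_compose(item, base_dir) for item in node]
    if not isinstance(node, dict):
        return node
    resolved = {}
    for key, value in node.items():
        if key.startswith("compose_") and isinstance(value, str):
            with open(os.path.join(base_dir, value), encoding="utf-8") as handle:
                snippet = json.load(handle)
            # A snippet may compose other snippets, so resolve it recursively
            # before merging its keys into the enclosing object.
            resolved.update(resolve_compose(snippet, base_dir))
        else:
            resolved[key] = resolve_compose(value, base_dir)
    return resolved


# Demonstration with a temporary snippet file (file name chosen for the example).
with tempfile.TemporaryDirectory() as tmp:
    manipulators = {"manipulators": [
        {"class": "src.dataset.manipulators.weights.EdgeWeights", "parameters": {}}
    ]}
    with open(os.path.join(tmp, "manipulators.json"), "w", encoding="utf-8") as handle:
        json.dump(manipulators, handle)

    dataset_parameters = {"compose_man": "manipulators.json"}
    composed = resolve_compose(dataset_parameters, base_dir=tmp)
```

Merging with update also shows why two compose_ objects with the same identifier at the same level are a problem: a JSON object cannot reliably hold the same key twice, so one of the two entries would be lost.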

All in One Example

Below we provide a full configuration together with its counterpart obtained by using both the propagate and compose mechanisms. We leave it to the reader to look at the included files (which can be reused in other configurations):

Full version

{  
    "experiment": {
        "scope": "examples_configs",
        "parameters": {"lock_release_tout":120}
    },
    "do-pairs":[ {
        "dataset": {
            "class": "src.dataset.dataset_base.Dataset",
            "parameters": {
                "generator": {
                    "class": "src.dataset.generators.bbbp.BBBP",
                    "parameters": {
                        "data_dir": "data/datasets/bbbp/",
                        "data_file_name": "BBBP.csv",
                        "data_label_name": "p_np"
                    }
                },
                "manipulators": [
                    { "class": "src.dataset.manipulators.centralities.NodeCentrality", "parameters": {} },
                    { "class": "src.dataset.manipulators.weights.EdgeWeights", "parameters": {} }       
                ] 
            }
        },
        "oracle": {
            "class": "src.oracle.tabulars.svm.SVMOracle",           
            "parameters": {
                "fold_id": -1,
                "embedder": {
                    "class": "src.embedder.molecule.model.RDKFingerprintEmbedder", 
                    "parameters": {}
                },
                "model": {  "parameters": {} }            
            } 
        }   
    } ],
    "explainers": [{"class": "src.explainer.search.dces.DCESExplainer", "parameters": {"fold_id": 0}}],
    "evaluation_metrics": [ 
        {"name": "graph_edit_distance", "parameters": {}},
        {"name": "oracle_calls", "parameters": {}},
        {"name": "correctness", "parameters": {}},
        {"name": "sparsity", "parameters": {}},
        {"name": "fidelity", "parameters": {}},
        {"name": "oracle_accuracy", "parameters": {}}
    ],
    "store_paths": [
        {"name": "dataset_store_path", "address": "./data/cache/datasets/"},        
        {"name": "oracle_store_path", "address": "./data/cache/oracles/"},
        {"name": "embedder_store_path", "address": "./data/cache/oracles/"},
        {"name": "explainer_store_path", "address": "./data/cache/explainers/"},        
        {"name": "log_store_path", "address": "./output/logs/"},
        {"name": "output_store_path", "address": "./output/"}
    ]
}

The version using compose/propagate

{  
    "experiment": {
        "scope": "examples_configs",
        "parameters": {
            "lock_release_tout":120,
            "propagate":[
                {"in_sections" : ["explainers"], "params": {"fold_id": 0}},
                {"in_sections" : ["do-pairs/oracle"], "params": {"fold_id": -1}},
                {"in_sections": ["do-pairs/dataset"], "params": { "compose_man": "config/snippets/datasets/centr_and_weights.json" }}
            ]
        }
    },

    "do-pairs":[ {"compose_bbbp_svm" : "config/snippets/do-pairs/BBBP_SVM-MOL.json"} ],
    "explainers": [{"class": "src.explainer.search.dces.DCESExplainer"}],
    "compose_mes" : "config/snippets/default_metrics.json",
    "compose_strs" : "config/snippets/default_store_paths.json"
}
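Comparing the two versions suggests how propagate behaves: each rule appears to copy its params into the parameters of every object found in the listed sections (for example, fold_id: 0 ends up in each explainer). The Python sketch below reproduces that reading. It is an inference from the example, not GRETEL's implementation: the function apply_propagate is hypothetical, it assumes the do-pairs have already been expanded, and keeping locally declared values over propagated ones is an assumption.

```python
import copy


def apply_propagate(config):
    """Return a copy of config with the propagate rules pushed into the sections."""
    config = copy.deepcopy(config)
    rules = config["experiment"]["parameters"].pop("propagate", [])
    for rule in rules:
        for section in rule["in_sections"]:
            # "do-pairs/oracle" -> root section "do-pairs", child object "oracle".
            root, _, child = section.partition("/")
            for item in config.get(root, []):
                target = item[child] if child else item
                parameters = target.setdefault("parameters", {})
                for name, value in rule["params"].items():
                    # Assumption: a locally declared value wins over a propagated one.
                    parameters.setdefault(name, value)
    return config


config = {
    "experiment": {
        "scope": "examples_configs",
        "parameters": {
            "lock_release_tout": 120,
            "propagate": [
                {"in_sections": ["explainers"], "params": {"fold_id": 0}},
                {"in_sections": ["do-pairs/oracle"], "params": {"fold_id": -1}},
            ],
        },
    },
    "do-pairs": [{
        "dataset": {"class": "src.dataset.dataset_base.Dataset", "parameters": {}},
        "oracle": {"class": "src.oracle.tabulars.svm.SVMOracle", "parameters": {}},
    }],
    "explainers": [{"class": "src.explainer.search.dces.DCESExplainer"}],
}

expanded = apply_propagate(config)
```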