ROC calculation in figure 4 of paper. #17

fkgruber · 2023-01-05T18:55:48Z

Hi
I would like to understand the ROC calculation of figure 4. Is there code somewhere showing the generation of those plots?

From the paper it appears that the ROC is calculated by determining the number of master-regulator-to-gene edges that were correctly learned. Does this mean that other edges (for example, between other genes or between genes and master regulators) are just ignored?

Does GENIE3 assume that master regulators are source nodes, or does it learn this as well? What other parameters were set in the GENIE3 run?

thanks
FKG

PayamDiba · 2023-01-05T22:08:10Z

Master regulators (MRs) are the regulators that have no regulators of their own (i.e., no incoming edges in the GRN). Therefore, a regulator in the GRN is either an MR (if it is not regulated by any other gene) or a non-MR (if it is regulated by other genes). MR profiles (i.e., their production rates) are used in SERGIO to define cell types. For example, one can define 10 different cell types by providing 10 different MR profiles as input.
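As a rough illustration of the MR / non-MR distinction above, here is a minimal sketch that splits regulators by whether they have incoming edges. The function name `split_regulators` and the toy edge list are hypothetical and are not taken from SERGIO.

```python
def split_regulators(edges):
    """Split regulators into master and non-master sets.

    edges: iterable of (regulator, target) pairs of a directed GRN.
    A master regulator has outgoing edges but no incoming ones.
    """
    regulators = {reg for reg, _ in edges}
    targets = {tgt for _, tgt in edges}
    master = regulators - targets      # never appear as a target
    non_master = regulators & targets  # regulate others and are regulated
    return master, non_master


# Hypothetical toy GRN: A and B regulate C, and C regulates D and E.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]
master_regulators, non_master_regulators = split_regulators(edges)
```

In this toy network A and B come out as MRs and C as a non-MR regulator, while D and E are not regulators at all.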

In our paper, the ROC calculation includes all gene-gene regulations regardless of the regulator identity (i.e., MR or non-MR). However, wherever specified, we passed the true list of regulators (including both MR and non-MR regulators) to GENIE3, since a regulator list is a common input to most GRN reconstruction algorithms. Therefore, the ROCs we reported reflect the performance of GRN inference algorithms in reconstructing the "complete" GRN used in simulations.

For ROC calculations you can use existing packages such as scikit-learn. Intuitively, the output of GENIE3 is a list of gene-gene interactions sorted by GENIE3 score. By iterating over this sorted list and considering the true labels of the interactions (according to the ground-truth GRN used in simulations), one can compute performance metrics such as ROC or PRC.
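The iteration described above could be sketched as follows. This is only an illustration under stated assumptions, not the code used for the paper: the helper names `roc_points` and `auc_trapezoid` and the toy ranking are hypothetical, the ranking is assumed to cover every candidate edge, scores are assumed to have no ties, and both true and false edges must be present.

```python
def roc_points(ranked_edges, true_edges):
    """Walk a ranking of candidate edges and collect ROC points.

    ranked_edges: (regulator, target) pairs sorted by decreasing score.
    true_edges:   edges of the ground-truth GRN.
    Returns a list of (false positive rate, true positive rate) points.
    """
    truth = set(true_edges)
    n_pos = sum(1 for edge in ranked_edges if edge in truth)
    n_neg = len(ranked_edges) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for edge in ranked_edges:
        if edge in truth:
            tp += 1  # a true edge recovered at this rank
        else:
            fp += 1  # a false edge predicted at this rank
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy ranking: four candidate edges, two of them true.
ranked = [("A", "C"), ("B", "C"), ("B", "D"), ("A", "D")]
truth = [("A", "C"), ("B", "D")]
points = roc_points(ranked, truth)
auc = auc_trapezoid(points)
```

With scikit-learn, the same labels and scores would instead be passed as two parallel arrays to `sklearn.metrics.roc_curve` or `roc_auc_score`. Note that the choice of candidate edges in `ranked` determines the number of negatives, so it affects the resulting AUC.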

fkgruber · 2023-01-06T18:13:46Z

thanks.

fkgruber · 2023-01-30T18:46:41Z

> For ROC calculations you can use existing packages such as scikit-learn. Intuitively, the output of GENIE3 is a sorted list of gene-gene interactions according to a GENIE3 score. By iterating over this sorted list and considering the true labels of interactions (according to the ground truth GRN used in simulations) one can compute the performance metrics such as ROC or PRC.

In the calculation of the ROC AUC, do you consider all edges or a subset of edges? For the DS3 that you show in the paper there are 1200 genes. That means there are 1200^2 - 1200 potential edges. Do you use all of them in the calculation of the ROC AUC? The reason I ask is that I've seen some inconsistency between R packages, which use either all edges (minet) or a subset (bnlearn). The ROC AUC seems especially sensitive to this. I have noticed, especially when adding noise, that the ROC AUC is still quite high if you use all edges (minet R package). However, using a subset of genes (bnlearn), I get random results, as you show in the paper.

How do you go from the output of GENIE3 to the vectors of predictions and labels that you can use in the AUC calculation?

By the way, I don't see this sensitivity with PR AUC, and my PR AUC calculation matches what you show in the paper.

thanks
FKG

fkgruber changed the title ~~ROC calculation ing figure 4 of paper.~~ ROC calculation in figure 4 of paper. Jan 5, 2023
