ROC calculation in figure 4 of paper. #17
Comments
Master regulators (MRs) are regulators that have no regulator of their own (i.e., no incoming edges in the GRN). A regulator in the GRN is therefore either an MR (if it is not regulated by any other gene) or a non-MR (if it is regulated by other genes). MR profiles (i.e., their production rates) are used in SERGIO to define cell types. For example, one can define 10 different cell types by providing 10 different MR profiles as input.

In our paper, the ROC calculation includes all gene-gene regulations regardless of the regulator's identity (i.e., MR or non-MR). However, wherever specified, we passed the true list of regulators (including both MR and non-MR regulators) to GENIE3, as this is a common input to most GRN reconstruction algorithms. The ROCs we reported therefore reflect the performance of GRN inference algorithms in reconstructing the "complete" GRN used in the simulations.

For ROC calculations you can use existing packages such as scikit-learn. Intuitively, the output of GENIE3 is a list of gene-gene interactions sorted by GENIE3 score. By iterating over this sorted list and considering the true labels of the interactions (according to the ground-truth GRN used in the simulations), one can compute performance metrics such as ROC or PRC.
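The iteration over the sorted list described above can be sketched in plain Python. This is a minimal illustration, not the code used for the paper: the function names, the `(regulator, target, score)` tuple layout, and the toy edges are assumptions made for the example. Only edges that appear in the ranked list are counted, so a true edge missing from the list is not treated as a positive here.

```python
def roc_points_from_ranking(ranked_edges, true_edges):
    """Walk a scored edge list from highest to lowest score and
    return the ROC curve as a list of (FPR, TPR) points.

    ranked_edges: iterable of (regulator, target, score) tuples (hypothetical layout).
    true_edges:   set of (regulator, target) pairs from the ground-truth GRN.
    """
    ranked = sorted(ranked_edges, key=lambda e: e[2], reverse=True)
    n_pos = sum((r, t) in true_edges for r, t, _ in ranked)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false edge in the list")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][2]
        # Edges with identical scores are processed as one group,
        # so ties produce a single diagonal step.
        while i < len(ranked) and ranked[i][2] == score:
            if (ranked[i][0], ranked[i][1]) in true_edges:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: three genes, two true edges, six scored candidate edges.
truth = {("A", "B"), ("B", "C")}
ranking = [("A", "B", 0.90), ("A", "C", 0.60), ("B", "C", 0.50),
           ("B", "A", 0.20), ("C", "A", 0.10), ("C", "B", 0.05)]
print(area_under_curve(roc_points_from_ranking(ranking, truth)))
```

Given the same labels and scores as two parallel vectors, scikit-learn's `roc_auc_score(y_true, y_score)` should return the same area.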
Thanks.
In the calculation of the ROC AUC, do you consider all edges or a subset of edges? For DS3, which you show in the paper, there are 1200 genes. That means there are 1200^2 - 1200 potential edges. Do you use all of them in the calculation of the ROC AUC? The reason I ask is that I've seen some inconsistency between R packages, which use either all edges (minet) or a subset (bnlearn). The ROC AUC seems especially sensitive to this. I have noticed, especially when adding noise, that the ROC AUC is still quite high if you use all edges (minet R package). However, using a subset of genes (bnlearn), I get random results, as you show in the paper. How do you go from the output of GENIE3 to a vector of predictions and a vector of labels that you can use in the AUC calculation? By the way, I don't see this sensitivity with PR AUC, and my PR AUC calculation matches what you show in the paper. Thanks.
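To make the question concrete, here is a hedged sketch of one way to build the label and prediction vectors over an explicit candidate set. It is not taken from the paper or from GENIE3: the helper names, the dictionary of scores, and the choice to give unscored edges a default score of 0 are all assumptions for illustration. With 1200 genes and no self-loops, the full candidate set has 1200^2 - 1200 = 1,438,800 ordered pairs.

```python
def candidate_edges(genes, regulators=None):
    """All ordered (regulator, target) pairs without self-loops.

    If `regulators` is given, only those genes may act as sources;
    otherwise every gene is a potential regulator.
    """
    sources = genes if regulators is None else regulators
    return [(r, t) for r in sources for t in genes if r != t]


def labels_and_scores(candidates, scored_edges, true_edges, default=0.0):
    """Build parallel label/score vectors over a fixed candidate set.

    scored_edges: dict mapping (regulator, target) -> score (hypothetical layout).
    Edges the method did not score receive `default` (an assumption).
    """
    labels = [1 if edge in true_edges else 0 for edge in candidates]
    scores = [scored_edges.get(edge, default) for edge in candidates]
    return labels, scores


# Size of the full candidate set for 1200 genes.
genes = [f"g{i}" for i in range(1200)]
print(len(candidate_edges(genes)))

# Toy example showing how the candidate set changes the vectors.
toy_genes = ["A", "B", "C"]
toy_truth = {("A", "B"), ("B", "C")}
toy_scores = {("A", "B"): 0.9, ("A", "C"): 0.4}
print(labels_and_scores(candidate_edges(toy_genes), toy_scores, toy_truth))
print(labels_and_scores(candidate_edges(toy_genes, regulators=["A"]),
                        toy_scores, toy_truth))
```

The two toy calls differ only in the candidate set, which changes how many negatives enter the calculation. The resulting vectors can be passed to any ROC or PR routine.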
Hi
I would like to understand the ROC calculation of figure 4. Is there code somewhere showing the generation of those plots?
From the paper, it appears that the ROC is calculated by determining the number of master regulator to gene edges that were correctly learned. Does this mean that other edges (for example, between other genes, or between genes and master regulators) are just ignored?
Does GENIE3 assume that master regulators are source nodes, or does it learn this as well? What other parameters were set in the GENIE3 run?
thanks
FKG