Overview: This set of functions was created for discrete Bayesian networks built with the bnlearn package. It provides multi-variable prediction and metrics to evaluate the quality of your model.
What it does: bnMultiVarPrediction calculates a multi-variable prediction for discrete Bayesian network models.
Parameters
bnFit
: an object of class bn.fit (created using the bnlearn package)
trainSet
: data frame used to train your model
testSet
: data frame used to evaluate your model, i.e. the data to predict on
to_predict
: vector with the names of the variables that you want to predict with your model
to_evidence
: vector with the names of the variables from your model that you will give as evidence for the prediction
nSamples
: integer, the number of samples to generate for the prediction. Available only if the calcFunction parameter is set to cpdist.
calcFunction
: string defining the method used to calculate the prediction. The options are predict and cpdist. For more information about these methods, see the bnlearn documentation for predict and for cpdist. If NULL, predict is used.
PS: If you want to predict any root variable in your model, it is strongly recommended to use the predict method, mainly because cpdist will return the data distribution for that variable.
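To make the difference between the two options more concrete, here is a minimal sketch that calls bnlearn's own predict and cpdist directly for a single target variable. It does not reproduce what bnMultiVarPrediction does internally; the choice of B as target, A and C as evidence, the "bayes-lw" and "lw" methods, and n = 5000 are all assumptions made for illustration.

```r
library(bnlearn)

# Same split as the usage example: first 1000 rows for testing
test  <- learning.test[1:1000, ]
train <- learning.test[1001:nrow(learning.test), ]
fit   <- bn.fit(hc(train), data = train)

# predict(): one predicted class per test row; with prob = TRUE the class
# probabilities are attached as the "prob" attribute
set.seed(42)
pred_B <- predict(fit, node = "B", data = test,
                  method = "bayes-lw", from = c("A", "C"), prob = TRUE)
head(pred_B)
prob_B <- attr(pred_B, "prob")

# cpdist(): random samples of B given the evidence of ONE test row
# (assumption: evidence taken from the first row of the test set)
evidence <- list(A = as.character(test$A[1]), C = as.character(test$C[1]))
samples  <- cpdist(fit, nodes = "B", evidence = evidence,
                   method = "lw", n = 5000)

# With likelihood weighting the samples carry weights; the weighted
# relative frequencies approximate P(B | A, C) for that row
w <- attr(samples, "weights")
tapply(w, samples$B, sum) / sum(w)
```

Note that predict returns a class for every row at once, while cpdist has to be called once per evidence configuration and only returns samples that still need to be summarised.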
Values
Returns a list of 2 elements:
probList
is a list of the probabilities given by your model for every sample and every class of each predicted variable.
dominantList
is a list of the most probable class (highest probability) predicted by your model for each predicted variable.
What it does: bnMetricsMultiVarPrediction calculates a set of metrics based on your predictions. See more in Details.
Parameters
reference
: data frame from your test set that contains the predicted variables
prediction
: dominantList returned from bnMultiVarPrediction
predProbList
: probList returned from bnMultiVarPrediction
Values
Returns a list of 3 elements:
cmList
is a list of confusion matrix tables, one for each predicted variable.
ovaList
is a list of OVA matrix tables, one for each predicted variable.
eval
is a data frame where the rows are the predicted variables and the columns are the calculated metrics.
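As a rough picture of what a confusion matrix table and a set of OVA tables look like for one predicted variable, here is a small base R sketch. The data are made up, and the helper name ova_table and the rows-as-prediction layout are assumptions; the tables actually stored in cmList and ovaList may be laid out differently.

```r
# Toy reference and predicted classes for one variable (made-up data)
lvls <- c("a", "b", "c")
reference  <- factor(c("a", "a", "b", "b", "c", "c", "c", "a"), levels = lvls)
prediction <- factor(c("a", "b", "b", "b", "c", "a", "c", "a"), levels = lvls)

# Multi-level confusion matrix: rows = predicted class, columns = reference
cm <- table(Prediction = prediction, Reference = reference)

# One-vs-all table for a single level: that level against all the others
# (ova_table is a hypothetical helper, not part of the repository)
ova_table <- function(level) {
  table(Prediction = factor(prediction == level, levels = c(TRUE, FALSE)),
        Reference  = factor(reference == level,  levels = c(TRUE, FALSE)))
}
ova <- lapply(setNames(lvls, lvls), ova_table)

cm
ova$a   # 2 x 2 table: "a" versus "not a"
```

Each OVA table reduces the multi-level problem to a binary one, which is what allows the usual binary metrics to be computed per level.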
- Clone/download this repository to your working directory
- In your code, source the file bnlearn_discrete_multivar_prediction_eval.R as in the example below.
library(bnlearn)
source('bnlearn_discrete_multivar_prediction_eval.R')
# Define Train and Test
test <- learning.test[1:1000,]
train <- learning.test[1001:nrow(learning.test),]
# Learn the structure once, then fit the model
dag <- hc(train)
bnFitted <- bn.fit(x = dag, data = train)
# Plot model
plot(dag)
# Define Target variables (Variables to be predicted)
pred <- c('B','D')
# Evidence variables (variables whose values you give to the BN to make the prediction)
evid <- names(train)[!names(train) %in% pred]
# Multi var Prediction
results <- bnMultiVarPrediction(bnFit = bnFitted,
                                trainSet = train,
                                testSet = test,
                                to_predict = pred,
                                to_evidence = evid,
                                calcFunction = 'predict')
# Metrics Evaluation
metrics <- bnMetricsMultiVarPrediction(reference = test[pred],
                                       prediction = results$dominantList,
                                       predProbList = results$probList)
The metrics are based on the One-vs-All (OVA) method for the confusion matrix metrics, on scoring rules, and on the accuracy from the multi-level confusion matrix. The final OVA result for a variable j is the mean of each metric over all of its levels v.
OVA confusion matrix metrics:
- Accuracy
- Sensitivity
- Specificity
- Precision
- F1-Score
- MCC (Matthews Correlation Coefficient)
Scoring rules:
- Spherical Payoff
- Brier Loss
- Log Loss
Multi-level confusion matrix:
- Accuracy
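The metrics listed above can be sketched in base R as follows. This is an illustration under stated assumptions, not the repository's code: the data are made up, ova_metrics is a hypothetical helper, and the scoring rules use common textbook definitions (log loss as the mean negative log probability of the true class, Brier loss as the mean squared distance to the one-hot outcome, spherical payoff as the true-class probability divided by the norm of the probability vector), which may differ in detail from the ones used here.

```r
lvls <- c("a", "b", "c")
reference <- factor(c("a", "b", "c", "a"), levels = lvls)

# One row of predicted class probabilities per sample (rows sum to 1)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.3, 0.3, 0.4,
                  0.2, 0.5, 0.3),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, lvls))

# Dominant (most probable) class per sample
prediction <- factor(lvls[max.col(probs, ties.method = "first")], levels = lvls)

# OVA metrics for one level, from its 2 x 2 counts
ova_metrics <- function(level) {
  tp <- sum(prediction == level & reference == level)
  fp <- sum(prediction == level & reference != level)
  fn <- sum(prediction != level & reference == level)
  tn <- sum(prediction != level & reference != level)
  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)
  c(accuracy    = (tp + tn) / (tp + fp + fn + tn),
    sensitivity = sensitivity,
    specificity = tn / (tn + fp),
    precision   = precision,
    f1          = 2 * precision * sensitivity / (precision + sensitivity),
    mcc         = (tp * tn - fp * fn) /
                  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
}

# Metrics x levels matrix, then the mean over levels for this variable
per_level <- sapply(lvls, ova_metrics)
ova_mean  <- rowMeans(per_level)

# Scoring rules, based on the probability assigned to the true class
p_true <- probs[cbind(seq_along(reference), as.integer(reference))]
onehot <- diag(length(lvls))[as.integer(reference), ]
log_loss   <- mean(-log(p_true))
brier_loss <- mean(rowSums((probs - onehot)^2))
spherical  <- mean(p_true / sqrt(rowSums(probs^2)))

# Accuracy from the multi-level confusion matrix
multi_accuracy <- mean(prediction == reference)

round(ova_mean, 3)
c(log_loss = log_loss, brier_loss = brier_loss,
  spherical = spherical, accuracy = multi_accuracy)
```

The toy data avoid levels that are never predicted or never observed; with real data some OVA ratios can have a zero denominator and would need explicit handling.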
- For Confusion Matrix Metrics (One VS All) here
- For the Scoring Rules here and here for the Brier score
Thanks to the bnlearn package creator, Marco Scutari, who helped create the multi-variable prediction algorithm.