AvGPR is a package that calculates a weighted average Gaussian Process regression model over 5 implementations from packages in both R and Python. It uses cross validation to optimise the weights so that the models that best capture the data contribute most to the average.
You can install the development version of AvGPR from GitHub with:
# install.packages("devtools")
devtools::install_github("benjmuller/AvGPR")
This package requires the Python modules ‘numpy’, ‘GPy’, ‘sklearn’, and ‘warnings’ in the Python virtual environment. These modules can be installed into the current reticulate virtual environment with:
install_python_packages()
Alternatively, the modules can be installed into a new Python environment with:
create_python_environment()
This is a basic example of a simple GPR using the AvGPR function:
library(AvGPR)
# create_python_environment()
set.seed(1)
n <- 15
X <- runif(n, min = 0, max = 4 * pi)
Y <- 2 * sin(X)
XX <- seq(0, 4 * pi, length.out=100)
AvGPR(data.frame(X=X), data.frame(Y=Y), data.frame(X=XX))
This document is an introduction to Gaussian Process regression (GPR). It discusses how GPR works, gives an intuitive way to think about it, and shows some code for real-world examples.
A Gaussian process is a stochastic process that is used to emulate an event that has normally distributed random variables which are indexed over time or space. Gaussian processes are formed by the joint distribution of these normal random variables. Gaussian Processes can be used as a non-linear multivariate interpolation tool. This process is called Gaussian Process regression or Kriging. Using this method, we can write multi-output prediction problems in the following parametric form:
$$y(\mathbf{x}) = \mu(\mathbf{x}) + Z(\mathbf{x}),$$

where $y(\mathbf{x})$ is the simulated output, $\mu(\mathbf{x})$ is the mean function and $Z(\mathbf{x})$ is a zero-mean Gaussian Process. The mean function is a “best fit” parametric function of the independent variables. The zero-mean Gaussian Process is specified by a covariance function. There are many choices of covariance function that better suit different types of events being emulated. A list of some standard functions can be found here (https://www.cs.toronto.edu/~duvenaud/cookbook/).
One way to understand Gaussian processes, and in particular how the parametric form arises from the non-parametric one, is to consider a bivariate normal:
A contour plot of a bivariate normal with marginal means 0, marginal variances 2 and covariance 1.
The normal distributions that construct the bivariate normal have
covariance 1, hence the conditioning of an observation (data point)
gives information about what possible values the other random variable
can take. Consider conditioning on an observed value of $x_1$ and plotting some random samples of possible values from the conditional distribution $x_2 \mid x_1$:
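A minimal sketch of this conditional sampling in R, assuming an observed value a = 1 and using the bivariate normal with marginal means 0, marginal variances 2 and covariance 1 from the contour plot above:

a <- 1                                  # assumed observed value of x1
mu.cond <- 0 + (1 / 2) * (a - 0)        # conditional mean: mu2 + (cov / var1) * (a - mu1)
var.cond <- 2 - 1 ^ 2 / 2               # conditional variance: var2 - cov^2 / var1
samples <- rnorm(50, mean = mu.cond, sd = sqrt(var.cond))
plot(rep(a, 50), samples, xlab = "x1", ylab = "x2 | x1 = a")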
Consider expanding this notion to a 10-variate normal distribution where we fix the ten input locations, with correlations that are specified by the squared exponential covariance function (SE). Now suppose we condition on three of the observed values and plot samples of possible values for the remaining variables:
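A minimal sketch of this conditioning step in R; the input locations, the choice of which three variables are observed and their observed values are all assumptions made for illustration:

se_cov <- function(x1, x2, sigma = 1, l = 1) {
  # squared exponential covariance between two sets of 1D inputs
  outer(x1, x2, function(a, b) sigma ^ 2 * exp(-(a - b) ^ 2 / (2 * l ^ 2)))
}
x <- 1:10                            # assumed input locations
obs <- c(2, 5, 9)                    # assumed indices of the conditioned variables
y.obs <- c(1, -0.5, 0.8)             # assumed observed values
K <- se_cov(x, x, l = 2)
K.oo <- K[obs, obs]; K.uo <- K[-obs, obs]; K.uu <- K[-obs, -obs]
mu.cond <- K.uo %*% solve(K.oo, y.obs)              # conditional mean
cov.cond <- K.uu - K.uo %*% solve(K.oo, t(K.uo))    # conditional covariance
sample <- MASS::mvrnorm(1, mu = drop(mu.cond), Sigma = cov.cond)   # one sampled curve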
The families of curves that join the points are a subset of the family of solutions that are compatible with the imposed values. If we take a large number of these sample curves, we can see that they appear to trace out regions of confidence. At this stage it clearly becomes a nonlinear regression problem: by setting the mean and the covariance function to be parametric functions, playing the role that the mean vector and covariance matrix play in the discrete multivariate case, we obtain the construction of Gaussian Process regression.
In this section we will go through a one dimensional example of Gaussian
process regression. The data is created from a random sample of 8 points
on the sine function in the domain $[0, 4\pi]$.
The randomly generated values that will be used in this example are:
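A sketch of how such training data could be generated; the seed is an assumption, so these values will not match the ones used in the document:

set.seed(1)                                               # assumed seed, for reproducibility
n.train <- 8
x.train <- matrix(runif(n.train, 0, 4 * pi), ncol = 1)    # random inputs on the domain
y.train <- matrix(sin(x.train), ncol = 1)                 # outputs on the sine function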
The covariance function that this example will be using is the squared exponential function (SE). This has the form

$$k(x, x') = \sigma^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right).$$
Using the samples and the chosen covariance function, this gives the covariance matrix:
library(plgp)

# Squared exponential (SE) covariance; plgp::distance() returns squared
# Euclidean distances, so no further squaring is needed here.
kernel <- function(x1, x2, sigma = 1, l = 1) {
  matrix.dist <- distance(x1, x2)
  kern <- (sigma ^ 2) * exp(-matrix.dist / (2 * l ^ 2))
  return(kern)
}
Here we will plot a coloured gradient to give a visual representation of the correlations between each of the random variables. This is beneficial when the matrix is large, as there are too many values to reliably interpret the behaviour by eye.
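A minimal sketch of such a plot (the document's own plotting code is not shown), using the kernel function above on the training inputs:

K <- kernel(x.train, x.train)
image(K, col = hcl.colors(50, palette = "Blues"),
      main = "SE covariance of the training inputs")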
Using the covariance function to construct covariance matrices from combinations of the observed data and data we want to predict, we can calculate the posterior distribution. The posterior mean vector and covariance matrix are given (respectively) by:
$$\boldsymbol{\mu}_* = K_*^\top K^{-1} \mathbf{y}, \qquad \Sigma_* = K_{**} - K_*^\top K^{-1} K_*,$$

where $\mathbf{y}$ are the outputs from the observed data, $K$ is the kernel for the observed data, $K_*$ is the kernel for the observed data against the data we want to predict and $K_{**}$ is the kernel for the data we want to predict.
# Posterior (predictive) mean and covariance at the prediction points xvals.s,
# given training data (x.train, y.train). sigma.y is the observation noise;
# a small jitter is added to k.ss for numerical stability.
posterior <- function(xvals.s, x.train, y.train, sigma = 1, l = 1, sigma.y = 1e-8) {
  n <- nrow(x.train); m <- nrow(xvals.s)
  k <- kernel(x.train, x.train, sigma = sigma, l = l) + sigma.y ^ 2 * diag(n)
  k.s <- kernel(x.train, xvals.s, sigma = sigma, l = l)
  k.ss <- kernel(xvals.s, xvals.s, sigma = sigma, l = l) + 1e-8 ^ 2 * diag(m)
  k.inv <- solve(k)
  y.train <- data.matrix(y.train)
  colnames(y.train) <- NULL
  mu <- t(k.s) %*% k.inv %*% y.train        # posterior mean
  cov <- k.ss - t(k.s) %*% k.inv %*% k.s    # posterior covariance
  return(list(mu = mu, cov = cov))
}
Using the predictive distribution, we can plot the Gaussian process by choosing the data we want to predict to be a dense grid of points across the domain:
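A sketch of such a plot in R, using the posterior function above; the domain, grid size and two-standard-deviation band are illustrative choices:

xx <- seq(0, 4 * pi, length.out = 200)                      # dense prediction grid
post <- posterior(matrix(xx, ncol = 1), x.train, y.train, sigma = 1, l = 1)
mu <- drop(post$mu)
sd <- sqrt(pmax(diag(post$cov), 0))                         # pointwise predictive sd
plot(xx, mu, type = "l", ylim = range(mu - 2 * sd, mu + 2 * sd),
     xlab = "x", ylab = "y")
lines(xx, mu + 2 * sd, lty = 2)                             # approximate 95% band
lines(xx, mu - 2 * sd, lty = 2)
points(x.train, y.train)                                    # training data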
In this example we will be using the Branin function, which has the form:

$$f(x_1, x_2) = a\left(x_2 - b x_1^2 + c x_1 - r\right)^2 + s(1 - t)\cos(x_1) + s.$$

We will be using the parameters $a = 1$, $b = \frac{5.1}{4\pi^2}$, $c = \frac{5}{\pi}$, $r = 6$, $s = 10$ and $t = \frac{1}{8\pi}$. The ranges of values $x_1$ and $x_2$ can take are $[-5, 10]$ and $[0, 15]$ respectively.
This example will use 40 random samples as training data.
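A sketch of the Branin function with these parameters and of generating 40 random training samples over the stated ranges (the document's own data-generation code is not shown, so the exact samples will differ):

branin <- function(x1, x2, a = 1, b = 5.1 / (4 * pi ^ 2), c = 5 / pi,
                   r = 6, s = 10, t = 1 / (8 * pi)) {
  a * (x2 - b * x1 ^ 2 + c * x1 - r) ^ 2 + s * (1 - t) * cos(x1) + s
}
set.seed(1)                                          # assumed, for reproducibility
n.train <- 40
x.train <- cbind(runif(n.train, -5, 10), runif(n.train, 0, 15))
y.train <- matrix(branin(x.train[, 1], x.train[, 2]), ncol = 1)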
Since the covariance matrix has dimensions 40x40, it is difficult to understand the behavior by looking at the correlation coefficients. So we will plot a coloured gradient for the correlations. We will be using the same SE function to calculate the covariance coefficients. Plotting the covariance gradient:
Using the same matrix multiplication as for the one-dimensional case, we can calculate the predictive distribution for this Gaussian process. Using this we can plot the GP:
library(rgl)

# Plot the 2D posterior mean surface over the grid defined by x1 and x2,
# coloured by height.
plotgp_2D <- function(x1, x2, x.train, y.train, sigma = 1, l = 1) {
  xvals <- expand.grid(x1, x2)
  mu <- posterior(xvals, x.train, y.train, sigma = sigma, l = l)[[1]]
  mu <- matrix(mu, nrow = length(x1))       # reshape to a length(x1) x length(x2) grid
  zlim <- range(mu)
  zlen <- round(zlim[2] - zlim[1]) + 1
  colorlut <- hcl.colors(zlen, palette = "Blues")
  col <- colorlut[round(mu - zlim[1]) + 1]
  persp3d(x = x1, y = x2, z = mu, color = col, zlab = "y(x1, x2)")
}
x1 <- seq(from = -5, to = 10, by = 0.2)
x2 <- seq(from = 0, to = 15, by = 0.2)
plotgp_2D(x1, x2, x.train, y.train, sigma = 1, l = 2)
In this section we have some example functions for higher dimensional Gaussian Process regression.
The borehole function models water flowing through a borehole. The equation has the form:
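A hedged R sketch of the borehole function, using its commonly cited 8-input form from the simulation test-function literature (which may differ from the exact form intended here):

# Borehole function: water flow rate through a borehole, with inputs
# rw, r (radii), Tu, Tl (transmissivities), Hu, Hl (potentiometric heads),
# L (borehole length) and Kw (hydraulic conductivity).
borehole <- function(rw, r, Tu, Hu, Tl, Hl, L, Kw) {
  log.rr <- log(r / rw)
  num <- 2 * pi * Tu * (Hu - Hl)
  den <- log.rr * (1 + (2 * L * Tu) / (log.rr * rw ^ 2 * Kw) + Tu / Tl)
  num / den
}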
The OTL Circuit function models an output transformerless push-pull circuit. The equation has the form:
The piston simulation function models the circular motion of a piston within a cylinder. The equation has the form:
The robot arm function models the position of a 4-segment robotic arm. The equation has the form:
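Similarly, a hedged sketch of the 4-segment robot arm function in its commonly used form, taking four joint angles theta and four segment lengths L (again, the exact version intended here is an assumption):

# Robot arm function: distance from the origin to the end of a 4-segment
# planar arm, given joint angles theta (length 4) and segment lengths L (length 4).
robot_arm <- function(theta, L) {
  cum.angle <- cumsum(theta)
  u <- sum(L * cos(cum.angle))
  v <- sum(L * sin(cum.angle))
  sqrt(u ^ 2 + v ^ 2)
}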
AvGPR (Average Gaussian Process Regression) is an R package that uses weighted model averaging to calculate an average Gaussian Process regression model. It does this by fitting models from 5 different GPR packages and then, through cross validation, calculating a “goodness of fit” statistic that is used to find the weight for each model. This is an advantageous method of regression as it gains the benefits associated with model averaging. The measure-of-fit statistic used in this package is:
where, for each predicted point, the statistic uses the predictive variance and the residual at that point. Through analysis and experimentation, this statistic has been shown to provide a better representation of the fit than the RSS and the normalised RSS.
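As a conceptual illustration only (not the package's internal code), weighted model averaging of GP predictions can be sketched as below; the inverse-score weighting rule and the example fit scores are assumptions:

# Each candidate GP model supplies predictions at the same test points and a
# cross-validation fit score (lower taken as better here); the averaged
# prediction is the weighted mean of the individual predictions.
average_predictions <- function(pred.list, fit.scores) {
  weights <- (1 / fit.scores) / sum(1 / fit.scores)   # assumed inverse-score weights
  pred.matrix <- do.call(cbind, pred.list)            # one column per model
  drop(pred.matrix %*% weights)                       # weighted average prediction
}
# Hypothetical usage with three models' predictions p1, p2, p3:
# avg <- average_predictions(list(p1, p2, p3), fit.scores = c(0.8, 1.2, 2.0))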
Here is an example plot from an AvGPR regression: