Added references and bib file
parulvijay committed Jul 21, 2024
1 parent d5e959a commit 93ce9ca
Showing 4 changed files with 91 additions and 9 deletions.
36 changes: 30 additions & 6 deletions GP.qmd
@@ -8,7 +8,9 @@ title-slide-attributes:
data-background-size: contain
data-background-opacity: "0.2"
format: revealjs
html-math-method: katex
html-math-method: katex
bibliography: references.bib
link-citations: TRUE
---

## Gaussian Process: Introduction
@@ -19,7 +21,7 @@ html-math-method: katex

- It started being used in the field of spatial statistics, where it is called *kriging*.

- It is also widely used in machine learning as a **surrogate model**, since it makes fast predictions and gives good uncertainty quantification.
- It is also widely used in machine learning as a **surrogate model**, since it makes fast predictions and gives good uncertainty quantification. [@gramacy2020surrogates]

## Uses and Benefits

@@ -317,6 +319,8 @@ matplot(X, t(Y_scaled), type = 'l', main = expression(paste(tau^2, " = 25")),

## Length-scale (Rate of decay of correlation)

. . .

- Determines how "wiggly" a function is

- Smaller $\theta$ means wigglier functions i.e. visually:
@@ -353,6 +357,8 @@ matplot(X, t(Y2), type= 'l', main = expression(paste(theta, " = 5")),

## Nugget (Noise)

. . .

- Ensures discontinuity and prevents interpolation which in turn yields better UQ.

- We will compare a sample from g \~ 0 (\< 1e-8 for numeric stability) vs g = 0.1 to observe what actually happens.
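
The comparison described above can be sketched in base R. This is a minimal illustration, not the code from the slides: the helper `draw_prior`, the grid of 50 inputs, the unit length-scale, and the seed are all assumptions made for this example.

```r
set.seed(42)
n <- 50
X <- matrix(seq(0, 10, length.out = n), ncol = 1)
D <- as.matrix(dist(X))^2            # pairwise squared distances

# Hypothetical helper: one draw from a zero-mean GP prior with nugget g
# and a Gaussian kernel with unit length-scale (assumed for illustration).
draw_prior <- function(D, g) {
  Sigma <- exp(-D) + diag(g, nrow(D))
  R <- chol(Sigma)                   # upper triangular, t(R) %*% R equals Sigma
  drop(crossprod(R, rnorm(nrow(D)))) # t(R) %*% z has covariance Sigma
}

y_smooth <- draw_prior(D, g = 1e-8)  # nugget near zero: a smooth curve
y_noisy  <- draw_prior(D, g = 0.1)   # larger nugget: jitter around a smooth trend

matplot(X, cbind(y_smooth, y_noisy), type = "l", lty = 1, col = c(1, 2),
        xlab = "X", ylab = "Y")
```

With the near-zero nugget the draw is a smooth function of the input, while the draw with g = 0.1 scatters around a smooth trend, which is the sense in which the nugget breaks interpolation.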
@@ -438,12 +444,16 @@ lines(XX, mean_gp + 2 * sqrt(s2_gp), col = 4, lty = 2, lwd = 3)

## Extensions

. . .

- **Anisotropic Gaussian Processes**: When our data is multi-dimensional, we can control the **length-scale** ($\theta$) separately for each dimension.

- **Heteroskedastic Gaussian Processes**: Suppose our data is noisy and the noise is input dependent, then we can use a different **nugget** for each unique input rather than a scalar $g$.

## Anisotropic Gaussian Processes

. . .

In this situation, we can rewrite the $C_n$ matrix as,

$$C_\theta(x , x') = \exp \left( -\sum_{k=1}^{m} \frac{ (x_k - x_k')^2 }{\theta_k} \right ) + g \mathbb{I}_n$$
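
As a rough sketch of how this covariance could be assembled in base R, assuming a hypothetical helper `aniso_cov` (it is not part of `laGP` or `hetGP`) and made-up inputs:

```r
# Hypothetical helper: anisotropic Gaussian kernel with a nugget.
# X: n x m matrix of inputs; theta: length-m vector of length-scales;
# g: scalar nugget added to the diagonal.
aniso_cov <- function(X, theta, g = 1e-8) {
  X <- as.matrix(X)
  stopifnot(length(theta) == ncol(X), all(theta > 0))
  # Dividing column k by sqrt(theta_k) turns the plain squared Euclidean
  # distance into sum_k (x_k - x_k')^2 / theta_k.
  Xs <- sweep(X, 2, sqrt(theta), "/")
  D <- as.matrix(dist(Xs))^2
  exp(-D) + diag(g, nrow(X))
}

set.seed(1)
X <- matrix(runif(10), ncol = 2)               # 5 made-up points in 2 dimensions
C <- aniso_cov(X, theta = c(0.5, 2), g = 0.1)  # shorter length-scale in dimension 1
```

A smaller `theta[k]` makes the correlation decay faster along dimension `k`, so the fitted surface can be wigglier in that direction than in the others.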
@@ -454,9 +464,9 @@ Here, $\theta$ = ($\theta_1$, $\theta_2$, ..., $\theta_m$) is a vector of length

. . .

- Heteroskedasticity implies that the data is noisy, and the noise is input dependent and irregular.
- Heteroskedasticity implies that the data is noisy, and the noise is input dependent and irregular. [@binois2018practical]

```{r hetviz, echo = FALSE, cache=F, warning=FALSE, message=FALSE, dev.args = list(bg = 'transparent'), fig.width= 8, fig.height= 5, fig.align="center", warn.conflicts = FALSE}
```{r hetviz, echo = FALSE, cache=F, warning=FALSE, message=FALSE, dev.args = list(bg = 'transparent'), fig.width= 7, fig.height= 4, fig.align="center", warn.conflicts = FALSE}
library(plgp)
@@ -535,6 +545,8 @@ $$

## HetGP Setup

. . .

In case of a hetGP, we have:

$$
@@ -547,9 +559,10 @@ $$

- Instead of one nugget for the GP, we have a **vector of nuggets** i.e. a unique nugget for each unique input.


## HetGP Predictions

. . .

- Recall, for a GP, we make predictions using the following:

```{=tex}
@@ -669,12 +682,16 @@ lines(xp, mean + 2 * sqrt(s2), col = 4, lty = 2, lwd = 3)

## Intro to Ticks Problem

- EFI-RCN held an ecological forecasting challenge
. . .

- EFI-RCN held an ecological forecasting challenge, the [NEON Forecasting Challenge](https://projects.ecoforecast.org/neon4cast-docs/Ticks.html) [@thomas2022neon]

- We focus on the Tick Populations theme which studies the abundance of the lone star tick (*Amblyomma americanum*)

## Tick Population Forecasting

. . .

Some details about the challenge:

- **Objective**: Forecast tick density for 4 weeks into the future
@@ -685,12 +702,19 @@ Some details about the challenge:

## Predictors

. . .

- $X_1$ Iso-week: The week in which the tick density was recorded.

- $X_2$ Sine wave: $\left( \text{sin} \ ( \frac{2 \ \pi \ X_1}{106} ) \right)^2$.

- $X_3$ Greenness: Environmental predictor (in practical)
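
The first two predictors can be built directly from the week index. The sketch below uses a made-up vector of iso-weeks; in the practical the weeks would come from the tick dataset, and the variable names here are only illustrative.

```r
# Hypothetical iso-week values (assumption: one full year of weekly records).
iso_week <- 1:53

# X1: the iso-week itself.
x1 <- iso_week

# X2: squared sine wave as written on the slide. Squaring halves the period
# of the sine, so with 106 in the denominator x2 repeats every 53 weeks.
x2 <- sin(2 * pi * x1 / 106)^2

predictors <- data.frame(x1 = x1, x2 = x2)
head(predictors)
```

Because the sine is squared, `x2` stays between 0 and 1 and peaks once per 53-week cycle, which gives the model a simple seasonal signal.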


## Practical

. . .

- Set up these predictors
- Transform the data to normal
- Fit a GP to the data
4 changes: 3 additions & 1 deletion GP_Notes.qmd
@@ -8,6 +8,8 @@ title-slide-attributes:
data-background-size: contain
data-background-opacity: "0.2"
citation: true
bibliography: references.bib
link-citations: TRUE
date: 2024-07-21
date-format: long
format:
@@ -20,7 +22,7 @@ format:

# Introduction to Gaussian Processes for Time Dependent Data

This document introduces the conceptual background to Gaussian Process (GP) regression, along with mathematical concepts. We also briefly demonstrate fitting GPs using the `laGP` package in R. The material here is intended to give a more verbose introduction to what is covered in the [lecture](GP.qmd), to support students working through the [practical component](GP_Practical.qmd). This material has been adapted from chapter 5 of the book [Surrogates: Gaussian process modeling, design and optimization for the applied sciences](https://bobby.gramacy.com/surrogates/) by Robert Gramacy.
This document introduces the conceptual background to Gaussian Process (GP) regression, along with mathematical concepts. We also briefly demonstrate fitting GPs using the `laGP` [@laGP] package in R. The material here is intended to give a more verbose introduction to what is covered in the [lecture](GP.qmd), to support students working through the [practical component](GP_Practical.qmd). This material has been adapted from chapter 5 of the book [Surrogates: Gaussian process modeling, design and optimization for the applied sciences](https://bobby.gramacy.com/surrogates/) by Robert Gramacy.


# Gaussian Processes
6 changes: 4 additions & 2 deletions GP_Practical.qmd
@@ -12,11 +12,13 @@ format:
toc-location: left
html-math-method: katex
css: styles.css
bibliography: references.bib
link-citations: TRUE
---

# Objectives

This practical will lead you through fitting a few versions of GPs using two R packages: `laGP` and `hetGP`. We will begin with a toy example from the lecture and then move on to a real data example to forecast tick abundances for a NEON site.
This practical will lead you through fitting a few versions of GPs using two R packages: `laGP` [@laGP] and `hetGP` [@binois2021hetgp]. We will begin with a toy example from the lecture and then move on to a real data example to forecast tick abundances for a NEON site.

# Basics: Fitting a GP Model

@@ -130,7 +132,7 @@ Looks pretty cool.

# Using GPs for data on tick abundances over time

We will try all this on a simple dataset: Tick Data from the NEON Forecasting Challenge. We will first learn a little bit about this dataset, followed by setting up our predictors and using them in our model to predict tick density for the future season. We will also learn how to fit a separable GP and specify priors for our parameters. Finally, we will learn some basics about a HetGP (Heteroskedastic GP) and try to fit that model as well.
We will try all this on a simple dataset: Tick Data from the [NEON Forecasting Challenge](https://projects.ecoforecast.org/neon4cast-docs/Ticks.html). We will first learn a little bit about this dataset, followed by setting up our predictors and using them in our model to predict tick density for the future season. We will also learn how to fit a separable GP and specify priors for our parameters. Finally, we will learn some basics about a HetGP (Heteroskedastic GP) and try to fit that model as well.


## Overview of the Data
54 changes: 54 additions & 0 deletions references.bib
@@ -0,0 +1,54 @@
%% This BibTeX bibliography file was created using BibDesk.
%% https://bibdesk.sourceforge.io/
%% Created for Parul Vijay Patil
%% Saved with string encoding Unicode (UTF-8)
@article{laGP,
author = {Gramacy, Robert B.},
doi = {http://hdl.handle.net/10.},
journal = {Journal of Statistical Software},
number = {i01},
title = {{laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R}},
volume = {72},
year = 2016,
bdsk-url-1 = {http://hdl.handle.net/10.}}

@book{gramacy2020surrogates,
title={Surrogates: Gaussian process modeling, design, and optimization for the applied sciences},
author={Gramacy, Robert B},
year={2020},
publisher={Chapman and Hall/CRC}
}

@article{binois2021hetgp,
title={hetgp: Heteroskedastic Gaussian process modeling and sequential design in R},
author={Binois, Micka{\"e}l and Gramacy, Robert B},
journal={Journal of Statistical Software},
volume={98},
pages={1--44},
year={2021}
}

@article{binois2018practical,
title={Practical heteroscedastic Gaussian process modeling for large simulation experiments},
author={Binois, Mickael and Gramacy, Robert B and Ludkovski, Mike},
journal={Journal of Computational and Graphical Statistics},
volume={27},
number={4},
pages={808--821},
year={2018},
publisher={Taylor \& Francis}
}

@article{thomas2022neon,
title={The NEON ecological forecasting challenge},
author={Thomas, R Quinn and Boettiger, Carl and Carey, Cayelan C and Dietze, Michael C and Johnson, Leah R and Kenney, Melissa A and Mclachlan, Jason S and Peters, Jody A and Sokol, Eric R and Weltzin, Jake F and others},
journal={Authorea Preprints},
year={2022},
publisher={Authorea}
}
