diff --git a/GP.qmd b/GP.qmd index ccd80af..15506b1 100644 --- a/GP.qmd +++ b/GP.qmd @@ -8,7 +8,9 @@ title-slide-attributes: data-background-size: contain data-background-opacity: "0.2" format: revealjs -html-math-method: katex +html-math-method: katex +bibliography: references.bib +link-citations: TRUE --- ## Gaussian Process: Introduction @@ -19,7 +21,7 @@ html-math-method: katex - It started being used in the field of spatial statistics, where it is called *kriging*. -- It is also widely used in the field of machine learning since it makes fast predictions and gives good uncertainty quantification commonly used as a **surrogate model**. +- It is also widely used in the field of machine learning since it makes fast predictions and gives good uncertainty quantification commonly used as a **surrogate model**. [@gramacy2020surrogates] ## Uses and Benefits @@ -317,6 +319,8 @@ matplot(X, t(Y_scaled), type = 'l', main = expression(paste(tau^2, " = 25")), ## Length-scale (Rate of decay of correlation) +. . . + - Determines how "wiggly" a function is - Smaller $\theta$ means wigglier functions i.e. visually: @@ -353,6 +357,8 @@ matplot(X, t(Y2), type= 'l', main = expression(paste(theta, " = 5")), ## Nugget (Noise) +. . . + - Ensures discontinuity and prevents interpolation which in turn yields better UQ. - We will compare a sample from g \~ 0 (\< 1e-8 for numeric stability) vs g = 0.1 to observe what actually happens. @@ -438,12 +444,16 @@ lines(XX, mean_gp + 2 * sqrt(s2_gp), col = 4, lty = 2, lwd = 3) ## Extentions +. . . + - **Anisotropic Gaussian Processes**: Suppose our data is multi-dimensional, we can control the **length-scale** ($\theta$) for each dimension. - **Heteroskedastic Gaussian Processes**: Suppose our data is noisy and the noise is input dependent, then we can use a different **nugget** for each unique input rather than a scalar $g$. ## Anisotropic Gaussian Processes +. . . 
+ In this situation, we can rewrite the $C_n$ matrix as, $$C_\theta(x , x') = \exp{ \left( -\sum_{k=1}^{m} \frac{ (x_k - x_k')^2 }{\theta_k} \right ) + g \mathbb{I_n}}$$ @@ -454,9 +464,9 @@ Here, $\theta$ = ($\theta_1$, $\theta_2$, ..., $\theta_m$) is a vector of length . . . -- Heteroskedasticity implies that the data is noisy, and the noise is input dependent and irregular. +- Heteroskedasticity implies that the data is noisy, and the noise is input dependent and irregular. [@binois2018practical] -```{r hetviz, echo = FALSE, cache=F, warning=FALSE, message=FALSE, dev.args = list(bg = 'transparent'), fig.width= 8, fig.height= 5, fig.align="center", warn.conflicts = FALSE} +```{r hetviz, echo = FALSE, cache=F, warning=FALSE, message=FALSE, dev.args = list(bg = 'transparent'), fig.width= 7, fig.height= 4, fig.align="center", warn.conflicts = FALSE} library(plgp) @@ -535,6 +545,8 @@ $$ ## HetGP Setup +. . . + In case of a hetGP, we have: $$ @@ -547,9 +559,10 @@ $$ - Instead of one nugget for the GP, we have a **vector of nuggets** i.e. a unique nugget for each unique input. - ## HetGP Predictions +. . . + - Recall, for a GP, we make predictions using the following: ```{=tex} @@ -669,12 +682,16 @@ lines(xp, mean + 2 * sqrt(s2), col = 4, lty = 2, lwd = 3) ## Intro to Ticks Problem -- EFI-RCN held an ecological forecasting challenge +. . . + +- EFI-RCN held an ecological forecasting challenge [NEON Forecasting Challenge](https://projects.ecoforecast.org/neon4cast-docs/Ticks.html) [@thomas2022neon] - We focus on the Tick Populations theme which studies the abundance of the lone star tick (*Amblyomma americanum*) ## Tick Population Forecasting +. . . + Some details about the challenge: - **Objective**: Forecast tick density for 4 weeks into the future @@ -685,12 +702,19 @@ Some details about the challenge: ## Predictors +. . . + - $X_1$ Iso-week: The week in which the tick density was recorded. 
- $X_2$ Sine wave: $\left( \text{sin} \ ( \frac{2 \ \pi \ X_1}{106} ) \right)^2$. +- $X_3$ Greenness: Environmental predictor (covered in the practical) + + ## Practical +. . . + - Setup these predictors - Transform the data to normal - Fit a GP to the Data diff --git a/GP_Notes.qmd b/GP_Notes.qmd index 109ef4c..8a31d57 100644 --- a/GP_Notes.qmd +++ b/GP_Notes.qmd @@ -8,6 +8,8 @@ title-slide-attributes: data-background-size: contain data-background-opacity: "0.2" citation: true +bibliography: references.bib +link-citations: TRUE date: 2024-07-21 date-format: long format: @@ -20,7 +22,7 @@ format: # Introduction to Gaussian Processes for Time Dependent Data -This document introduces the conceptual background to Gaussian Process (GP) regression, along with mathematical concepts. We also demonstrate briefly fitting GPs using the `laGP` package in R. The material here is intended to give a more verbose introduction to what is covered in the [lecture](GP.qmd) in order to support a student to work through the [practical component](GP_Practical.qmd). This material has been adapted from chapter 5 of the book [Surrogates: Gaussian process modeling, design and optimization for the applied sciences](https://bobby.gramacy.com/surrogates/) by Robert Gramacy. +This document introduces the conceptual background to Gaussian Process (GP) regression, along with mathematical concepts. We also demonstrate briefly fitting GPs using the `laGP` [@laGP] package in R. The material here is intended to give a more verbose introduction to what is covered in the [lecture](GP.qmd) in order to support a student to work through the [practical component](GP_Practical.qmd). This material has been adapted from chapter 5 of the book [Surrogates: Gaussian process modeling, design and optimization for the applied sciences](https://bobby.gramacy.com/surrogates/) by Robert Gramacy. 
# Gaussian Processes diff --git a/GP_Practical.qmd b/GP_Practical.qmd index 6450140..5381bf2 100644 --- a/GP_Practical.qmd +++ b/GP_Practical.qmd @@ -12,11 +12,13 @@ format: toc-location: left html-math-method: katex css: styles.css +bibliography: references.bib +link-citations: TRUE --- # Objectives -This practical will lead you through fitting a few versions of GPs using two R packages: `laGP` and `hetGP`. We will begin with a toy example from the lecture and then move on to a real data example to forecast tick abundances for a NEON site. +This practical will lead you through fitting a few versions of GPs using two R packages: `laGP` [@laGP] and `hetGP` [@binois2021hetgp]. We will begin with a toy example from the lecture and then move on to a real data example to forecast tick abundances for a NEON site. # Basics: Fitting a GP Model @@ -130,7 +132,7 @@ Looks pretty cool. # Using GPs for data on tick abundances over time -We will try all this on a simple dataset: Tick Data from NEON Forecasting Challenge. We will first learn a little bit about this dataset, followed by setting up our predictors and using them in our model to predict tick density for the future season. We will also learn how to fit a separable GP and specify priors for our parameters. Finally, we will learn some basics about a HetGP (Heteroskedastic GP) and try and fit that model as well. +We will try all this on a simple dataset: Tick Data from the [NEON Forecasting Challenge](https://projects.ecoforecast.org/neon4cast-docs/Ticks.html). We will first learn a little bit about this dataset, followed by setting up our predictors and using them in our model to predict tick density for the future season. We will also learn how to fit a separable GP and specify priors for our parameters. Finally, we will learn some basics about a HetGP (Heteroskedastic GP) and try and fit that model as well. 
## Overview of the Data diff --git a/references.bib b/references.bib new file mode 100644 index 0000000..02e6f1f --- /dev/null +++ b/references.bib @@ -0,0 +1,54 @@ +%% This BibTeX bibliography file was created using BibDesk. +%% https://bibdesk.sourceforge.io/ + +%% Created for Parul Vijay Patil + + +%% Saved with string encoding Unicode (UTF-8) + + +@article{laGP, + author = {Gramacy, Robert B.}, + doi = {http://hdl.handle.net/10.}, + journal = {Journal of Statistical Software}, + number = {i01}, + title = {{laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R}}, + volume = {72}, + year = 2016, + bdsk-url-1 = {http://hdl.handle.net/10.}} + +@book{gramacy2020surrogates, + title={Surrogates: Gaussian process modeling, design, and optimization for the applied sciences}, + author={Gramacy, Robert B}, + year={2020}, + publisher={Chapman and Hall/CRC} +} + +@article{binois2021hetgp, + title={{hetGP}: Heteroskedastic Gaussian process modeling and sequential design in R}, + author={Binois, Micka{\"e}l and Gramacy, Robert B}, + journal={Journal of Statistical Software}, + volume={98}, + pages={1--44}, + year={2021} +} + +@article{binois2018practical, + title={Practical heteroscedastic Gaussian process modeling for large simulation experiments}, + author={Binois, Micka{\"e}l and Gramacy, Robert B and Ludkovski, Mike}, + journal={Journal of Computational and Graphical Statistics}, + volume={27}, + number={4}, + pages={808--821}, + year={2018}, + publisher={Taylor \& Francis} +} + +@article{thomas2022neon, + title={The NEON ecological forecasting challenge}, + author={Thomas, R Quinn and Boettiger, Carl and Carey, Cayelan C and Dietze, Michael C and Johnson, Leah R and Kenney, Melissa A and Mclachlan, Jason S and Peters, Jody A and Sokol, Eric R and Weltzin, Jake F and others}, + journal={Authorea Preprints}, + year={2022}, + publisher={Authorea} +} +