# Putting it together: the forecast-analysis cycle {#sec-pulling-together}
```{r}
#| echo: false
#| message: false
source("R/forest_model.R")
source("R/helpers.R")
source("R/particle_filter.R")
library(tidyverse)
library(lubridate)
```
## Step 1: Set initial conditions and parameter distributions
We will start our forecast-analysis cycle by running the model with a particle filter over a period in the past. This spin-up sets the initial states and parameters for a forecast.
Now we will generate a forecast that starts from the one-year spin-up completed in [Chapter -@sec-pf2]. First, load the analysis data into memory so that we can access the initial conditions and parameters it contains.
```{r}
# Use the most recent saved analysis if one exists; otherwise fall back to
# the spin-up analysis created in the previous chapter.
if(file.exists("data/PF_analysis_1.Rdata")){
  load("data/PF_analysis_1.Rdata")
}else{
  load("data/PF_analysis_0.Rdata")
}
ens_members <- dim(analysis$forecast)[2]
```
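To confirm what the saved analysis contains before using it, you can inspect its structure; the element names below match how the object is used in the rest of this chapter.

```{r}
# The analysis list stores the states, fitted parameters, weights, and dates
# from the previous particle filter run.
names(analysis)
dim(analysis$forecast) # time steps x ensemble members x output variables
```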
## Step 2: Get latest data
The latest data are available by running the code in [Chapter -@sec-neon] and reading in the resulting CSV. You will need to re-run the code in [Chapter -@sec-neon] if you want to update the observations outside the automatic build of this book.
```{r}
obs <- read_csv("data/site_carbon_data.csv", show_col_types = FALSE)
```
## Step 3: Generate forecast
This example introduces the concept of a "look back": stepping back into the past to restart the particle filter so that it can assimilate any data that have been made available since that period was last run through the filter. The look back is important because many observations are delayed before becoming available (their latency). NEON flux data can have a latency of 5 days to 1.5 months, and MODIS LAI is an 8-day average, so it has an 8-day latency.
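Whether a look back window captures newly arrived data depends on each stream's latency. Here is a minimal illustrative check; these round-number latencies are assumptions based on the ranges quoted above, not values from the model code.

```{r}
# Approximate latencies in days; 1.5 months is rounded to 45 days.
latency_days <- c(modis_lai = 8, neon_flux_min = 5, neon_flux_max = 45)
# TRUE means a 30-day look back can assimilate that stream's newest data.
latency_days <= 30
```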
Here we use a look back of 30 days. Since our forecast horizon is 30 days into the future, the total length of the simulation is 30 + 30 = 60 days. Our `reference_datetime` is today.
We will find the first day of our simulation (30 days ago) in the last saved analysis and use it for the initial conditions.
```{r}
site <- "OSBS"
look_back <- 30 # days of reanalysis before today
horizon <- 30   # days of forecast after today
reference_datetime <- Sys.Date()
sim_dates <- seq(reference_datetime - look_back, length.out = look_back + horizon, by = "1 day")
```
We will find the time index of the previous forecast that corresponds to the start of our look back + forecast simulation dates.
```{r}
index <- which(analysis$sim_dates == sim_dates[1])
```
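If the saved analysis does not reach back far enough (for example, after lengthening the look back), `which()` returns an empty vector and the subsetting below fails obscurely. A simple guard, added here as a suggestion rather than part of the original workflow:

```{r}
# Fail fast with a clear error if the look back start is not in the
# previous analysis; exactly one matching date is expected.
stopifnot(length(index) == 1)
```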
The use of the look back requires combining the "historical" weather and future weather into a single input data frame.
```{r}
# Historical met covers the look back through three days ago; the remaining
# days (including the forecast horizon) use the forecasted met product.
inputs_past <- get_historical_met(site = site, sim_dates[1:(look_back - 2)], use_mean = FALSE)
inputs_future <- get_forecast_met(site = site, sim_dates[(look_back - 1):length(sim_dates)], use_mean = FALSE)
inputs <- bind_rows(inputs_past, inputs_future)
inputs_ensemble <- assign_met_ensembles(inputs, ens_members)
```
The combined weather drivers look like the following. The vertical line marks the transition from the look back period to the future.
```{r}
#| echo: false
#| fig-cap: NOAA Global Ensemble Forecasting System 31 ensemble members for the two drivers in the forest model. The look-back and forecast periods are included with the transition denoted with a vertical line.
ggplot(inputs, aes(x = datetime, y = prediction, group = parameter)) +
geom_line() +
geom_vline(aes(xintercept = reference_datetime)) +
facet_wrap(~variable, scales = "free") +
theme_bw()
```
Get the constant `params` from the analysis (so that we use the same parameter values as the previous analysis) and initialize the parameters being fit by the particle filter with the values from the analysis.
```{r}
params <- analysis$params
num_pars <- 2
fit_params <- array(NA, dim = c(length(sim_dates), ens_members, num_pars))
fit_params[1, , 1] <- analysis$fit_params[index, , 1]
fit_params[1, , 2] <- analysis$fit_params[index, , 2]
```
Initialize the states with the analysis states from the matching simulation date (`index`) in the previous run of the particle filter.
```{r}
# Set initial conditions (12 is the number of model outputs)
forecast <- array(NA, dim = c(length(sim_dates), ens_members, 12))
forecast[1, , 1] <- analysis$forecast[index, , 1]
forecast[1, , 2] <- analysis$forecast[index, , 2]
forecast[1, , 3] <- analysis$forecast[index, , 3]
wt <- array(1, dim = c(length(sim_dates), ens_members))
wt[1, ] <- analysis$wt[index, ]
```
Run the particle filter over the look back days; it transitions seamlessly into a forecast, which is simply the particle filter running without data to assimilate.
```{r}
for(t in 2:length(sim_dates)){
  # Evolve the fitted parameters with a random walk
  fit_params[t, , 1] <- rnorm(ens_members, fit_params[t-1, , 1], sd = 0.0005)
  fit_params[t, , 2] <- rnorm(ens_members, fit_params[t-1, , 2], sd = 0.00005)
  params$alpha <- fit_params[t, , 1]
  params$Rbasal <- fit_params[t, , 2]
  # Advance the model one day for all ensemble members
  forecast[t, , ] <- forest_model(t,
                                  states = matrix(forecast[t-1, , 1:3], nrow = ens_members),
                                  parms = params,
                                  inputs = matrix(inputs_ensemble[t, , ], nrow = ens_members))
  # Assimilate any observations available on this date
  analysis <- particle_filter(t, forecast, obs, sim_dates, wt, fit_params,
                              variables = c("lai", "wood", "som", "nee"),
                              sds = c(0.1, 1, 20, 0.005))
  forecast <- analysis$forecast
  fit_params <- analysis$fit_params
  wt <- analysis$wt
}
```
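A common diagnostic for particle filters is the effective sample size, which reveals whether the weights have collapsed onto a few particles. This check is a standard addition, not part of the original workflow:

```{r}
# Effective sample size at the last time step: values near ens_members
# indicate well-spread weights; values near 1 indicate degeneracy.
wt_final <- wt[length(sim_dates), ] / sum(wt[length(sim_dates), ])
1 / sum(wt_final^2)
```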
## Step 4: Save forecast and data assimilation output
Convert the forecast to a data frame. Because the particle filter assigns weights to the ensemble members, first resample the members in proportion to their weights so that the saved output reflects the analysis distribution.
```{r}
forecast_weighted <- array(NA, dim = c(length(sim_dates), ens_members, 12))
params_weighted <- array(NA, dim = c(length(sim_dates), ens_members, num_pars))
for(t in 1:length(sim_dates)){
  # Normalize the weights, then resample ensemble members with replacement
  # in proportion to their weights
  wt_norm <- wt[t, ] / sum(wt[t, ])
  resample_index <- sample(1:ens_members, ens_members, replace = TRUE, prob = wt_norm)
  forecast_weighted[t, , ] <- forecast[t, resample_index, 1:12]
  params_weighted[t, , ] <- fit_params[t, resample_index, ]
}
output_df <- output_to_df(forecast_weighted, sim_dates, sim_name = "simple_forest_model")
parameter_df <- parameters_to_df(params_weighted, sim_dates, sim_name = "simple_forest_model", param_names = c("alpha", "Rbasal"))
```
@fig-step4 is a visualization of the forecast. The vertical line marks where the look back period transitions to the future.
```{r}
#| echo: false
#| warning: false
#| fig-cap: Forecast median and 90% CI. The look-back and forecast periods are included with the transition denoted with a vertical line.
#| label: fig-step4
output_df |>
  summarise(median = median(prediction),
            upper = quantile(prediction, 0.95, na.rm = TRUE),
            lower = quantile(prediction, 0.05, na.rm = TRUE),
            .by = c("datetime", "variable")) |>
  left_join(obs, by = c("datetime", "variable")) |>
  filter(variable %in% c("nee", "wood", "som", "lai")) |>
  ggplot(aes(x = datetime)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.7) +
  geom_line(aes(y = median)) +
  geom_point(aes(y = observation), color = "red") +
  geom_vline(aes(xintercept = reference_datetime)) +
  facet_wrap(~variable, scales = "free") +
  theme_bw()
```
Save the states and weights for use as initial conditions in the next forecast.
```{r}
analysis$params <- params # carry the constant parameters forward with the analysis
save(analysis, file = "data/PF_analysis_1.Rdata")
```
Convert the output to the format and units required by the NEON Ecological Forecasting Challenge.
```{r}
efi_output <- output_df |>
  mutate(datetime = as_datetime(datetime),
         model_id = "bookcast_forest",
         duration = "P1D",
         project_id = "neon4cast",
         site_id = site,
         family = "ensemble",
         reference_datetime = as_datetime(reference_datetime)) |>
  rename(parameter = ensemble) |>
  filter(datetime >= reference_datetime) |>
  filter(variable == "nee") |>
  mutate(prediction = prediction / 0.01) # MgC/ha/day to gC/m2/day (1 MgC/ha = 100 gC/m2)
```
Write the forecast to a CSV file.
```{r}
file_name <- paste0("data/terrestrial_daily-", reference_datetime, "-bookcast_forest.csv")
write_csv(efi_output, file_name)
```
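Before submitting, the file can optionally be checked against the Challenge's required format. This sketch assumes the `forecast_output_validator()` function available in recent versions of the neon4cast package; check your installed version.

```{r}
#| eval: false
neon4cast::forecast_output_validator(file_name)
```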
Then submit to the Challenge. The submission uses the `submit()` function in the neon4cast package (install with `remotes::install_github("eco4cast/neon4cast")`).
```{r}
neon4cast::submit(file_name, ask = FALSE)
```
## Step 5: Repeat Steps 2-4
Wait for a day to pass, then use yesterday's analysis as today's initial conditions and parameter distributions.
```{r}
load("data/PF_analysis_1.Rdata")
```
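In practice, Steps 2-4 are re-run on a schedule (for example, a daily cron job or GitHub Actions workflow). The wrapper below is a hypothetical sketch, not code from this book; its body would be the code from Steps 3 and 4 above.

```{r}
#| eval: false
# Hypothetical daily driver: each call loads yesterday's analysis, re-runs
# the look-back particle filter and forecast, saves the new analysis, and
# submits to the Challenge.
run_daily_cycle <- function(){
  load("data/PF_analysis_1.Rdata") # yesterday's analysis
  obs <- readr::read_csv("data/site_carbon_data.csv", show_col_types = FALSE)
  # ... repeat Steps 3 and 4 exactly as above ...
}
```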
## Additional example
I recommend looking at a [tutorial](https://github.com/mdietze/FluxCourseForecast) created by Mike Dietze that covers many of the particle filter concepts used here. It uses a similar process model that runs at a 30-minute time step. The tutorial is more advanced in its use of priors on the parameters and shows how to forecast across multiple sites.