# Putting it together: the forecast-analysis cycle {#sec-pulling-together}
```{r}
#| echo: false
#| message: false
source("R/forest_model.R")
source("R/helpers.R")
source("R/particle_filter.R")
library(tidyverse)
library(lubridate)
```
## Step 1: Set initial conditions and parameter distributions
We will start our forecast-analysis cycle by running the model with a particle filter over a period in the past. This spin-up sets the initial states and parameters for a forecast.
Now we will generate a forecast that starts from the one-year spin-up completed in [Chapter -@sec-pf2]. First, load the analysis data into memory so that we can access the initial conditions and parameters it contains.
```{r}
# Use the most recent saved analysis if one exists; otherwise fall back to
# the spin-up analysis created in the previous chapter.
if(file.exists("data/PF_analysis_1.Rdata")){
  load("data/PF_analysis_1.Rdata")
}else{
  load("data/PF_analysis_0.Rdata")
}
ens_members <- dim(analysis$forecast)[2]
```
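To confirm what the saved analysis contains before using it, you can inspect its structure; the element names below match how the object is used in the rest of this chapter.

```{r}
# The analysis list stores the states, fitted parameters, weights, and dates
# from the previous particle filter run.
names(analysis)
dim(analysis$forecast) # time steps x ensemble members x output variables
```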
## Step 2: Get latest data
The latest data are available by running the code in [Chapter -@sec-neon] and reading in the resulting CSV. You will need to re-run the code in [Chapter -@sec-neon] if you want to update the observations outside the automatic build of this book.
```{r}
obs <- read_csv("data/site_carbon_data.csv", show_col_types = FALSE)
```
## Step 3: Generate forecast
This example introduces the concept of a "look back": stepping back into the past to restart the particle filter so that it can assimilate any data that have been made available since that period was last run through the filter. The look back is important because many observations are delayed before becoming available (their latency). NEON flux data can have a latency of 5 days to 1.5 months, and MODIS LAI is an 8-day average, so it has an 8-day latency.
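Whether a look back window captures newly arrived data depends on each stream's latency. Here is a minimal illustrative check; these round-number latencies are assumptions based on the ranges quoted above, not values from the model code.

```{r}
# Approximate latencies in days; 1.5 months is rounded to 45 days.
latency_days <- c(modis_lai = 8, neon_flux_min = 5, neon_flux_max = 45)
# TRUE means a 30-day look back can assimilate that stream's newest data.
latency_days <= 30
```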
Here we use a look back of 30 days. Since our forecast horizon is 30 days into the future, the total length of the simulation is 30 + 30 = 60 days. Our `reference_datetime` is today.
We will find the first day of our simulation (30 days ago) in the last saved analysis and use it for the initial conditions.
```{r}
site <- "OSBS"
look_back <- 30 # days of reanalysis before today
horizon <- 30   # days of forecast after today
reference_datetime <- Sys.Date()
sim_dates <- seq(reference_datetime - look_back, length.out = look_back + horizon, by = "1 day")
```
We will find the time index of the previous forecast that corresponds to the start of our look back + forecast simulation dates.
```{r}
index <- which(analysis$sim_dates == sim_dates[1])
```
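If the saved analysis does not reach back far enough (for example, after lengthening the look back), `which()` returns an empty vector and the subsetting below fails obscurely. A simple guard, added here as a suggestion rather than part of the original workflow:

```{r}
# Fail fast with a clear error if the look back start is not in the
# previous analysis; exactly one matching date is expected.
stopifnot(length(index) == 1)
```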
The use of the look back requires combining the "historical" weather and future weather into a single input data frame.
```{r}
# Historical met covers the look back through three days ago; the remaining
# days (including the forecast horizon) use the forecasted met product.
inputs_past <- get_historical_met(site = site, sim_dates[1:(look_back - 2)], use_mean = FALSE)
inputs_future <- get_forecast_met(site = site, sim_dates[(look_back - 1):length(sim_dates)], use_mean = FALSE)
inputs <- bind_rows(inputs_past, inputs_future)
inputs_ensemble <- assign_met_ensembles(inputs, ens_members)
```
The combined weather drivers look like the following. The vertical line marks the transition from the look back period to the future.
```{r}
#| echo: false
#| fig-cap: NOAA Global Ensemble Forecasting System 31 ensemble members for the two drivers in the forest model. The look-back and forecast periods are included with the transition denoted with a vertical line.
ggplot(inputs, aes(x = datetime, y = prediction, group = parameter)) +
geom_line() +
geom_vline(aes(xintercept = reference_datetime)) +
facet_wrap(~variable, scales = "free") +
theme_bw()
```
Get the constant `params` from the analysis (so that we use the same parameter values as the previous analysis) and initialize the parameters being fit by the particle filter with the values from the analysis.
```{r}
params <- analysis$params
num_pars <- 2
fit_params <- array(NA, dim = c(length(sim_dates), ens_members, num_pars))
fit_params[1, , 1] <- analysis$fit_params[index, , 1]
fit_params[1, , 2] <- analysis$fit_params[index, , 2]
```
Initialize the states with the analysis states from the matching simulation date (`index`) in the previous run of the particle filter.
```{r}
# Set initial conditions (12 is the number of model outputs)
forecast <- array(NA, dim = c(length(sim_dates), ens_members, 12))
forecast[1, , 1] <- analysis$forecast[index, , 1]
forecast[1, , 2] <- analysis$forecast[index, , 2]
forecast[1, , 3] <- analysis$forecast[index, , 3]
wt <- array(1, dim = c(length(sim_dates), ens_members))
wt[1, ] <- analysis$wt[index, ]
```
Run the particle filter over the look back days; it transitions seamlessly into a forecast, which is simply the particle filter running without data to assimilate.
```{r}
for(t in 2:length(sim_dates)){
  # Evolve the fitted parameters with a random walk
  fit_params[t, , 1] <- rnorm(ens_members, fit_params[t-1, , 1], sd = 0.0005)
  fit_params[t, , 2] <- rnorm(ens_members, fit_params[t-1, , 2], sd = 0.00005)
  params$alpha <- fit_params[t, , 1]
  params$Rbasal <- fit_params[t, , 2]
  # Advance the model one day for all ensemble members
  forecast[t, , ] <- forest_model(t,
                                  states = matrix(forecast[t-1, , 1:3], nrow = ens_members),
                                  parms = params,
                                  inputs = matrix(inputs_ensemble[t, , ], nrow = ens_members))
  # Assimilate any observations available on this date
  analysis <- particle_filter(t, forecast, obs, sim_dates, wt, fit_params,
                              variables = c("lai", "wood", "som", "nee"),
                              sds = c(0.1, 1, 20, 0.005))
  forecast <- analysis$forecast
  fit_params <- analysis$fit_params
  wt <- analysis$wt
}
```
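A common diagnostic for particle filters is the effective sample size, which reveals whether the weights have collapsed onto a few particles. This check is a standard addition, not part of the original workflow:

```{r}
# Effective sample size at the last time step: values near ens_members
# indicate well-spread weights; values near 1 indicate degeneracy.
wt_final <- wt[length(sim_dates), ] / sum(wt[length(sim_dates), ])
1 / sum(wt_final^2)
```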
## Step 4: Save forecast and data assimilation output
Convert the forecast to a data frame. Because the particle filter assigns weights to the ensemble members, first resample the members in proportion to their weights so that the saved output reflects the analysis distribution.
```{r}
forecast_weighted <- array(NA, dim = c(length(sim_dates), ens_members, 12))
params_weighted <- array(NA, dim = c(length(sim_dates), ens_members, num_pars))
for(t in 1:length(sim_dates)){
  # Normalize the weights, then resample ensemble members with replacement
  # in proportion to their weights
  wt_norm <- wt[t, ] / sum(wt[t, ])
  resample_index <- sample(1:ens_members, ens_members, replace = TRUE, prob = wt_norm)
  forecast_weighted[t, , ] <- forecast[t, resample_index, 1:12]
  params_weighted[t, , ] <- fit_params[t, resample_index, ]
}
output_df <- output_to_df(forecast_weighted, sim_dates, sim_name = "simple_forest_model")
parameter_df <- parameters_to_df(params_weighted, sim_dates, sim_name = "simple_forest_model", param_names = c("alpha", "Rbasal"))
```
@fig-step4 is a visualization of the forecast. The vertical line marks where the look back period transitions to the future.
```{r}
#| echo: false
#| warning: false
#| fig-cap: Forecast median and 90% CI. The look-back and forecast periods are included with the transition denoted with a vertical line.
#| label: fig-step4
output_df |>
  summarise(median = median(prediction),
            upper = quantile(prediction, 0.95, na.rm = TRUE),
            lower = quantile(prediction, 0.05, na.rm = TRUE),
            .by = c("datetime", "variable")) |>
  left_join(obs, by = c("datetime", "variable")) |>
  filter(variable %in% c("nee", "wood", "som", "lai")) |>
  ggplot(aes(x = datetime)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.7) +
  geom_line(aes(y = median)) +
  geom_point(aes(y = observation), color = "red") +
  geom_vline(aes(xintercept = reference_datetime)) +
  facet_wrap(~variable, scales = "free") +
  theme_bw()
```
Save the states and weights for use as initial conditions in the next forecast.
```{r}
analysis$params <- params # carry the constant parameters forward with the analysis
save(analysis, file = "data/PF_analysis_1.Rdata")
```
Convert the output to the format and units required by the NEON Ecological Forecasting Challenge.
```{r}
efi_output <- output_df |>
  mutate(datetime = as_datetime(datetime),
         model_id = "bookcast_forest",
         duration = "P1D",
         project_id = "neon4cast",
         site_id = site,
         family = "ensemble",
         reference_datetime = as_datetime(reference_datetime)) |>
  rename(parameter = ensemble) |>
  filter(datetime >= reference_datetime) |>
  filter(variable == "nee") |>
  mutate(prediction = prediction / 0.01) # MgC/ha/day to gC/m2/day (1 MgC/ha = 100 gC/m2)
```
Write the forecast to a CSV file.
```{r}
file_name <- paste0("data/terrestrial_daily-", reference_datetime, "-bookcast_forest.csv")
write_csv(efi_output, file_name)
```
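Before submitting, the file can optionally be checked against the Challenge's required format. This sketch assumes the `forecast_output_validator()` function available in recent versions of the neon4cast package; check your installed version.

```{r}
#| eval: false
neon4cast::forecast_output_validator(file_name)
```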
Then submit to the Challenge. The submission uses the `submit()` function in the neon4cast package (install with `remotes::install_github("eco4cast/neon4cast")`).
```{r}
neon4cast::submit(file_name, ask = FALSE)
```
## Step 5: Repeat Steps 2-4
Wait for a day to pass, then use yesterday's analysis as today's initial conditions and parameter distributions.
```{r}
load("data/PF_analysis_1.Rdata")
```
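In practice, Steps 2-4 are re-run on a schedule (for example, a daily cron job or GitHub Actions workflow). The wrapper below is a hypothetical sketch, not code from this book; its body would be the code from Steps 3 and 4 above.

```{r}
#| eval: false
# Hypothetical daily driver: each call loads yesterday's analysis, re-runs
# the look-back particle filter and forecast, saves the new analysis, and
# submits to the Challenge.
run_daily_cycle <- function(){
  load("data/PF_analysis_1.Rdata") # yesterday's analysis
  obs <- readr::read_csv("data/site_carbon_data.csv", show_col_types = FALSE)
  # ... repeat Steps 3 and 4 exactly as above ...
}
```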
## Additional example
I recommend looking at a [tutorial](https://github.com/mdietze/FluxCourseForecast) created by Mike Dietze that covers many of the particle filter concepts used here. It uses a similar process model that runs at a 30-minute time step. The tutorial is more advanced in its use of priors on the parameters and shows how to forecast across multiple sites.