---
title: "Forecasting Final Project (MATH1307)"
author: "Rakshit Chandna s3956924"
date: "2023-10-22"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Table of Contents
- **Task 1**
- **Necessary Libraries**
- **Introduction**
- **Data Description**
- **Data Preprocessing**
- **Data Exploration and Visualization**
- **ACF and PACF**
- **Tests for Stationarity**
- **ADF Test**
- **Decomposition**
- **X12 decomposition**
- **STL decomposition**
- **Model Fitting**
- **Distributed Lag Models**
- **Dynamic Linear Models**
- **Exponential Smoothing Method**
- **State-space Models**
- **Best Model Selection**
- **Forecasting**
- **Conclusion**
- **Task 2**
- **Introduction**
- **Data Description**
- **Objective & Methodology**
- **Data Exploration and Visualization**
- **Tests for Stationarity**
- **ADF Test**
- **Model Fitting**
- **Distributed Lag Models**
- **Dynamic Linear Models**
- **Exponential Smoothing Method**
- **State-space Models**
- **Best Model Selection**
- **Forecasting**
- **Conclusion**
- **Task 3 Part(a)**
- **Data Description**
- **Objective & Methodology**
- **Data Exploration and Visualization**
- **Tests for Stationarity**
- **ADF Test**
- **Model Fitting**
- **Distributed Lag Models**
- **Dynamic Linear Models**
- **Exponential Smoothing Method**
- **State-space Models**
- **Best Model Selection**
- **Forecasting**
- **Conclusion**
- **Task 3 Part(b)**
- **Data Exploration and Visualization**
- **Tests for Stationarity**
- **ADF Test**
- **Model Fitting**
- **Distributed Lag Models**
- **Dynamic Linear Models**
- **Exponential Smoothing Method**
- **State-space Models**
- **Best Model Selection**
- **Forecasting**
- **Conclusion**
- **References**
# Necessary Libraries
```{r message=FALSE, warning=FALSE}
library(TSA)
library(car)
library(carData)
library(lmtest)
library(dplyr)
library(AER)
library(dynlm)
library(corrr)
library(Hmisc)
library(forecast)
library(xts)
library(x12)
library(ggplot2)
library(x13binary)
library(nardl)
library(dLagM)
library(readr)
library(tseries)
library(urca)
library(expsmooth)
```
# Introduction
This report is divided into three tasks, each focusing on the analysis of a specific time series data set. Task 1 examines five weekly series from 2010-2020: mortality, temperature, pollutant particle size, and two chemical emissions named chem1 and chem2; the objective is to forecast mortality four weeks into the future. Task 2 revolves around predicting FFD for the next four years by leveraging various climate indicators. Task 3 is split into parts (a) and (b). Part (a) is a univariate analysis of RBO, employing individual climate predictors to project its values for the coming three years, while part (b) produces another three-year forecast of RBO, this time factoring in the effects of the Australian drought that spanned 1996 to 2009.
# Task-1 Time series analysis & forecast of mortality series
# Data Description
From 2010 to 2020, a group of scientists studied the average weekly mortality related to specific diseases in Paris, France. They also recorded the city's ambient temperature (in degrees Fahrenheit), the size of pollutant particles, and the concentration of harmful chemicals released by vehicles and industrial processes. The 'mort' dataset comprises 508 weekly observations of these five variables over the decade: mortality rate, temperature, pollutant particle size, and emissions of two distinct chemicals (chem1 and chem2).
## Importing the Dataset "Mort.csv"
```{r message=FALSE, warning=FALSE}
mort_original <- read_csv("C:/Users/Rakshit Chandna/OneDrive/Desktop/DataMain/Forecasting/mort.csv")
head(mort_original)
```
```{r}
class(mort_original)
```
From the above output, we can see that the original dataset is stored as a table/data frame, so we will convert it into a time series.
# Converting data into Time series
```{r}
mort <- ts(mort_original[,2:6], start = c(2010,7), frequency = 52)
head(mort)
```
### Converting each variable of data set into Time series on weekly basis from the year 2010
```{r}
Mort <- ts(mort_original$mortality,start = c(2010,1),frequency = 52)
Temp <- ts(mort_original$temp,start = c(2010,1),frequency = 52)
Chem1 <- ts(mort_original$chem1,start = c(2010,1),frequency = 52)
Chem2 <- ts(mort_original$chem2,start = c(2010,1),frequency = 52)
Size <- ts(mort_original$`particle size`,start = c(2010,1),frequency = 52)
```
# Checking Class
```{r}
cbind(Variable= c("Mortality","Temp","Chem1","Chem2","Particle size"), Class = c(class(Mort), class(Temp),class(Chem1),class(Chem2),class(Size)))
```
We can now observe that all the variables in the data set have been successfully converted into time series. We will now explore and visualize the data.
# Data Exploration and Visualisation
We will now plot the converted Time Series Data for the Mortality time series variables and interpret their important characteristics using the **5 Bullet Points**:
- **Trend**
- **Seasonality**
- **Changing Variance**
- **Behavior**
- **Intervention Point**
## Plotting the time series graph of Mortality variable
```{r, fig.cap= "Figure 1"}
plot(Mort,ylab="Weekly Mortality",xlab="Years",main="Figure-1: Time series plot of Mortality series",type='o',col="red2")
```
The time series plot in Figure 1 shows a high level of **seasonality**, with mixed autoregressive and moving average behaviour, over the period 2010 to 2020.
Checking the **5 bullet points**:
- **Trend** - There is a slight downward trend in the series.
- **Seasonality** - A very strong repeating (seasonal) pattern is visible across 2010-2020.
- **Changing Variance** - There is noticeable changing variance, itself in a repeating pattern, particularly around 2012-2014 and 2018-2020.
- **Behavior** - The series shows mixed autoregressive and moving average behaviour.
- **Intervention point** - A sudden change point (peak/drop) is observed in 2013.
## Plotting the time series graph of Temperature variable
```{r, fig.cap= "Figure 2"}
plot(Temp,ylab="Weekly Temperature",xlab="Years",main="Figure-2: Time series plot of Temperature series",type='o',col="darkmagenta")
```
The time series plot in Figure 2 shows a high level of **seasonality**, with mixed autoregressive and moving average behaviour, over the period 2010 to 2020.
Checking the **5 bullet points**:
- **Trend** - There is a slight upward trend in the series.
- **Seasonality** - A very strong repeating (seasonal) pattern is visible across 2010-2020.
- **Changing Variance** - There is not much changing variance in the series.
- **Behavior** - The series shows mixed autoregressive and moving average behaviour.
- **Intervention point** - No sudden change point is observed in the series.
## Plotting the time series graph of Chemical-1 variable
```{r, fig.cap= "Figure 3"}
plot(Chem1,ylab="Weekly Chemical1",xlab="Years",main="Figure-3: Time series plot of Chemical-1 series",type='o',col="orange")
```
The time series plot in Figure 3 shows a high level of **seasonality**, with mixed autoregressive and moving average behaviour, over the period 2010 to 2020.
Checking the **5 bullet points**:
- **Trend** - There is a slight downward trend in the series overall.
- **Seasonality** - A very strong repeating (seasonal) pattern is visible across 2010-2020.
- **Changing Variance** - There is noticeable changing variance, itself in a repeating pattern, particularly around 2010-2011 and 2018-2019.
- **Behavior** - The series shows mixed autoregressive and moving average behaviour.
- **Intervention point** - A sudden change point (peak) is observed in 2011.
## Plotting the time series graph of Chemical-2 variable
```{r, fig.cap= "Figure 4"}
plot(Chem2,ylab="Weekly Chemical2",xlab="Years",main="Figure-4: Time series plot of Chemical-2 series",type='o',col="blue")
```
The time series plot in Figure 4 shows a high level of **seasonality**, with mixed autoregressive and moving average behaviour, over the period 2010 to 2020.
Checking the **5 bullet points**:
- **Trend** - There is nearly no trend in the series.
- **Seasonality** - A very strong repeating (seasonal) pattern is visible across 2010-2020.
- **Changing Variance** - Changing variance is present in the series, particularly around 2011 and 2019.
- **Behavior** - The series shows mixed autoregressive and moving average behaviour.
- **Intervention point** - No sudden change point is observed in the series.
## Plotting the time series graph of Particle Size variable
```{r, fig.cap= "Figure 5"}
plot(Size,ylab="Weekly Particle Size",xlab="Years",main="Figure-5: Time series plot of Particle Size series",type='o',col="darkgreen")
```
The time series plot in Figure 5 shows a high level of **seasonality**, with mixed autoregressive and moving average behaviour, over the period 2010 to 2020.
Checking the **5 bullet points**:
- **Trend** - There is a slight upward trend in the series overall.
- **Seasonality** - A very strong repeating (seasonal) pattern is visible across 2010-2020.
- **Changing Variance** - Changing variance is present in the series.
- **Behavior** - The series shows mixed autoregressive and moving average behaviour.
- **Intervention point** - A sudden change point (peak) is observed in 2019.
We will now further explore the relationship among the variables by plotting their time series in the same figure. To make the series comparable, we first scale them.
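As a quick illustration of what `scale()` does to each column (centre to mean 0, rescale to standard deviation 1), here is a minimal check on a simulated series rather than the `mort` data:

```{r}
# scale() standardises each column: (x - mean(x)) / sd(x)
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)  # simulated series on an arbitrary scale
z <- scale(x)                        # matrix with centering/scaling attributes
mean(z)  # effectively 0
sd(z)    # exactly 1
```

This is why the five scaled series in Figure 6 can be overlaid on a common axis despite their very different original units.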
```{r, fig.cap= "Figure 6"}
Mort_scale = scale(mort)
plot(Mort_scale, plot.type = "s" ,xlab="Years",ylab="Mortality scaled",col = c("red","darkmagenta", "orange", "blue","darkgreen"),main="Figure-6: Time series plot of Scaled Mortality data-set")
legend("topright",lty=1,cex=0.65, text.width = 2, col=c("red","darkmagenta", "orange", "blue","darkgreen"), c("Mortality", "Temperature", "Chemical-1", "Chemical-2","Particle size"))
```
From the scaled plot, we can observe that the series appear highly correlated with one another through their shared seasonal component. To confirm this, we will calculate the correlations between them.
## Correlation among the series
```{r}
cor(mort)
```
From the correlation matrix we can observe that mortality has a moderate negative correlation with temperature and positive correlations with chem1, chem2, and particle size. Temperature is only weakly correlated with the other variables. Chem1 has very strong positive correlations with chem2 and particle size, and chem2 and particle size also share a strong positive correlation.
## Check for Stationarity
We will now proceed by checking the stationarity of the data with the help of the ACF, PACF and ADF tests.
### ACF and PACF of Mortality series
```{r ,fig.cap="Figure 7"}
par(mfrow=c(1,2))
acf(Mort,main="ACF of Mortality series")
pacf(Mort,main="PACF of Mortality series")
```
From the ACF and PACF plots, the slowly decaying pattern in the ACF suggests some trend in the series, and the significant first lag in the PACF indicates autocorrelation.
### ADF test of Mortality series
$H0$ - series is non-stationary
$H1$ - series is stationary
```{r}
adf.test(Mort)
```
From the ADF test, the p-value is below the 5% significance level, so we reject the null hypothesis $H0$ and conclude that the series is stationary.
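The intuition behind the ADF test can be sketched in base R with a simplified Dickey-Fuller regression (no augmentation lags) on a simulated stationary AR(1) series; `adf.test()` additionally includes lagged differences and compares the statistic against proper Dickey-Fuller critical values rather than a plain t-test:

```{r}
set.seed(42)
y <- arima.sim(model = list(ar = 0.5), n = 500)  # stationary AR(1)
# Dickey-Fuller regression: diff(y)_t = rho * y_{t-1} + error
dy   <- diff(y)
ylag <- y[-length(y)]
fit  <- lm(dy ~ ylag)
# For a stationary series rho is significantly negative; a large
# negative statistic is what leads adf.test() to reject H0
coef(fit)["ylag"]
```

A rho estimate near zero, by contrast, is the signature of a unit root (non-stationarity), in which case the ADF test fails to reject $H0$.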
### ACF and PACF of Temperature series
```{r ,fig.cap="Figure 8"}
par(mfrow=c(1,2))
acf(Temp,main="ACF of Temperature series")
pacf(Temp,main="PACF of Temperature series")
```
From the ACF and PACF plots, the slowly decaying pattern in the ACF suggests some trend in the series, and the significant first lag in the PACF indicates autocorrelation.
### ADF test of Temperature series
$H0$ - series is non-stationary
$H1$ - series is stationary
```{r}
adf.test(Temp)
```
From the ADF test, the p-value is below the 5% significance level, so we reject the null hypothesis $H0$ and conclude that the series is stationary.
### ACF and PACF of Chemical-1 series
```{r, fig.cap="Figure 9"}
par(mfrow=c(1,2))
acf(Chem1,main="ACF of Chemical-1 series")
pacf(Chem1,main="PACF of Chemical-1 series")
```
From the ACF and PACF plots, the slowly decaying pattern in the ACF suggests some trend in the series, and the significant first lag in the PACF indicates autocorrelation.
### ADF test of Chemical-1 series
$H0$ - series is non-stationary
$H1$ - series is stationary
```{r}
adf.test(Chem1)
```
From the ADF test, the p-value is below the 5% significance level, so we reject the null hypothesis $H0$ and conclude that the series is stationary.
### ACF and PACF of Chemical-2 series
```{r ,fig.cap="Figure 10"}
par(mfrow=c(1,2))
acf(Chem2,main="ACF of Chemical-2 series")
pacf(Chem2,main="PACF of Chemical-2 series")
```
From the ACF and PACF plots, the slowly decaying pattern in the ACF suggests some trend in the series, and the significant first lag in the PACF indicates autocorrelation.
### ADF test of Chemical-2 series
$H0$ - series is non-stationary
$H1$ - series is stationary
```{r}
adf.test(Chem2)
```
From the ADF test, the p-value is below the 5% significance level, so we reject the null hypothesis $H0$ and conclude that the series is stationary.
### ACF and PACF of Particle Size series
```{r, fig.cap="Figure 11"}
par(mfrow=c(1,2))
acf(Size,main="ACF of Particle Size series")
pacf(Size,main="PACF of Particle Size series")
```
From the ACF and PACF plots, the slowly decaying pattern in the ACF suggests some trend in the series, and the significant first lag in the PACF indicates autocorrelation.
### ADF test of Particle Size series
$H0$ - series is non-stationary
$H1$ - series is stationary
```{r}
adf.test(Size)
```
From the ADF test, the p-value is below the 5% significance level, so we reject the null hypothesis $H0$ and conclude that the series is stationary.
Since all five series are stationary, we now proceed with the decomposition analysis of each series.
# Decomposition of Series
We now analyse the impact of the components of the time series on the given data set. **STL** decomposition allows a more thorough investigation of each series' **seasonal**, **trend**, and **remainder** components.
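Since the STL decomposition used here is additive, seasonal + trend + remainder reconstructs the original series exactly; a quick sanity check on R's built-in `co2` series illustrates this:

```{r}
# STL splits a series into seasonal + trend + remainder (additive)
fit <- stl(co2, s.window = "periodic")
components <- fit$time.series      # columns: seasonal, trend, remainder
recomposed <- rowSums(components)  # should reproduce the original series
all.equal(as.numeric(co2), as.numeric(recomposed))
```

The remainder is defined as the series minus the seasonal and trend fits, so the reconstruction holds by construction; large spikes in the remainder are what we read as interventions below.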
## STL Decomposition of Mortality series
```{r, fig.cap = "Figure 12 ", warning=FALSE}
#Decomposition of Mort series
Mort_stl = stl(Mort, t.window = 15, s.window = "periodic", robust = TRUE)
plot(Mort_stl, main = "Figure 12: STL decomposition of Mortality series", col="red")
```
The STL decomposition of the mortality series shows a trend component that fluctuates in line with the original series rather than moving in a clear direction, indicating no distinct trend. The seasonal panel confirms the seasonality already suggested by the ACF, so the observed seasonal pattern is significant. Notable interventions appear in the remainder between 2012 and 2014 and during some weeks of 2016.
## STL Decomposition of Temperature series
```{r, fig.cap = "Figure 13 ", warning=FALSE}
#Decomposition of Temp series
Temp_stl = stl(Temp, t.window = 15, s.window = "periodic", robust = TRUE)
plot(Temp_stl, main = "Figure 13: STL decomposition of Temperature series", col="darkmagenta")
```
The STL decomposition of the Temperature series indicates that its trend closely resembles the original series, suggesting no distinct trend. The ACF plot revealed clear seasonality in the series, making it essential to consider the seasonal pattern identified. Notable interventions appear in the decomposition's remainder between 2012-2013 and 2016-2017.
## STL Decomposition of Chemical-1 series
```{r, fig.cap = "Figure 14 ", warning=FALSE}
# Decomposition of Chem1 series
Chem1_stl = stl(Chem1, t.window = 15, s.window = "periodic", robust = TRUE)
plot(Chem1_stl, main = "Figure 14: STL decomposition of Chemical-1 series", col="orange")
```
The STL decomposition of the Chemical-1 series reveals a trend pattern that mirrors the original series, with a noticeable decline. The ACF plot indicates clear seasonality, which is reflected in the decomposition as well. The remainder component of the decomposition displays multiple significant interventions.
## STL Decomposition of Chemical-2 series
```{r, fig.cap = "Figure 15 ", warning=FALSE}
# Decomposition of Chem2 series
Chem2_stl = stl(Chem2, t.window = 15, s.window = "periodic", robust = TRUE)
plot(Chem2_stl, main = "Figure 15: STL decomposition of Chemical-2 series", col="blue")
```
The STL decomposition of the chem2 series reveals a trend that mirrors the initial series, with a noticeable downward progression. The ACF plot for this series indicates clear seasonality, which is also evident in the decomposition. The residual component of the decomposition displays multiple significant interventions.
## STL Decomposition of Particle size series
```{r, fig.cap = "Figure 16 ", warning=FALSE}
# Decomposition of Size series
Size_stl = stl(Size, t.window = 15, s.window = "periodic", robust = TRUE)
plot(Size_stl, main = "Figure 16: STL decomposition of Particle size series", col="darkgreen")
```
Based on the STL decomposition of the Particle size series, its trend is consistent with the original series, with no distinct trend evident. The ACF plot for the series indicates clear seasonality, which is mirrored in the decomposition. After 2018, notable interventions are visible in the residual component of the decomposition.
# Time Series Regression Methods
To identify a good model for forecasting the mortality series, we will fit distributed lag models, which use an independent explanatory series and its lags to help explain the overall variance and correlation structure of the dependent series.
To determine the model's finite lag length, we compute accuracy measures such as AIC/BIC and MASE for models with various lag lengths, then select the lag length with the lowest values.
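MASE (mean absolute scaled error) divides the model's in-sample mean absolute error by the MAE of a naive one-step forecast, so values below 1 beat the naive benchmark. A small base-R sketch of the non-seasonal definition (the `MASE()` function from `dLagM` used below computes an analogous score directly from fitted model objects; the helper name `mase` here is illustrative):

```{r}
# MASE: mean(|e_t|) / mean(|y_t - y_{t-1}|)
mase <- function(actual, fitted) {
  mae_model <- mean(abs(actual - fitted))
  mae_naive <- mean(abs(diff(actual)))  # naive forecast: y_hat_t = y_{t-1}
  mae_model / mae_naive
}
y <- c(10, 12, 11, 13, 14, 13, 15)
mase(y, y)                         # perfect fit -> 0
mase(y, rep(mean(y), length(y)))   # mean-only fit -> some positive value
```

Unlike AIC/BIC, MASE is scale-free, which makes it convenient for comparing models across series measured in different units.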
## Distributed Lag Model
A distributed lag model describes how the effect of an independent variable on the dependent variable is spread over time. We build distributed lag models to reduce multicollinearity and capture these lagged dependencies for each variable. The most commonly used variants are:
- **Finite Distributed Lag Model**
- **Polynomial Distributed Lag Model**
- **Koyck Distributed Lag Model**
- **Autoregressive Distributed Lag Model**
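Of these variants, only the finite and polynomial forms are fitted below, so a brief sketch of the Koyck idea may help: it imposes geometrically decaying lag weights, which in practice means regressing the response on the current predictor and the lagged response. A minimal base-R illustration on simulated data (the `dLagM` package's `koyckDlm()` fits this form properly, via instrumental-variable estimation; the simulated coefficients 1, 2 and 0.6 are arbitrary):

```{r}
set.seed(7)
n <- 300
x <- rnorm(n)
e <- rnorm(n, sd = 0.5)
y <- numeric(n)
for (t in 2:n) y[t] <- 1 + 0.6 * y[t - 1] + 2 * x[t] + e[t]
# Koyck transformation: regress y_t on x_t and y_{t-1};
# the y_{t-1} coefficient is the geometric decay rate of the lag weights
yt <- y[-1]; xt <- x[-1]; ylag <- y[-n]
fit <- lm(yt ~ xt + ylag)
coef(fit)  # roughly (1, 2, 0.6) by construction
```

The implied weight of x at lag j is then 2 * 0.6^j, so the full infinite lag structure is captured with just two parameters.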
```{r}
# Pre-defined function for sort.score
sort.score <- function(x, score = c("bic", "aic", "mase")) {
  score <- match.arg(score)
  switch(score,
         aic  = x[with(x, order(AIC)), ],
         bic  = x[with(x, order(BIC)), ],
         mase = x[with(x, order(MASE)), ])
}
```
## Finite Distributed Lag model
To find the best model under the finite distributed lag framework, we consider all five series and fit every possible combination of predictors.
We also determine the appropriate lag length of the model with the help of **AIC**, **BIC** and **MASE**.
#### Models based on AIC
```{r warning=FALSE}
finiteDLMauto(x=as.vector(Temp+Chem1+Chem2+Size), y= as.vector(Mort), q.min = 1, q.max = 10, model.type="dlm", error.type = "AIC",trace=TRUE)
```
#### Models based on BIC
```{r warning=FALSE}
finiteDLMauto(x=as.vector(Temp+Chem1+Chem2+Size), y= as.vector(Mort), q.min = 1, q.max = 10, model.type="dlm", error.type = "BIC",trace=TRUE)
```
#### Models based on MASE
```{r warning=FALSE}
finiteDLMauto(x=as.vector(Temp+Chem1+Chem2+Size), y= as.vector(Mort), q.min = 1, q.max = 10, model.type="dlm", error.type = "MASE",trace=TRUE)
```
From the results based on AIC, BIC and MASE, we can take the best lag length to be 10. We will now fit all possible combinations of predictors, keeping **"Mort"** as the response.
# Distributed Model Fitting with all possible combinations
```{r}
dlm_model_temp = dlm(x=as.vector(Temp), y=as.vector(Mort), q=10)
dlm_model_chem1 = dlm(x=as.vector(Chem1), y=as.vector(Mort), q=10)
dlm_model_chem2 = dlm(x=as.vector(Chem2), y=as.vector(Mort), q=10)
dlm_model_size = dlm(x=as.vector(Size), y=as.vector(Mort), q=10)
dlm_model_temp.chem1 = dlm(x=as.vector(Temp+Chem1), y=as.vector(Mort), q=10)
dlm_model_temp.chem2 = dlm(x=as.vector(Temp+Chem2), y=as.vector(Mort), q=10)
dlm_model_temp.size = dlm(x=as.vector(Temp+Size), y=as.vector(Mort), q=10)
dlm_model_chem1.chem2 = dlm(x=as.vector(Chem1+Chem2), y=as.vector(Mort), q=10)
dlm_model_chem1.size = dlm(x=as.vector(Chem1+Size), y=as.vector(Mort), q=10)
dlm_model_chem2.size = dlm(x=as.vector(Chem2+Size), y=as.vector(Mort), q=10)
dlm_model_temp.chem1.size = dlm(x=as.vector(Temp+Chem1+Size), y=as.vector(Mort), q=10)
dlm_model_temp.chem2.size = dlm(x=as.vector(Temp+Chem2+Size), y=as.vector(Mort), q=10)
dlm_model_chem1.chem2.size = dlm(x=as.vector(Chem1+Chem2+Size), y=as.vector(Mort), q=10)
dlm_model_temp.chem1.chem2.size = dlm(x=as.vector(Temp+Chem1+Chem2+Size), y=as.vector(Mort), q=10)
```
## Comparing Sort Score based on AIC
```{r}
sort.score(AIC(dlm_model_temp$model,dlm_model_chem1$model,dlm_model_chem2$model,dlm_model_size$model,dlm_model_temp.chem1$model,dlm_model_temp.chem2$model,dlm_model_temp.size$model,dlm_model_chem1.chem2$model,dlm_model_chem1.size$model,dlm_model_chem2.size$model,dlm_model_temp.chem1.size$model,dlm_model_temp.chem2.size$model,dlm_model_chem1.chem2.size$model,dlm_model_temp.chem1.chem2.size$model), score = "aic")
```
From the above output, the model "dlm_model_chem1" is the best model based on AIC scores.
## Comparing Sort Score based on BIC
```{r}
sort.score(BIC(dlm_model_temp$model,dlm_model_chem1$model,dlm_model_chem2$model,dlm_model_size$model,dlm_model_temp.chem1$model,dlm_model_temp.chem2$model,dlm_model_temp.size$model,dlm_model_chem1.chem2$model,dlm_model_chem1.size$model,dlm_model_chem2.size$model,dlm_model_temp.chem1.size$model,dlm_model_temp.chem2.size$model,dlm_model_chem1.chem2.size$model,dlm_model_temp.chem1.chem2.size$model), score = "bic")
```
From the above output, we again got the same model "dlm_model_chem1" as our best model with respect to the BIC scores.
## Comparing Sort Score based on MASE
```{r}
sort.score(MASE(dlm_model_temp$model,dlm_model_chem1$model,dlm_model_chem2$model,dlm_model_size$model,dlm_model_temp.chem1$model,dlm_model_temp.chem2$model,dlm_model_temp.size$model,dlm_model_chem1.chem2$model,dlm_model_chem1.size$model,dlm_model_chem2.size$model,dlm_model_temp.chem1.size$model,dlm_model_temp.chem2.size$model,dlm_model_chem1.chem2.size$model,dlm_model_temp.chem1.chem2.size$model), score = "mase")
```
From the above outputs, again the model "dlm_model_chem1" is considered to be the best with respect to the MASE scores.
## Analysing the model "dlm_model_chem1"
We will now analyse and summarise the best finite distributed lag model, as selected by the AIC, BIC and MASE scores.
```{r}
finite_best = dlm(x=as.vector(Chem1), y=as.vector(Mort), q=10)
summary(finite_best)
```
From the output and summary of the **finite_best** model:
- The **AIC** and **BIC** scores of the model are $3694.7$ and $3749.43$ respectively.
- The adjusted R-squared is $0.53$, which is reasonable.
- A few of the coefficients are significant.
- The F-statistic is $51.97$ on 11 and 486 degrees of freedom.
## Residual Analysis of the model "dlm_model_chem1"
```{r}
checkresiduals(dlm_model_chem1$model)
```
From the residuals of model "dlm_model_chem1":
- The p-value of the Breusch-Godfrey test is below the $0.05$ significance level, indicating remaining serial correlation.
- The ACF plot shows significant lags, so seasonality and correlation remain in the residuals.
- The time series plot of the residuals appears to follow an overall trend.
So, while "dlm_model_chem1" is the best of the finite distributed lag models considered, its residuals still contain structure that the model does not capture.
## Polynomial Distributed Lag Model
We will now fit a polynomial distributed lag model, using the "finiteDLMauto" function to select the best lag length before choosing the best model with respect to AIC, BIC and MASE.
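The polynomial (Almon) restriction writes each lag weight as a low-order polynomial in the lag index, $\beta_i = a_0 + a_1 i + a_2 i^2$, so the $q+1$ free lag coefficients collapse to $k+1=3$ parameters. A minimal base-R sketch of the transformed regressors on simulated data (this mirrors what `polyDlm()` constructs internally; the names `z0`, `z1`, `z2` and the true weights are illustrative):

```{r}
set.seed(3)
n <- 200; q <- 10
x <- rnorm(n + q)
# true lag weights following a quadratic in the lag index i = 0..q
i <- 0:q
beta <- 1.5 + 0.8 * i - 0.12 * i^2
X <- sapply(i, function(j) x[(q + 1 - j):(n + q - j)])  # column j+1 = x lagged j
y <- drop(X %*% beta) + rnorm(n, sd = 0.5)
# Almon transformation: z_k = sum_i i^k * x_{t-i}, for k = 0, 1, 2
z0 <- drop(X %*% i^0); z1 <- drop(X %*% i^1); z2 <- drop(X %*% i^2)
fit <- lm(y ~ z0 + z1 + z2)
coef(fit)[-1]  # estimates of (a0, a1, a2), roughly (1.5, 0.8, -0.12)
```

Because only three parameters are estimated instead of eleven, the polynomial model trades some flexibility for much lower variance and less multicollinearity among the lagged regressors.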
### Models based on AIC
```{r warning=FALSE}
finiteDLMauto(x = as.vector(Temp+Chem1+Chem2+Size), y = as.vector(Mort), q.min = 1, q.max = 10, k.order = 2,
model.type = "poly", error.type ="AIC", trace = TRUE)
```
### Models based on BIC
```{r warning=FALSE}
finiteDLMauto(x = as.vector(Temp+Chem1+Chem2+Size), y = as.vector(Mort), q.min = 1, q.max = 10, k.order = 2,
model.type = "poly", error.type ="BIC", trace = TRUE)
```
### Models based on MASE
```{r warning=FALSE}
finiteDLMauto(x = as.vector(Temp+Chem1+Chem2+Size), y = as.vector(Mort), q.min = 1, q.max = 10, k.order = 2,
model.type = "poly", error.type ="MASE", trace = TRUE)
```
Hence, based on the AIC, BIC and MASE results, lag length 10 is again the best choice for fitting the polynomial models.
# Polynomial Model Fitting with all possible combinations
```{r include=FALSE}
poly_model_temp = polyDlm(x=as.vector(Temp), y=as.vector(Mort), q=10, k=2)
poly_model_chem1 = polyDlm(x=as.vector(Chem1), y=as.vector(Mort), q=10, k=2)
poly_model_chem2 = polyDlm(x=as.vector(Chem2), y=as.vector(Mort), q=10, k=2)
poly_model_size = polyDlm(x=as.vector(Size), y=as.vector(Mort), q=10, k=2)
poly_model_temp.chem1 = polyDlm(x=as.vector(Temp+Chem1), y=as.vector(Mort), q=10, k=2)
poly_model_temp.chem2 = polyDlm(x=as.vector(Temp+Chem2), y=as.vector(Mort), q=10, k=2)
poly_model_temp.size = polyDlm(x=as.vector(Temp+Size), y=as.vector(Mort), q=10, k=2)
poly_model_chem1.chem2 = polyDlm(x=as.vector(Chem1+Chem2), y=as.vector(Mort), q=10, k=2)
poly_model_chem1.size = polyDlm(x=as.vector(Chem1+Size), y=as.vector(Mort), q=10, k=2)
poly_model_chem2.size = polyDlm(x=as.vector(Chem2+Size), y=as.vector(Mort), q=10, k=2)
poly_model_temp.chem1.size = polyDlm(x=as.vector(Temp+Chem1+Size), y=as.vector(Mort), q=10, k=2)
poly_model_temp.chem2.size = polyDlm(x=as.vector(Temp+Chem2+Size), y=as.vector(Mort), q=10, k=2)
poly_model_chem1.chem2.size = polyDlm(x=as.vector(Chem1+Chem2+Size), y=as.vector(Mort), q=10, k=2)
poly_model_temp.chem1.chem2.size = polyDlm(x=as.vector(Temp+Chem1+Chem2+Size), y=as.vector(Mort), q=10, k=2)
```
## Comparing Sort Scores based on AIC
```{r}
sort.score(AIC(poly_model_temp$model,
poly_model_chem1$model,poly_model_chem2$model,
poly_model_size$model,poly_model_temp.chem1$model,
poly_model_temp.chem2$model,poly_model_temp.size$model,
poly_model_chem1.chem2$model,poly_model_chem1.size$model,
poly_model_chem2.size$model,poly_model_temp.chem1.size$model,
poly_model_temp.chem2.size$model,poly_model_chem1.chem2.size$model,
poly_model_temp.chem1.chem2.size$model), score = "aic")
```
## Comparing Sort Scores based on BIC
```{r}
sort.score(BIC(poly_model_temp$model,
poly_model_chem1$model,poly_model_chem2$model,
poly_model_size$model,poly_model_temp.chem1$model,
poly_model_temp.chem2$model,poly_model_temp.size$model,
poly_model_chem1.chem2$model,poly_model_chem1.size$model,
poly_model_chem2.size$model,poly_model_temp.chem1.size$model,
poly_model_temp.chem2.size$model,poly_model_chem1.chem2.size$model,
poly_model_temp.chem1.chem2.size$model), score = "bic")
```
## Comparing Sort Scores based on MASE
```{r}
sort.score(MASE(poly_model_temp$model,
poly_model_chem1$model,poly_model_chem2$model,
poly_model_size$model,poly_model_temp.chem1$model,
poly_model_temp.chem2$model,poly_model_temp.size$model,
poly_model_chem1.chem2$model,poly_model_chem1.size$model,
poly_model_chem2.size$model,poly_model_temp.chem1.size$model,
poly_model_temp.chem2.size$model,poly_model_chem1.chem2.size$model,
poly_model_temp.chem1.chem2.size$model), score = "mase")
```
From the outputs of the sort scores above, based on AIC, BIC and MASE, we can observe that the model "poly_model_chem1" is the best model overall.
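For reference, MASE scales the mean absolute error of the fitted model by the in-sample mean absolute error of the one-step naive forecast; a minimal base-R sketch (a hypothetical helper for illustration, not dLagM's MASE()):

```{r}
# MASE = mean(|y - yhat|) / mean(|y_t - y_{t-1}|)
mase <- function(actual, fitted) {
  naive_scale <- mean(abs(diff(actual)))   # one-step naive forecast error
  mean(abs(actual - fitted)) / naive_scale
}
mase(c(1, 2, 3, 4), c(0, 1, 2, 3))   # an always-one-behind fit scores exactly 1
```

A score below 1 therefore means the model beats the naive forecast on average.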
## Analysing the model "poly_model_chem1"
We will now analyse and summarize the best polynomial distributed lag model, selected based on the AIC, BIC and MASE scores.
```{r}
polynomial_best = polyDlm(x=as.vector(Chem1), y=as.vector(Mort), q=10, k=2)
summary(polynomial_best)
```
From the outputs of the **polynomial_best** model and its summary:
- The residual standard error is 9.94 on 494 degrees of freedom.
- The adjusted R-squared is $0.51$, which is moderate.
- Two of the four coefficients are significant.
- The F-statistic is $173.9$ on 3 and 494 degrees of freedom.
## Residual Analysis of the model "poly_model_chem1"
```{r}
checkresiduals(polynomial_best$model)
```
From the residual output of the model "poly_model_chem1":
- The p-value from the Breusch-Godfrey test is below the significance level of $0.05$, indicating serial correlation.
- The ACF plot shows several significant lags, so seasonality and autocorrelation remain in the residuals.
- The time-series plot of the residuals appears to follow a trend overall.
- The histogram does not appear normally distributed and is slightly right-skewed.
So, while "poly_model_chem1" is the best of the polynomial fits, its residuals still show serial correlation and departures from normality.
# Koyck Distributed Lag Model
We will now proceed with fitting Koyck models with the help of the "koyckDlm" function (which requires no lag-length choice, since its lag weights decay geometrically) and choose the best model with respect to the MASE scores.
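For reference, the Koyck model imposes geometrically decaying lag weights:

$$y_t = \alpha + \beta \sum_{s=0}^{\infty} \phi^s x_{t-s} + \epsilon_t, \qquad 0 < \phi < 1,$$

and the Koyck transformation reduces this infinite lag to a regression on $y_{t-1}$ and $x_t$:

$$y_t = \alpha(1-\phi) + \phi\, y_{t-1} + \beta x_t + \nu_t, \qquad \nu_t = \epsilon_t - \phi\, \epsilon_{t-1}.$$

The $\alpha$, $\beta$ and $\phi$ reported by summary() for a koyckDlm fit correspond to these parameters.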
## Koyck Model Fitting with all possible combinations
```{r echo=TRUE}
koyck_model_temp = koyckDlm(x=as.vector(Temp), y=as.vector(Mort))
koyck_model_chem1 = koyckDlm(x=as.vector(Chem1), y=as.vector(Mort))
koyck_model_chem2 = koyckDlm(x=as.vector(Chem2), y=as.vector(Mort))
koyck_model_size = koyckDlm(x=as.vector(Size), y=as.vector(Mort))
koyck_model_temp.chem1 = koyckDlm(x=as.vector(Temp+Chem1), y=as.vector(Mort))
koyck_model_temp.chem2 = koyckDlm(x=as.vector(Temp+Chem2), y=as.vector(Mort))
koyck_model_temp.size = koyckDlm(x=as.vector(Temp+Size), y=as.vector(Mort))
koyck_model_chem1.chem2 = koyckDlm(x=as.vector(Chem1+Chem2), y=as.vector(Mort))
koyck_model_chem1.size = koyckDlm(x=as.vector(Chem1+Size), y=as.vector(Mort))
koyck_model_chem2.size = koyckDlm(x=as.vector(Chem2+Size), y=as.vector(Mort))
koyck_model_temp.chem1.size = koyckDlm(x=as.vector(Temp+Chem1+Size), y=as.vector(Mort))
koyck_model_temp.chem2.size = koyckDlm(x=as.vector(Temp+Chem2+Size), y=as.vector(Mort))
koyck_model_chem1.chem2.size = koyckDlm(x=as.vector(Chem1+Chem2+Size), y=as.vector(Mort))
koyck_model_temp.chem1.chem2.size = koyckDlm(x=as.vector(Temp+Chem1+Chem2+Size), y=as.vector(Mort))
```
## Comparing Sort Scores based on MASE
```{r}
koyck_MASE <- MASE(koyck_model_temp,
koyck_model_chem1,koyck_model_chem2,
koyck_model_size,koyck_model_temp.chem1,
koyck_model_temp.chem2,koyck_model_temp.size,
koyck_model_chem1.chem2,koyck_model_chem1.size,
koyck_model_chem2.size,koyck_model_temp.chem1.size,
koyck_model_temp.chem2.size,koyck_model_chem1.chem2.size,
koyck_model_temp.chem1.chem2.size)
# Sorting Mase scores in ascending order
arrange(koyck_MASE,MASE)
```
From the above scores, we can say that the model "koyck_model_chem1" is the best with respect to MASE. We will now analyse and summarize this model further.
## Analysing the model "koyck_model_chem1"
We will now analyse and summarize the best Koyck distributed lag model, selected based on the MASE scores.
```{r}
koyck_best = koyckDlm(x=as.vector(Chem1), y=as.vector(Mort))
summary(koyck_best)
```
From the outputs of the **koyck_best** model and its summary:
- The residual standard error is 9.01 on 504 degrees of freedom.
- The adjusted R-squared is $0.59$, which is better than the previous models.
- All the coefficients of this model are significant.
- The Wald test statistic is $336.8$ on 2 and 504 degrees of freedom.
- The estimated values of $\alpha$, $\beta$ and $\phi$ are $153.011$, $0.70587$ and $0.65057$ respectively.
## Residual Analysis of the model "koyck_model_chem1"
```{r}
checkresiduals(koyck_best$model)
```
From the residual output of the model "koyck_model_chem1":
- The p-value from the Breusch-Godfrey test is below the significance level of $0.05$.
- The ACF plot shows significant lags, so autocorrelation remains in the residuals.
- The time-series plot of the residuals does not appear to follow a trend overall.
- The histogram does not appear normally distributed and is slightly right-skewed.
So, "koyck_model_chem1" fits better than the polynomial model, but its residuals still show autocorrelation and non-normality.
# Autoregressive Distributed Lag Model
We will now proceed with fitting autoregressive distributed lag models using the "ardlDlm" function inside a loop over candidate lag orders, and select the best (p, q) combination with respect to AIC, BIC and MASE.
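For reference, the ARDL($p$, $q$) model fitted by ardlDlm combines $p$ lags of the predictor with $q$ lags of the response:

$$y_t = \mu + \sum_{i=0}^{p} \beta_i x_{t-i} + \sum_{j=1}^{q} \gamma_j y_{t-j} + \epsilon_t.$$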
```{r}
for (i in 1:5){
for(j in 1:5){
autoreg_model = ardlDlm(x= as.vector(Temp+Chem1+Chem2+Size),y=as.vector(Mort), p = i , q = j )
cat("p =", i, "q =", j, "AIC =", AIC(autoreg_model$model), "BIC =", BIC(autoreg_model$model), "MASE =", MASE(autoreg_model)$MASE, "\n")
}
}
```
From the grid-search results above, the lowest MASE is recorded at ARDL(5,5). Hence we will use p = 5 and q = 5 when selecting and analysing the best model.
```{r}
ardl_model_temp = ardlDlm(x=as.vector(Temp), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem1 = ardlDlm(x=as.vector(Chem1), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem2 = ardlDlm(x=as.vector(Chem2), y=as.vector(Mort), p = 5, q = 5)
ardl_model_size = ardlDlm(x=as.vector(Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.chem1 = ardlDlm(x=as.vector(Temp+Chem1), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.chem2 = ardlDlm(x=as.vector(Temp+Chem2), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.size = ardlDlm(x=as.vector(Temp+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem1.chem2 = ardlDlm(x=as.vector(Chem1+Chem2), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem1.size = ardlDlm(x=as.vector(Chem1+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem2.size = ardlDlm(x=as.vector(Chem2+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.chem1.size = ardlDlm(x=as.vector(Temp+Chem1+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.chem2.size = ardlDlm(x=as.vector(Temp+Chem2+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_chem1.chem2.size = ardlDlm(x=as.vector(Chem1+Chem2+Size), y=as.vector(Mort), p = 5, q = 5)
ardl_model_temp.chem1.chem2.size = ardlDlm(x=as.vector(Temp+Chem1+Chem2+Size), y=as.vector(Mort), p = 5, q = 5)
```
## Comparing Sort Scores based on MASE
```{r}
ardl_MASE <- MASE(ardl_model_temp,
ardl_model_chem1,ardl_model_chem2,
ardl_model_size,ardl_model_temp.chem1,
ardl_model_temp.chem2,ardl_model_temp.size,
ardl_model_chem1.chem2,ardl_model_chem1.size,
ardl_model_chem2.size,ardl_model_temp.chem1.size,
ardl_model_temp.chem2.size,ardl_model_chem1.chem2.size,
ardl_model_temp.chem1.chem2.size)
# Sorting Mase scores in ascending order
arrange(ardl_MASE,MASE)
```
From the above outputs, the best autoregressive distributed lag model based on the MASE scores is "ardl_model_chem1.size", which combines the predictors Chemical 1 and particle size. Hence we will further analyse the model and examine its summary and residual outputs.
## Analysing the model "ardl_model_chem1.size"
We will now analyse and summarize the best autoregressive distributed lag model, selected based on the MASE scores.
```{r}
ardl_best = ardlDlm(x=as.vector(Chem1+Size), y=as.vector(Mort), p = 5, q = 5)
summary(ardl_best)
```
From the outputs of the **ardl_best** model and its summary:
- The residual standard error is 8.955 on 503 degrees of freedom.
- The adjusted R-squared is $0.60$, which is better than the previous models.
- All the coefficients of this model are significant.
- The F-statistic is $255.4$ on 3 and 503 degrees of freedom, with a p-value below the 5% level.
## Residual Analysis of the model "ardl_model_chem1.size"
```{r}
checkresiduals(ardl_model_chem1.size$model)
```
From the residual output of the model "ardl_model_chem1.size":
- The p-value from the Breusch-Godfrey test is above the significance level of $0.05$.
- The ACF plot shows only one significant lag, so little autocorrelation remains in the residuals.
- The time-series plot of the residuals does not appear to follow a trend overall.
- The histogram appears approximately normally distributed.
So, we can conclude that "ardl_model_chem1.size" is a better model than the previous ones, with well-behaved residuals.
# Comparing all Lag models above wrt MASE:
```{r}
mort_dlm_mase <- MASE(finite_best,polynomial_best,koyck_best,ardl_best)
print(mort_dlm_mase)
```
Hence, from the above output, we can confirm that the "finite_best" model from the finite DLM class has the lowest MASE of all the lag models. The finite DLM with "particle size" as the predictor is therefore the best model so far.
# Dynamic Linear Models
We'll use the dynlm() function from the "dynlm" package to fit the models. To incorporate a trend component and a seasonal component, the model formula can include the functions trend() and season().
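For example, a specification with a lagged response, the intervention pulse, a linear trend and seasonal dummies (as in the first model fitted below) corresponds to:

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 S_t + \beta_3 t + \sum_{j=2}^{m} \delta_j D_{j,t} + \epsilon_t,$$

where $S_t$ is the pulse dummy and $D_{j,t}$ are the seasonal indicator variables created by season().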
```{r}
# Variables for Trend and Seasonality under Dynlm models
Y.t <- Mort
T <- 156 # first intervention point, in 2013 (week 156)
S.t <- 1*(seq(Y.t) == T)
S.t.1 <- Lag(S.t,+1)
```
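As a self-contained illustration of the pulse dummy constructed above (with hypothetical values; the chunk above applies Lag() to the real series):

```{r}
# Pulse dummy: 1 only at the intervention point, 0 elsewhere
y  <- rnorm(10)                        # placeholder series (illustrative only)
T0 <- 4                                # hypothetical intervention point
S  <- as.integer(seq_along(y) == T0)
# Lag the pulse one step in base R (the first value becomes NA)
S_lag1 <- c(NA, S[-length(S)])
```

The lagged pulse lets the model pick up a delayed intervention effect one period after the event.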
```{r}
# Fitting dynamic linear models
dynlm1 <- dynlm(Y.t ~ L(Y.t , k = 1 ) + S.t + trend(Y.t) + season(Y.t))
dynlm2 <- dynlm(Y.t ~ L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
dynlm3 <- dynlm(Y.t ~ L(Y.t , k = 1 ) + S.t + season(Y.t))
dynlm4 <- dynlm(Y.t ~ L(Y.t , k = 1 ) + S.t + trend(Y.t))
dynlm5 <- dynlm(Y.t ~ Chem1 + L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
dynlm6 <- dynlm(Y.t ~ Chem2 + L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
dynlm7 <- dynlm(Y.t ~ Temp + L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
dynlm8 <- dynlm(Y.t ~ Size + L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
```
## Dynamic Linear models comparison
```{r}
dynlm_mase <- MASE(lm(dynlm1), lm(dynlm2), lm(dynlm3), lm(dynlm4),lm(dynlm5),lm(dynlm6),lm(dynlm7),lm(dynlm8))
arrange(dynlm_mase,MASE)
```
From the MASE scores above, we can say that the model "dynlm8" with predictor "particle size" comes out best, with the lowest MASE. Hence we will further analyse the model with summary and residual diagnostics.
## Analysing the model "dynlm8"
We will now analyse and summarize the best dynamic linear model, "dynlm8", selected based on the MASE scores.
```{r}
dynlm_best <- dynlm(Y.t ~ Size + L(Y.t , k = 2 ) + S.t + trend(Y.t) + season(Y.t))
summary(dynlm_best)
```
From the outputs of the **dynlm8** model and its summary:
- The residual standard error is 7.974 on 450 degrees of freedom.
- The adjusted R-squared is $0.68$, which is better than the previous models.
- Only a few of the coefficients of this model are significant.
- The F-statistic is $20.78$ on 55 and 450 degrees of freedom, with a p-value below the 5% level.
## Residual Analysis of the model "dynlm8"
```{r}
checkresiduals(dynlm8)
```
From the residual output of the model "dynlm8":
- The p-value from the Breusch-Godfrey test is below the significance level of $0.05$.
- The ACF plot still shows some significant lags, so a degree of autocorrelation remains in the residuals.
- The time-series plot of the residuals appears to follow a trend with seasonal repetitions.
- The histogram appears approximately normally distributed.
So, in terms of fit "dynlm8" improves on the previous models, although some autocorrelation remains in its residuals.
# Exponential Smoothing Method
We'll now try exponential smoothing as a forecasting technique. Because the mortality series we wish to forecast has a substantial seasonal component, we only consider models featuring either additive or multiplicative seasonality.
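For reference, the additive Holt-Winters method updates level, trend and seasonal components recursively ($m = 12$ for the monthly series constructed below):

$$\begin{aligned}
\ell_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*)\, b_{t-1} \\
s_t &= \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\, s_{t-m} \\
\hat{y}_{t+h|t} &= \ell_t + h\, b_t + s_{t+h-m(k+1)}
\end{aligned}$$

where $k$ is the integer part of $(h-1)/m$; the multiplicative variant replaces the additive seasonal adjustments with ratios.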
```{r}
# Converting Mortality data series into monthly format
Mort_monthly = aggregate(zoo(Mort),as.yearmon,sum)
# Treat the stray value 12.11 as missing and drop it
Mort_monthly[Mort_monthly==12.11] <- NA
Mort_monthly = na.omit(Mort_monthly)
# Converting into Time series format
Mort_monthly <- ts(as.vector(t(as.matrix(Mort_monthly[,2:13]))),start=c(2010,1),end = c(2019,9),frequency = 12)
# Print
Mort_monthly
```
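As a self-contained sketch of the monthly-aggregation idea (using base R's tapply on hypothetical values; the chunk above uses zoo's aggregate with as.yearmon):

```{r}
# Sum sub-monthly observations into monthly totals
vals   <- c(10, 20, 5, 5, 5, 30)
months <- c("2010-01", "2010-01", "2010-02", "2010-02", "2010-02", "2010-03")
monthly <- tapply(vals, months, sum)
monthly   # 2010-01: 30, 2010-02: 15, 2010-03: 30
```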
## Holt-Winters’ Trend and Seasonality Method
```{r}
# Build a grid of seasonal and damping options to evaluate
seasonal = c("additive","multiplicative")
damped = c(TRUE,FALSE)
expand = expand.grid(seasonal,damped)