-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathRandomForest.Rmd
109 lines (86 loc) · 2.4 KB
/
RandomForest.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
---
title: "RandomForest"
author: "Bijan GURUNG"
date: "3/6/2022"
output:
ioslides_presentation: default
powerpoint_presentation: default
---
```{r}
```
## Outline
- A brief overview of RandomForest library in r
- What is RandomForest?
- Where do we use RandomForest?
- How RandomForest is used in regression or classification?
## RandomForest
- To predict the result
- Classify the result of data
- Classify a variable such as True or False (e.g. it rains/true or does not rain/false)
- Categorical variable or non-numeric variable
- The algorithm makes many decision trees (~forests) by random sampling or randomization (~random)
- RandomForest
- Classification or Regression Analysis
## Slide with R Output
Install the required packages for dataset and analysis.
```{r}
#install.packages("stats")
#install.packages("dplyr")
#install.packages("randomForest")
```
- Uploading the required libraries
- We are using the data contained in library "stats"
```{r, message=FALSE}
library(stats)
library(dplyr)
library(randomForest)
```
## Working with "iris" dataset
- data about measurement of plants
```{r}
mydata = iris
head(mydata, 10)
```
- characteristics of the dataset
```{r}
str(mydata)
```
## Analysis using RandomForest
- Divide the entire dataset into
- Training dataset (set a model from the data)
- Testing dataset (to test the model)
```{r}
index = sample(2, nrow(mydata), replace=TRUE, prob=c(0.7, 0.3))
Training = mydata[index == 1, ]
Testing = mydata[index == 2, ]
```
## randomForest function
- Treeline structure (branches and nodes) in the analysis to create a model
```{r}
RFM = randomForest(Species~., data = Training)
```
- Testing of the model on test dataset
```{r}
Species_Pred = predict(RFM, Testing)
```
## Result of the testing of the model into test dataset
- checking of the model in test data (cross validation)
```{r}
Testing$Species_Pred = Species_Pred
head(Testing, 10)
```
## Model accuracy
- High accuracy indicates a good model
- Low accuracy indicates the model failed to capture all characteristics of the data
- Checking of parameters in the analysis (optimization)
```{r}
CFM = table(Testing$Species, Testing$Species_Pred)
```
```{r}
Classification_Accuracy = sum(diag(CFM)/sum(CFM))
Classification_Accuracy
```
##
- Application in other fields (landuse change, finance)
- Sources: https://www.youtube.com/watch?v=acFviblzijU, Easy ML
- http://rpubs.com/BijanGURUNG/874775