-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.Rmd
116 lines (88 loc) · 3.53 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
---
title: "R Code"
---
<br>
# Hello, I'm Katie! I made this website using R Markdown!
<br>
# About this page:
<br>
* I made this R Markdown website to demonstrate and teach R skills
* It looks basic right now, because it is a work in progress
* My first project was a PCA one, see below
* I plan to add more advanced models as time goes by
* Check back regularly to join me on my data science journey
<br>
<br>
Disclaimer: Code and stat tips originate from my experience in academic research. Always remember to use reliable literature (language documentation, published text books, academic research papers, your own class notes) and fact check when using a website as a resource. I will try to keep everything as accurate as possible, but feel free to contact me if you have any questions on facts, figures and references. I hope you find this page useful! Enjoy!
<br>
<br>
# Let's get started!
<br>
```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = TRUE) #Setting this equal to true allows the code to show up on the page
```
```{r libraries, results = 'hide', message = FALSE}
library(tidyverse) #Includes packages such as; ggplot2, dplyr, tibble and more
```
```{r check-wd}
getwd()
```
## Demo of basic R skills 1
```{r demo basic R skills}
# Vector
a_vector <- c(3, 4)
a_vector
```
## Demo of basic R skills 2
```{r demo basic R skills 2}
# Matrix
a_matrix <- matrix(c(1, 2, 3, 4),nrow =2, ncol=2)
a_matrix
```
<br>
## Demo of PCA on iris data
PCA reference: Statistics class notes combined with my own opinions and interpretation
<br>
PCA is useful for dimention reduction and identifying drivers
<br>
```{r view iris}
data(iris)
# View(iris)
```
```{r summary of iris data}
summary(iris)
```
```{r prcomp fit}
fit = prcomp(iris[,1:4])
fit
```
```{r round the summary}
round(fit$rotation, 2)
```
The magnitude of the loadings are important, however the signs on the loadings are arbitrary. It matters if the loadings have opposite signs, but not which is positive and which is negative.
<br>
In PC1, Sepal.length, Petal.Length and Petal.Width all have large positive loadings, whereas Sepal.Width has a negative loading which is close to zero.
<br>
We could interpret the first PC as an overall measure of the ‘size’ of the flower. The sepal width variable has close to zero loading.
<br>
PC2 is contrasting flowers that have large sepals against flowers that have long petals.
<br>
```{r summary of fit}
summary(fit)
```
With regard to the code chunk above, it is important to note that the value of a particular eigenvalue divided by the sum of the eigenvalues is called the proportion of variance
explained by that principal component (REF: A past statistics class)
```{r plot prop of var explained}
# We can plot the proportion of variance explained by each PC using:
plot(fit, main = "Scree plot", xlab = "Comps 1:4")
```
```{r predict from fit, results = "hide"}
newiris = predict(fit)
newiris
```
To get a ‘picture’ of the reduced dimensional data we often produce a scatter plot of the scores on one PC against the scores on another PC. As we have decided that only PC1 (and maybe PC2) need to be considered we can use the plot function to plot the values of PC1 against PC2 (REF: A past statistics class)
```{r PC1 scores v PC2 scores}
plot(newiris[,1], newiris[,2], type="n", xlab="PC1", ylab="PC2")
text(newiris[,1], newiris[,2], labels=substr(iris[,5],1,2), col=as.integer(iris[,5]))
```
You will notice that we have performed PCA on the unstandardised data. Sometimes the data are standardised before performing PCA (REF: A past statistics class)