-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintro.Rmd
355 lines (277 loc) · 10.1 KB
/
intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
---
title: "Using R and RStudio"
---
R is an object-oriented programming language for statistical computing and
graphics.
There are many benefits to learning and using R:
1. Free and open source
2. Thousands of packages for your analysis needs
3. Active community (lots on online help)
4. Has excellent publication-quality graphics
5. Standard for statisticians and data analysts for statistical analysis
6. Can interact with other programming languages
# Learning objectives:
- Using RStudio and interacting with R
- Basic key bindings/shortcuts
- Using R Projects
- Knowing how to get help (vignettes, `?`)
- Difference between R Markdown vs R scripts
- What is R Markdown and how to use it
- Using basic and common R codes
- Installing and loading packages
- Importing and exporting datasets
- Viewing your data
- Data types
- Assigning variables
- Extracting values from the data
- Using and making functions
# Material and code
## R and RStudio
Interacting with R (sending commands and getting results output) is done through
the console.
RStudio ([cheatsheet here](https://www.rstudio.com/wp-content/uploads/2016/01/rstudio-IDE-cheatsheet.pdf))
has four panes, two of which we'll cover first: Console pane and open files pane.
The console is where code is sent and R gives back an output from that code. The
open files include R scripts (`.R` files) and R markdown (`.Rmd`) files. You
write your R code in the R/Rmd files and send the line of code to the console
using `Ctrl-Enter`.
It's generally good practice to make an [R project](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
for a specific research objective (e.g. your thesis). Create an `.Rproj` by
using `File -> New Project`. Anytime you do work on that project, open the
`.Rproj` file or via `File -> Recent Projects`. You can have multiple R projects.
## Getting help
Getting help is easy in R. There is help for individual functions (i.e.
commands) as well as for R packages. Help for specific commands uses `?` in
front of the function:
```{r, eval=FALSE}
?head
```
Sometimes, specific packages have a more detailed description of the package,
called a vignette. You can get information on the vignettes inside a package,
followed by the specific vignette you want to read:
```{r, eval=FALSE}
vignette(package = 'dplyr')
vignette('introduction', package = 'dplyr')
```
Not all packages have vignettes, but it's always useful to check if there is
one.
## Types of R files
There are two types of R files: `.R` and `.Rmd`. R scripts (`.R`) contain only R
code and nothing else; these are useful when all you want to do is run a set
code sequence and produce a specific result, figure, or output.
[R Markdown](http://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf)
(`.Rmd`) files can contain R code, text, citations, tables, figures, lists, and
so on. An R Markdown file is like Word... except much more powerful as it can
include R code *within* the document and can convert to a variety of file types
including HTML and Word docx. Using this file type makes your research,
manuscripts, and thesis more reproducible and saves you time and stress. For
most of the workshop we'll be using R Markdown.
You can recognize an `.Rmd` file as it generally has this at the very top of the
file:
```
---
title: "Introducing R Markdown"
author: "Luke Johnston"
date: "July 23, 2015"
output: html_document
---
```
This snippet of code is called YAML. We'll talk about that more in a
[later workshop](report.html). For now, just know that `rmarkdown` (the package)
uses the YAML to know what to convert the document to (in this case html).
## Installing and loading packages
Installing packages can be done using either RStudio in the Packages pane or
using the command:
```{r, eval=FALSE}
install.packages('packagename')
```
We'll use the package `readr` soon, so let's load it up:
```{r load}
library(readr)
```
We can now use the functions inside the `readr` package. To see what functions
there are, you can go to the console and start typing `readr::`, then hit TAB.
A list of functions will pop up, showing all the functions inside `readr`. This
TAB-completion is very useful.
## Importing and exporting
Let's use the `readr` functions to load in a dataset from the codeasmanuscript
site. It's generally good to store your data as `csv` files, which are known as
comma separated values. These are plain text, meaning there is nothing but text
in the file (unlike Excel `.xlsx` files which are *not* plain text; they contain
markup inside the file itself).
You can import files from your computer. It keep things easy, *make sure* the
data file is in the same folder as your `.Rmd`/`.R`/`.Rproj` file(s). You can
import the file into R via...
```{r create_data, echo=FALSE}
library(magrittr)
state.x77 %>%
as.data.frame() %>%
tibble::as_tibble() %>%
tibble::rownames_to_column(var = 'StateName') %>%
dplyr::mutate(
Region = state.region,
Division = state.division,
Longitude = state.center$x,
Latitude = state.center$y
) %>%
dplyr::rename(LifeExp = `Life Exp`, HSGrad = `HS Grad`) %>%
readr::write_csv('states_data.csv')
```
```{r import}
# comment: to import use:
ds <- read_csv('states_data.csv')
# or from a website
# ds <- read_csv('http://codeasmanuscript.org/states_data.csv')
ds
```
To export data inside R to a file, use...
```{r export, eval=FALSE}
write_csv('states.csv')
```
## Viewing your data
There are several functions to look over your data.
```{r viewing}
# comment: quick summary
summary(ds)
# first 6 rows
head(ds)
# last 6 rows
tail(ds)
# column names of your dataset
colnames(ds)
```
A very detailed view of the exact contents of the data object, the `str` function
is very useful:
```{r structure}
str(ds)
```
To see the dimensions (number of rows and columns) on a dataset you can use
```{r dim}
dim(ds)
```
## Data types
There are several types of data in R. In general, these can be simplified into
two classes: continuous and discrete. These can then be classified into other
groups.
1. Continuous:
- numeric
- integer
2. Discrete:
- character
- logical
- factor
Continuous data types look like...
```{r continuous}
0.5:10.5
class(0.5:10.5)
1L:10L # comment: the `L` forces R to see it as an integer
class(1L:10L)
```
... while discrete data types look like
```{r discrete}
c(TRUE, FALSE) # comment: the `c` means 'combine' or 'concatenate'
class(c(TRUE, FALSE))
c('hi', 'there') # this is also called a vector
as.numeric(c('hi', 'there'))
class(c('hi', 'there', 'hi'))
factor(c('hi', 'there', 'hi'))
class(factor(c('hi', 'there', 'hi')))
# comment: factors are basically numbers underneath.
as.numeric(factor(c('hi', 'there', 'hi')))
# comment: characters are not.
as.numeric(c('hi', 'there'))
```
There are also more complex object types in R. Two common ones are lists and
dataframes. A dataframe is what you just loaded above in the `ds` object. It
can contain any data type except for other dataframes. So...
```{r dataframe}
class(ds)
```
A list can have any number of data types inside it, including dataframes and
more lists.
```{r lists}
ds_list <- list(data = ds, number = 1:10, char = c('hi', 'there'))
class(ds_list)
ds_list
names(ds_list) # find out the names of the objects inside the list
```
## Variable assignment
Often you need to run a function to get an output and then use that output to
do other things to it, like plot it or make it into a table. In order to help
with that, there is variable assignment using the assignment operator `<-`. We
used it earlier to assign the dataframe from `read_csv` to the `ds` object.
```{r assignment_op}
weight_kg <- 75
weight_kg
weight_lb <- weight_kg * 2.2
weight_lb
```
## Extracting values from an object
You can use several methods to take one or more items from a vector, dataframe,
or list using `$`, `[]`, or `[[]]`.
```{r dataframe_extraction}
# vector can only use []
num <- 1:10
num[1] # first item
num[9] # ninth item
# dataframes and lists can use any of the methods
ds$Income # directly choose the Income column, converts to vector
ds['Income'] # same as above, but keeps as column
ds[c('Income', 'Population')] # same as above, but keeps as column
ds[3] # using numbers, again keeping as column
ds[[3]] # converts to number
# combining [[]] and []
ds[[3]][4] # choose fourth item of third column
# ds[3][4] # this doesn't work
ds[3,4] # but this does, which means [rownumber, columnnumber]
ds[[3,4]] # converts to number
# a range:
ds[1:5, 1:4]
```
```{r list_extraction}
# as a list
ds_list$number # converts to vector
ds_list[[2]] # as a vector
ds_list[2] # as a named vector
# combining [[]] and []
ds_list[[2]][4] # chooses fourth item of the second list item
```
## Using and making functions
In R, nearly every command is a function. You can look at the contents of any
function by simply typing in the function without the `()`:
```{r inside_funs}
factor
```
For a beginner, this doesn't always come in handy. However, it is something that
is very useful to know as you get more familiar with R.
All functions have arguments. You can see the arguments by hitting TAB when your
cursor is inside the function (between the `()`). A list of argument options will
be shown as well as a quick help on what the argument is and needs. Passing a
value to an argument (e.g. `function_name(argument1 = value)`), lets the
function perform its action with that argument value. Let's make a simple
example by doing a sum of two values, plus an extra third that has the default
value of 0:
```{r create_fun}
adding <- function(value1, value2, value3 = 0) {
value1 + value2
}
```
We have now created a new function called `adding`. The first two arguments *require*
a value, while the third has a default so doesn't need a value. Let's
add 2 with 2. There are two ways to do it, using positional arguments and named
arguments.
```{r arg_type}
# positional
adding(2, 2)
# named
adding(value1 = 2, value2 = 2)
```
Generally, the first two arguments are made to be positional, but further
arguments are named. Like so:
```{r}
adding(2, 2, value3 = 2)
```
That's the simple run down of making functions. We'll make more complex ones as
we learn more.
## Assignment:
TBA