# Some R basics {#basics_r}
In this Chapter we'll introduce you to using R and RStudio to perform some basic R tasks such as creating objects and assigning values to objects, exploring different types of objects and how to perform some common operations on objects. We'll also learn how to get help in R and highlight some resources to help support your R learning. Finally, we'll cover how to save your work.
Before we continue, here are a few things to bear in mind as you work through this Chapter:
- R is case sensitive i.e. `A` is not the same as `a` and `anova` is not the same as `Anova`.
- Anything that follows a `#` symbol is interpreted as a comment and ignored by R. Comments should be used liberally throughout your code for both your own information and also to help your collaborators. Writing comments is a bit of an [art][comment] and something that you will become more adept at as your experience grows.
- In R, commands are generally separated by a new line. You can also use a semicolon `;` to separate your commands but this is rarely used.
- If a continuation prompt `+` appears in the console after you execute your code this means that you haven't completed your code correctly. This often happens if you forget to close a bracket and is especially common when nested brackets are used (`(((some command))`). Just finish the command on the new line and fix the typo or hit escape on your keyboard (see point below) and fix.
- In general, R is fairly tolerant of extra spaces inserted into your code, in fact using spaces is actively encouraged. However, spaces should not be inserted into operators i.e. `<-` should not read `< -` (note the space). See Google's [style guide][style-google] for advice on where to place spaces to make your code more readable.
- If your console 'hangs' and becomes unresponsive after running a command you can often get yourself out of trouble by pressing the escape key (esc) on your keyboard or clicking on the stop icon in the top right of your console. This will terminate most current operations.
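A few of the points above are easy to check for yourself in the console. The following is just a small sketch and the object names (`a`, `A`, `x` and `y`) are made up for illustration.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
a <- 1            # lower case 'a' (hypothetical object)
A <- 2            # upper case 'A' is a completely different object
a == A            # FALSE, because R is case sensitive

x <- 5; y <- 10   # two commands on one line, separated by a semicolon
x + y             # anything after the # symbol is ignored by R
```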
## Getting started
In [Chapter 1](#chap1) we learned about the [R Console](#cons) and creating scripts and [Projects](#rsprojs) in RStudio. We also saw how you write your R code in a script and then source this code into the console to get it to run (if you've forgotten how to do this, pop back to the [console](#cons) section to refresh your memory). Writing your code in a script means that you'll always have a permanent record of everything you've done (provided you save your script) and also allows you to make loads of comments to remind your future self what you've done. So, while you're working through this Chapter we suggest that you create a new script (or RStudio [Project](#rsprojs)) to write your code as you follow along.
As we saw in the previous [Chapter](#chap1), at a basic level we can use R much as you would use a calculator. We can type an arithmetic expression into our script, then source it into the console and receive a result. For example, if we type the expression `2 + 2` and then source this line of code we get the answer `4` (reassuringly!).
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
2 + 2
```
The `[1]` in front of the result tells you that the first value printed on that line is the first element of the result. This is not much help in this example, but can be quite useful when printing results that span multiple lines (we'll see an example below). The other obvious arithmetic operators are `-`, `*`, `/` for subtraction, multiplication and division respectively. R follows the usual mathematical convention of [order of operations][op-prec]. For example, the expression `2 + 3 * 4` is interpreted to have the value `2 + (3 * 4) = 14`, not `(2 + 3) * 4 = 20`. There is a huge range of mathematical functions in R, some of the most useful include `log()`\index{log()}, `log10()`\index{log10()}, `exp()`\index{exp()} and `sqrt()`\index{sqrt()}.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
log(1) # logarithm to base e
log10(1) # logarithm to base 10
exp(1) # natural antilog
sqrt(4) # square root
4^2 # 4 to the power of 2
pi # not a function but useful
```
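If you're ever unsure about the order in which R will evaluate an expression, a quick check in the console (and a few extra brackets) will usually settle it. A minimal sketch:

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
2 + 3 * 4      # multiplication is done first, giving 14
(2 + 3) * 4    # brackets are evaluated first, giving 20
-2^2           # the power is applied before the minus sign, giving -4
(-2)^2         # brackets force the minus sign first, giving 4
```

When in doubt, adding brackets costs nothing and makes your intention obvious to anyone reading your code.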
It's important to realise that when you run code as we've done above, the result of the code (or **value**) is only displayed in the console. Whilst this can sometimes be useful it is usually much more practical to store the value(s) in an object.
## Objects in R
At the heart of almost everything you will do (or are ever likely to do) in R is the concept that everything in R is an [object][chambers]. These objects can be almost anything, from a single number or character string (like a word) to highly complex structures like the output of a plot, a summary of your statistical analysis or a set of R commands that perform a specific task. Understanding how you create objects and assign values to objects is key to understanding R.
### Creating objects {#r_objs}
```{block2, vid-text5, type='rmdvideo'}
See this [video][objs-vid] for an introduction to creating and managing objects in R
```
\
To create an object we simply give the object a name. We can then assign a value to this object using the *assignment operator* `<-` (sometimes called the *gets operator*). The assignment operator is a composite symbol comprised of a ‘less than’ symbol `<` and a hyphen `-`.
```{r, echo=TRUE, eval=TRUE}
my_obj <- 48
```
In the code above, we created an object called `my_obj` and assigned it a value of the number `48` using the assignment operator (in our head we always read this as '*my_obj gets 48*'). You can also use `=` instead of `<-` to assign values but this is considered bad practice and we would discourage you from using this notation.
To view the value of the object you simply type the name of the object.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_obj
```
Now that we've created this object, R knows all about it and will keep track of it during this current R session. All of the objects you create will be stored in the current workspace and you can view all the objects in your workspace in RStudio by clicking on the 'Environment' tab in the top right hand pane.
\
```{r rstudio_env, echo=FALSE, out.width="75%", fig.align="center"}
knitr::include_graphics(path = "images/rs_env.png")
```
\
If you click on the down arrow on the 'List' icon in the same pane and change to 'Grid' view RStudio will show you a summary of the objects including the type (numeric - it's a number), the length (only one value in this object), its 'physical' size and its value (48 in this case).
\
```{r rstudio_env2, echo=FALSE, out.width="75%", fig.align="center"}
knitr::include_graphics(path = "images/rs_env2.png")
```
\
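The information shown in the 'Environment' tab can also be retrieved from the console. The sketch below uses the base R functions `ls()`, `class()`, `length()` and `object.size()`; the object `env_demo` is a hypothetical name used only for this example.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
env_demo <- 48          # hypothetical object, created only for this sketch
ls()                    # names of all objects in the current workspace
class(env_demo)         # the type of the object: "numeric"
length(env_demo)        # how many values it contains: 1
object.size(env_demo)   # approximate memory used by the object
```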
There are many different types of values that you can assign to an object. For example
```{r, echo=TRUE, eval=TRUE}
my_obj2 <- "R is cool"
```
Here we have created an object called `my_obj2` and assigned it a value of `R is cool` which is a character string. Notice that we have enclosed the string in quotes. If you forget to use the quotes you will receive an error message.
```{r, echo=TRUE, eval=FALSE}
my_obj2 <- R is cool
Error: unexpected symbol in "my_obj2 <- R is"
```
Our workspace now contains both objects we've created so far with `my_obj2` listed as type character.
\
```{r rstudio_env3, echo=FALSE, out.width="75%", fig.align="center"}
knitr::include_graphics(path = "images/rs_env3.png")
```
\
To change the value of an existing object we simply reassign a new value to it. For example, to change the value of `my_obj2` from `"R is cool"` to the number `1024`.
```{r, echo=TRUE, eval=TRUE}
my_obj2 <- 1024
```
Notice that the Type has changed to numeric and the value has changed to 1024 in the environment.
\
```{r rstudio_env4, echo=FALSE, out.width="75%", fig.align="center"}
knitr::include_graphics(path = "images/rs_env4.png")
```
\
Once we have created a few objects, we can do stuff with our objects. For example, the following code creates a new object `my_obj3` and assigns it the value of `my_obj` added to `my_obj2` which is 1072 (48 + 1024 = 1072).
```{r, echo=TRUE, eval=TRUE}
my_obj3 <- my_obj + my_obj2
my_obj3
```
Notice that to display the value of `my_obj3` we also need to write the object's name. The above code works because the values of both `my_obj` and `my_obj2` are numeric (i.e. a number). If you try to do this with objects with character values (**character class**) you will receive an error.
```{r, echo=TRUE, eval=FALSE}
char_obj <- "hello"
char_obj2 <- "world!"
char_obj3 <- char_obj + char_obj2
Error in char_obj + char_obj2 : non-numeric argument to binary operator
```
The error message is essentially telling you that either one or both of the objects `char_obj` and `char_obj2` is not a number and therefore cannot be added together.
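One way to avoid this kind of error is to check what type of value an object holds before doing arithmetic with it. The sketch below uses the base R functions `class()` and `is.numeric()`, and also `paste()`, which is one option if what you actually wanted was to join two character strings together. The objects `greet1` and `greet2` are hypothetical names used only here.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
greet1 <- "hello"        # hypothetical objects for this sketch
greet2 <- "world!"
class(greet1)            # "character"
is.numeric(greet1)       # FALSE, so arithmetic with this object will fail
paste(greet1, greet2)    # joins the strings with a space: "hello world!"
```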
When you first start learning R, dealing with errors and warnings can be frustrating as they're often difficult to understand (what's an [*argument*][r_arg]? what's a [*binary operator*][bin_op]?). One way to find out more information about a particular error is to Google a generalised version of the error message. For the above error try Googling [*'non-numeric argument to binary operator error + r'*][non_num_err] or even [*'common r error messages'*][com_err].
Another error message that you'll get quite a lot when you first start using R is `Error: object 'XXX' not found`. As an example, take a look at the code below.
```{r, echo=TRUE, eval=FALSE}
my_obj <- 48
my_obj4 <- my_obj + no_obj
Error: object 'no_obj' not found
```
R returns an error message because we haven't created (defined) the object `no_obj` yet. Another clue that there's a problem with this code is that, if you check your environment, you'll see that object `my_obj4` has not been created.
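If you're not sure whether an object has been created yet, you can ask R directly with the base R function `exists()`. A minimal sketch, where `missing_obj` is a hypothetical name chosen for this example:

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
before <- exists("missing_obj")   # has this object been created yet?
before                            # FALSE, nothing with this name exists
missing_obj <- 10                 # now create it
after <- exists("missing_obj")
after                             # TRUE
```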
### Naming objects
Naming your objects is one of the most difficult things you will do in R (honestly - we're serious). Ideally your object names should be kept both short and informative which is not always easy. If you need to create objects with multiple words in their name then use either an underscore or a dot between words or capitalise the different words. We prefer the underscore format (called [*snake case*][snake]).
```{r, echo=TRUE, eval=FALSE}
output_summary <- "my analysis"
output.summary <- "my analysis"
outputSummary <- "my analysis"
```
There are also a few limitations when it comes to giving objects names. An object name cannot start with a number or a dot followed by a number (e.g. `2my_variable` or `.2my_variable`). You should also avoid using non-alphanumeric characters in your object names (e.g. &, ^, /, ! etc). In addition, make sure you don’t name your objects with reserved words (e.g. `TRUE`, `NA`) and it's never a good idea to give your object the same name as a built-in function. One that crops up more times than we can remember is
```{r, echo=TRUE, eval=FALSE}
data <- read.table("mydatafile", header = TRUE) # data is a function!
```
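A quick way to check whether a name is already taken by a function is `is.function()`. The base R function `make.names()` can also be used to see how R would convert a problematic name into a syntactically valid one. This is only a sketch to illustrate the naming rules above, not a recommended way of choosing names.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
is.function(data)    # TRUE, 'data' is already a function in R
is.function(mean)    # TRUE, so avoid naming an object 'mean' too

# how R would repair invalid names (example names are hypothetical)
make.names(c("2my_variable", "my variable", "my_variable"))
```

The first two names are altered (to `X2my_variable` and `my.variable`) because they break the rules, whereas `my_variable` is left untouched.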
## Using functions in R
Up until now we've been creating simple objects by directly assigning a single value to an object. It's very likely that you'll soon want to progress to creating more complicated objects as your R experience grows and the complexity of your tasks increases. Happily, R has a multitude of functions to help you do this. You can think of a function as an object which contains a series of instructions to perform a specific task. The base installation of R comes with many functions already defined or you can increase the power of R by installing one of the 10,000s of [packages](#packages) now available. Once you get a bit more experience with using R you may want to define your own functions to perform tasks that are specific to your goals (more about this in [Chapter 7](#prog_r)).
\
```{block2, vid-text6, type='rmdvideo'}
See this [video][func-vid] for a general introduction to using functions in R and this [video][vec-vid] on how to create vectors in R
```
\
The first function we will learn about is the `c()`\index{c()} function. The `c()` function is short for concatenate and we use it to join together a series of values and store them in a data structure called a [**vector**][vector] (more on vectors in [Chapter 3](#data_r)).
```{r, echo=TRUE, eval=TRUE}
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)
```
In the code above we've created an object called `my_vec` and assigned it a value using the function `c()`. There are a couple of really important points to note here. Firstly, when you use a function in R, the function name is **always** followed by a pair of round brackets even if there's nothing contained between the brackets. Secondly, the argument(s) of a function are placed inside the round brackets and are separated by commas. You can think of an argument as a way of customising the use or behaviour of a function. In the example above, the arguments are the numbers we want to concatenate. Finally, one of the tricky things when you first start using R is to know which function to use for a particular task and how to use it. Thankfully each function will always have a help document associated with it which will explain how to use the function (more on this [later](#help)) and a quick Google search will also usually help you out.
To examine the value of our new object we can simply type out the name of the object as we did before.
```{r, echo=TRUE, eval=TRUE}
my_vec
```
Now that we've created a vector we can use other functions to do useful stuff with this object. For example, we can calculate the mean, variance, standard deviation and number of elements in our vector by using the `mean()`\index{mean()}, `var()`\index{var()}, `sd()`\index{sd()} and `length()`\index{length()} functions.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
mean(my_vec) # returns the mean of my_vec
var(my_vec) # returns the variance of my_vec
sd(my_vec) # returns the standard deviation of my_vec
length(my_vec) # returns the number of elements in my_vec
```
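To see what these functions are doing, it can help to calculate the same quantities 'by hand'. The sketch below uses a vector called `stat_vec` (a hypothetical name, with the same values as `my_vec`) and relies on the fact that `var()` calculates the sample variance, i.e. it divides by `n - 1`.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
stat_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)   # same values as my_vec
n <- length(stat_vec)                   # number of elements

sum(stat_vec) / n                       # the mean calculated by hand
mean(stat_vec)                          # same result

# sample variance by hand: sum of squared deviations divided by n - 1
sum((stat_vec - mean(stat_vec))^2) / (n - 1)
var(stat_vec)                           # same result

sqrt(var(stat_vec))                     # standard deviation, same as sd(stat_vec)
```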
If we wanted to use any of these values later on in our analysis we can just assign the resulting value to another object.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
vec_mean <- mean(my_vec) # returns the mean of my_vec
vec_mean
```
Sometimes it can be useful to create a vector that contains a regular sequence of values in steps of one. Here we can make use of a shortcut using the `:` symbol.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq <- 1:10 # create regular sequence
my_seq
my_seq2 <- 10:1 # in descending order
my_seq2
```
Other useful functions for generating vectors of sequences include the `seq()`\index{seq()} and `rep()`\index{rep()} functions. For example, to generate a sequence from 1 to 5 in steps of 0.5.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq2 <- seq(from = 1, to = 5, by = 0.5)
my_seq2
```
Here we've used the arguments `from =` and `to =` to define the limits of the sequence and the `by =` argument to specify the increment of the sequence. Play around with other values for these arguments to see their effect.
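As a starting point for experimenting, here are a few variations. This is just a sketch: the `length.out =` argument is another documented argument of `seq()` that asks for a given number of values rather than a given step size.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
seq(from = 10, to = 0, by = -2)         # a decreasing sequence: 10 8 6 4 2 0
seq(from = 0, to = 1, length.out = 5)   # 5 evenly spaced values from 0 to 1
seq(1, 5, 0.5)                          # arguments matched by position, not name
```

Naming the arguments takes a little more typing but makes the code much easier to read.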
The `rep()` function allows you to replicate (repeat) values a specified number of times. To repeat the value 2, 10 times
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq3 <- rep(2, times = 10) # repeats 2, 10 times
my_seq3
```
You can also repeat non-numeric values
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq4 <- rep("abc", times = 3) # repeats "abc" 3 times
my_seq4
```
or repeat a whole series
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq5 <- rep(1:5, times = 3) # repeats the series 1 to
                               # 5, 3 times
my_seq5
```
or repeat each element of a series.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq6 <- rep(1:5, each = 3) # repeats each element of the
                              # series 3 times
my_seq6
```
We can also repeat a non-sequential series.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_seq7 <- rep(c(3, 1, 10, 7), each = 3) # repeats each
# element of the
# series 3 times
my_seq7
```
Note in the code above how we've used the `c()` function inside the `rep()` function. Nesting functions allows us to build quite complex commands within a single line of code and is a very common practice when using R. However, care needs to be taken as too many nested functions can make your code quite difficult for others to understand (or yourself some time in the future!). We could rewrite the code above to explicitly separate the two different steps to generate our vector. Either approach will give the same result, you just need to use your own judgement as to which is more readable.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
in_vec <- c(3, 1, 10, 7)
my_seq7 <- rep(in_vec, each = 3) # repeats each element of
# the series 3 times
my_seq7
```
## Working with vectors {#vectors}
Manipulating, summarising and sorting data using R is an important skill to master but one which many people find a little confusing at first. We'll go through a few simple examples here using vectors to illustrate some important concepts but will build on this in much more detail in [Chapter 3](#data_r) where we will look at more complicated (and useful) data structures.
\
```{block2, vid-text7, type='rmdvideo'}
Take a look at this [video][vec2-vid] for a quick introduction to working with vectors in R using positional and logical indexes
```
### Extracting elements
To extract (also known as indexing or subscripting) one or more values (more generally known as elements) from a vector we use the square bracket `[ ]` notation. The general approach is to name the object you wish to extract from, then a set of square brackets with an index of the element you wish to extract contained within the square brackets. This index can be a position or the result of a logical test.
#### Positional index {-}
To extract elements based on their position we simply write the position inside the `[ ]`. For example, to extract the 3rd value of `my_vec`
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec # remind ourselves what my_vec looks like
my_vec[3] # extract the 3rd value
# if you want to store this value in another object
val_3 <- my_vec[3]
val_3
```
Note that the positional index starts at 1 rather than 0 like some other programming languages (e.g. Python).
We can also extract more than one value by using the `c()` function inside the square brackets. Here we extract the 1^st^, 5^th^, 6^th^ and 8^th^ elements from the `my_vec` object
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[c(1, 5, 6, 8)]
```
Or we can extract a range of values using the `:` notation. To extract the values from the 3^rd^ to the 8^th^ elements.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[3:8]
```
#### Logical index {-}
Another really useful way to extract data from a vector is to use a logical expression as an index. For example, to extract all elements with a value greater than 4 in the vector `my_vec`
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[my_vec > 4]
```
Here, the logical expression is `my_vec > 4` and R will only extract those elements that satisfy this logical condition. So how does this actually work? If we look at the output of just the logical expression without the square brackets you can see that R returns a vector containing either `TRUE` or `FALSE` which correspond to whether the logical condition is satisfied for each element. In this case only the 4^th^ and 8^th^ elements return a `TRUE` as their value is greater than 4.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec > 4
```
So what R is actually doing under the hood is equivalent to
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)]
```
and only those elements that are `TRUE` will be extracted.
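Because a logical test returns one `TRUE` or `FALSE` per element, you can also ask where the `TRUE` values are and how many there are. The sketch below uses the base R functions `which()` and `sum()` on a vector called `idx_vec` (a hypothetical name, with the same values as `my_vec`).

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
idx_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)   # same values as my_vec
big <- idx_vec > 4                     # one TRUE or FALSE per element
which(big)                             # positions of the TRUE values: 4 8
sum(big)                               # each TRUE counts as 1, so this gives 2
idx_vec[which(big)]                    # same elements as idx_vec[idx_vec > 4]
```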
In addition to the `<` and `>` operators you can also use composite operators to increase the complexity of your expressions. For example the expression for 'greater or equal to' is `>=`. To test whether a value is equal to a value we need to use a double equals symbol `==` and for 'not equal to' we use `!=` (the `!` symbol means 'not').
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[my_vec >= 4] # values greater or equal to 4
my_vec[my_vec < 4] # values less than 4
my_vec[my_vec <= 4] # values less than or equal to 4
my_vec[my_vec == 4] # values equal to 4
my_vec[my_vec != 4] # values not equal to 4
```
We can also combine multiple logical expressions using [Boolean expressions][boolean]. In R the `&` symbol means AND and the `|` symbol means OR. For example, to extract values in `my_vec` which are less than 6 AND greater than 2
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
val26 <- my_vec[my_vec < 6 & my_vec > 2]
val26
```
or extract values in `my_vec` that are greater than 6 OR less than 3.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
val63 <- my_vec[my_vec > 6 | my_vec < 3]
val63
```
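If a combined expression doesn't return what you expect, it often helps to look at each logical test separately. A minimal sketch, again using a hypothetical vector (`bool_vec`) with the same values as `my_vec`:

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
bool_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)      # same values as my_vec
bool_vec < 6                               # first test on its own
bool_vec > 2                               # second test on its own
bool_vec < 6 & bool_vec > 2                # TRUE only where both tests are TRUE
bool_vec[!(bool_vec < 6 & bool_vec > 2)]   # ! flips the test: 2 1 6 7
```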
### Replacing elements
We can change the values of some elements in a vector using our `[ ]` notation in combination with the assignment operator `<-`. For example, to replace the 4^th^ value of our `my_vec` object from `6` to `500`
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
my_vec[4] <- 500
my_vec
```
We can also replace more than one value or even replace values based on a logical expression.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
# replace the 6th and 7th element with 100
my_vec[c(6, 7)] <- 100
my_vec
# replace elements that are less than or equal to 4 with 1000
my_vec[my_vec <= 4] <- 1000
my_vec
```
### Ordering elements {#vec_ord}
In addition to extracting particular elements from a vector we can also order the values contained in a vector. To sort the values from lowest to highest value we can use the `sort()`\index{sort()} function.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
vec_sort <- sort(my_vec)
vec_sort
```
To reverse the sort, from highest to lowest, we can either include the `decreasing = TRUE` argument when using the `sort()` function
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
vec_sort2 <- sort(my_vec, decreasing = TRUE)
vec_sort2
```
or first sort the vector using the `sort()` function and then reverse the sorted vector using the `rev()`\index{rev()} function. This is another example of nesting one function inside another function.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
vec_sort3 <- rev(sort(my_vec))
vec_sort3
```
Whilst sorting a single vector is fun, perhaps a more useful task would be to sort one vector according to the values of another vector. To do this we should use the `order()`\index{order()} function in combination with `[ ]`. To demonstrate this let's create a vector called `height` containing the height of 5 different people and another vector called `p.names` containing the names of these people (so Joanna is 180 cm, Charlotte is 155 cm etc).
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
height <- c(180, 155, 160, 167, 181)
height
p.names <- c("Joanna", "Charlotte", "Helen", "Karen", "Amy")
p.names
```
Our goal is to order the people in `p.names` in ascending order of their `height`. The first thing we'll do is use the `order()` function with the `height` variable to create a vector called `height_ord`.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
height_ord <- order(height)
height_ord
```
OK, what's going on here? The first value, `2`, (remember to ignore the `[1]`) should be read as ‘the smallest value of `height` is the second element of the `height` vector’. If we check this by looking at the `height` vector above, you can see that element 2 has a value of 155, which is the smallest value. The second smallest value in `height` is the 3^rd^ element of `height`, which when we check is 160 and so on. The largest value of `height` is element `5` which is 181. Now that we have a vector of the positional indices of heights in ascending order (`height_ord`), we can extract these values from our `p.names` vector in this order.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
names_ord <- p.names[height_ord]
names_ord
```
You're probably thinking ‘what’s the use of this?’ Well, imagine you have a dataset which contains two columns of data and you want to sort the dataset by one of the columns. If you just use `sort()` to sort each column separately, the values of each column will become uncoupled from each other. Using `order()` on the first column instead creates a vector of positional indices that puts the values of that column in ascending order. This vector can then be used as an index on the second column, returning its values in the order defined by the first column so that the two columns stay matched.
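As a rough sketch of the idea, the code below puts the heights and names into a data frame (a data structure covered properly in [Chapter 3](#data_r)) and uses `order()` to rearrange whole rows at once. The object name `people` and its column names are made up for this example.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
# hypothetical data frame built from the same heights and names as above
people <- data.frame(
  name = c("Joanna", "Charlotte", "Helen", "Karen", "Amy"),
  height = c(180, 155, 160, 167, 181),
  stringsAsFactors = FALSE
)

sort(people$name)                # sorted alone, names no longer match heights
people[order(people$height), ]   # whole rows reordered by height together
```

Because the same index is applied to every column, each name stays attached to the correct height.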
### Vectorisation
One of the great things about R functions is that most of them are vectorised. This means that the function will operate on all elements of a vector without needing to apply the function on each element separately. For example, to multiply each element of a vector by 5 we can simply use
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
# create a vector
my_vec2 <- c(3, 5, 7, 1, 9, 20)
# multiply each element by 5
my_vec2 * 5
```
Or we can add the elements of two or more vectors
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
# create a second vector
my_vec3 <- c(17, 15, 13, 19, 11, 0)
# add both vectors
my_vec2 + my_vec3
# multiply both vectors
my_vec2 * my_vec3
```
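However, you must be careful when using vectorisation with vectors of different lengths as R will quietly recycle the elements in the shorter vector rather than throw a wobbly (error). R only issues a warning when the length of the longer vector is not a multiple of the length of the shorter one.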
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
# create a third vector
my_vec4 <- c(1, 2)
# add both vectors - quiet recycling!
my_vec2 + my_vec4
```
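To make the recycling explicit, the sketch below writes out what R does behind the scenes and also shows a case where the lengths don't divide evenly, which produces a warning. The vector `long_vec` is a hypothetical name with the same values as `my_vec2`.

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
long_vec <- c(3, 5, 7, 1, 9, 20)      # length 6, same values as my_vec2

long_vec + c(1, 2)                    # 6 is a multiple of 2, so no warning
long_vec + rep(c(1, 2), times = 3)    # the recycling written out in full

# 6 is not a multiple of 4, so R recycles but also gives a warning
long_vec + c(1, 2, 3, 4)
```

A quick check with `length()` on both vectors before combining them is a cheap way to catch this kind of mistake.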
### Missing data {#na_vals}
In R, missing data is usually represented by an `NA` symbol meaning 'Not Available'. Data may be missing for a whole bunch of reasons, maybe your machine broke down, maybe you broke down, maybe the weather was too bad to collect data on a particular day etc etc. Missing data can be a pain in the proverbial both from an R perspective and also a statistical perspective. From an R perspective missing data can be problematic as different functions deal with missing data in different ways. For example, let's say we collected air temperature readings over 10 days, but our thermometer broke on day 2 and again on day 9 so we have no data for those days.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
temp <- c(7.2, NA, 7.1, 6.9, 6.5, 5.8, 5.8, 5.5, NA, 5.5)
temp
```
We now want to calculate the mean temperature over these days using the `mean()` function.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
mean_temp <- mean(temp)
mean_temp
```
Flippin heck, what's happened here? Why does the `mean()` function return an `NA`? Actually, R is doing something very sensible (at least in our opinion!). If a vector has a missing value then the only possible value to return when calculating a mean is `NA`. R doesn't know that you perhaps want to ignore the `NA` values (R can't read your mind - yet!). Happily, if we look at the help file (use `help("mean")` - see the [next section](#help) for more details) associated with the `mean()` function we can see there is an argument `na.rm = ` which is set to `FALSE` by default.
> na.rm - a logical value indicating whether NA values should be stripped before the computation proceeds.
If we change this argument to `na.rm = TRUE` when we use the `mean()` function this will allow us to ignore the `NA` values when calculating the mean.
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
mean_temp <- mean(temp, na.rm = TRUE)
mean_temp
```
It's important to note that the `NA` values have not been removed from our `temp` object (that would be bad practice), rather the `mean()` function has just ignored them. The point of the above is to highlight how we can change the default behaviour of a function using an appropriate argument. The problem is that not all functions will have an `na.rm =` argument and they might deal with `NA` values differently. However, the good news is that the help file associated with a function will usually tell you how missing data are handled by default.
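It's often useful to find out where the missing values are before deciding what to do about them. The sketch below uses the base R function `is.na()` on a vector called `na_vec` (a hypothetical name, with the same values as `temp`).

```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
na_vec <- c(7.2, NA, 7.1, 6.9, 6.5, 5.8, 5.8, 5.5, NA, 5.5)  # same as temp

is.na(na_vec)                  # TRUE wherever a value is missing
sum(is.na(na_vec))             # number of missing values: 2
which(is.na(na_vec))           # positions of the missing values: 2 9
mean(na_vec[!is.na(na_vec)])   # same result as mean(na_vec, na.rm = TRUE)
length(na_vec)                 # still 10, as length() counts NA values too
```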
## Getting help {#help}
This book is intended as a relatively brief introduction to R and as such you will soon be using functions and packages that go beyond the scope of this introductory text. Fortunately, one of the strengths of R is its comprehensive and easily accessible help system and wealth of online resources where you can obtain further information.
### R help
To access R’s built-in help facility to get information on any function simply use the `help()`\index{help()} function. For example, to open the help page for our friend the `mean()` function.
```{r, echo=TRUE, eval=FALSE}
help("mean")
```
or you can use the equivalent shortcut.
```{r, echo=TRUE, eval=FALSE}
?mean
```
After you run the code, the help page is displayed in the 'Help' tab in the Files pane (usually in the bottom right of RStudio).
\
```{r rstudio_help, echo=FALSE, out.width="50%", fig.align="center"}
knitr::include_graphics(path = "images/rs_help.png")
```
\
Admittedly the help files can seem anything but helpful when you first start using R. This is probably because they're written in a very concise manner and the language used is often quite technical and full of jargon. Having said that, you do get used to this and will over time even come to appreciate a certain beauty in their brevity (honest!). One of the great things about the help files is that they all have a very similar structure regardless of the function. This makes it easy to navigate through the file to find exactly what you need.
The first line of the help document contains information such as the name of the function and the package where the function can be found. There are also other headings that provide more specific information such as
- **Description:** gives a brief description of the function and what it does.
- **Usage:** gives the names of the arguments associated with the function and possible default values.
- **Arguments:** provides more detail regarding each argument and what they do.
- **Details:** gives further details of the function if required.
- **Value:** if applicable, gives the type and structure of the object returned by the function or the operator.
- **See Also:** provides information on other help pages with similar or related content.
- **Examples:** gives some examples of using the function. These are really helpful: all you need to do is copy and paste them into the console to see what happens. You can also access examples at any time by using the `example()`\index{example()} function (i.e. `example("mean")`).
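If all you need is a quick reminder of a function's arguments and their defaults (the information in the **Usage** section), base R's `args()` and `formals()` functions can show it in the Console without opening the help page. A small sketch using `sd()`, picked here only as an illustration:

```{r, echo=TRUE, eval=FALSE}
# print the argument list of sd(), including default values
args(sd)

# extract the argument names and one default value programmatically
names(formals(sd))
formals(sd)$na.rm
```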
The `help()` function is useful if you know the name of the function. If you're not sure of the name but can remember a keyword, then you can search R's help system using the `help.search()`\index{help.search()} function.
```{r, echo=TRUE, eval=FALSE}
help.search("mean")
```
Or you can use the equivalent shortcut.
```{r, echo=TRUE, eval=FALSE}
??mean
```
The results of the search will be displayed in RStudio under the 'Help' tab as before. The `help.search()` function searches through the help documentation, code demonstrations and package vignettes and displays the results as clickable links for further exploration.
\
```{r rstudio_help2, echo=FALSE, out.width="50%", fig.align="center"}
knitr::include_graphics(path = "images/rs_help2.png")
```
\
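If a search returns too many results, `help.search()` can be narrowed down. The sketch below assumes you only care about the **stats** package (chosen purely for illustration) and uses the `package =` argument to restrict the search.

```{r, echo=TRUE, eval=FALSE}
# search only the help pages of the stats package
stats_hits <- help.search("mean", package = "stats")

# printing the result lists the matches; they are also stored
# in the 'matches' component of the returned object
stats_hits
head(stats_hits$matches)
```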
Another useful function is `apropos()`\index{apropos()}. This function can be used to list all objects (including functions) whose names contain a specified character string. For example, to find everything with `mean` in its name
```{r, echo=TRUE, eval=TRUE, collapse=TRUE}
apropos("mean")
```
You can then bring up the help file for the relevant function.
```{r, echo=TRUE, eval=FALSE}
help("kmeans")
```
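The first argument of `apropos()` is treated as a regular expression, so the list can be trimmed if it gets too long. As a sketch, the following keeps only names that start with `mean` and then, via the `mode =` argument, only objects that are functions.

```{r, echo=TRUE, eval=FALSE}
# names that start with "mean" (^ anchors the match at the start of the name)
apropos("^mean")

# restrict the results to functions only
apropos("mean", mode = "function")
```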
An extremely useful function is `RSiteSearch()`\index{RSiteSearch()} which enables you to search for keywords and phrases in function help pages and vignettes for all CRAN packages, and in CRAN task views. This function allows you to access the https://www.r-project.org/search.html search engine directly from the Console with the results displayed in your web browser.
```{r, echo=TRUE, eval=FALSE}
RSiteSearch("regression")
```
### Other sources of help {#rhelp}
There really has never been a better time to start learning R. There is a plethora of freely available online resources ranging from whole courses to subject-specific tutorials and mailing lists. There are also plenty of paid-for options if that's your thing, but unless you've money to burn there really is no need to part with your hard-earned cash. Some resources we have found helpful are listed below.
**General R resources**
- [R-Project][r-docs]: User contributed documentation
- [The R Journal][r-journal]: Journal of the R project for statistical computing
- [Swirl][swirl]: An R package that teaches you R from within R
- [RStudio's printable cheatsheets][rstudio-cheat]
- [Rseek][rseek]: A custom Google search for R-related sites
**Getting help**
- [Google it!][google-cust]: Try Googling any error messages you get. It's not cheating and everyone does it! You'll be surprised how many other people have probably had the same problem and solved it.
- [Stack Overflow][stackr]: There are many thousands of questions relevant to R on Stack Overflow. [Here][stack-pop] are the most popular ones, ranked by vote. Make sure you search for similar questions before asking your own, and make sure you include a [reproducible example][stack-repro] to get the most useful advice. A reproducible example is a minimal example that lets others who are trying to help you to see the error themselves.
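A reproducible example might look something like the sketch below. The data frame `my_data` is made up for illustration; `dput()` prints R code that recreates an object, so others can copy it straight into their own session, and `sessionInfo()` reports your R version and loaded packages.

```{r, echo=TRUE, eval=FALSE}
# hypothetical data standing in for a small piece of your own data
my_data <- data.frame(site = c("A", "B", "C"), temp = c(7.2, NA, 6.9))

# print code that recreates my_data, ready to paste into your question
dput(my_data)

# the line that gives the unexpected result (NA rather than a number)
mean(my_data$temp)

# details of your R version, operating system and loaded packages
sessionInfo()
```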
**R markdown resources**
- [Basic markdown and R markdown reference][bio-con]
- [A good markdown reference][md-ref]
- [A good 10-minute markdown tutorial][md-tut]
- [RStudio's R markdown cheatsheet][rmd-cheat]
- [R markdown reference sheet][rmd-ref]
- [The R markdown documentation][rs-rm-docs] including a [getting started guide][rm-lesson], a [gallery of demos][rm-gallery], and several [articles][rs-articles] for more advanced usage.
- [The knitr website][knitr] has lots of useful reference material about how knitr works.
**Git and GitHub resources**
- [Happy Git][git_happy]: Great resource for using Git and GitHub
- [Version control with RStudio][rs-Git]: RStudio document for using version control
- [Using Git from RStudio][git-rs]: Good 10-minute guide
- [The R Class][rclass]: In-depth guide to using Git and GitHub with RStudio
**R programming**
- [R Programming for Data Science][r-rprog]: In-depth guide to R programming
- [R for Data Science][r4ds]: Fantastic book, tidyverse orientated
## Saving stuff in R
Your approach to saving work in R and RStudio depends on what you want to save. Most of the time the only thing you will need to save is the R code in your script(s). Remember, your script is a reproducible record of everything you've done, so all you need to do is open your script in a new RStudio session and source it into the R Console, and you're back to where you left off.
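As a minimal sketch of this workflow, the code below writes a tiny script to a temporary file and then runs it with `source()`. The file name `my_analysis.R` and the objects it creates are hypothetical; in practice you would source the script you saved yourself.

```{r, echo=TRUE, eval=FALSE}
# create a small stand-in script in R's temporary directory
script_file <- file.path(tempdir(), "my_analysis.R")
writeLines(
  c("heights <- c(1.62, 1.75, 1.80)",
    "mean_height <- mean(heights)"),
  script_file
)

# run every line of the script, recreating the objects it defines
source(script_file)
mean_height
```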
Unless you've followed our [suggestion](#rsprojs) about changing the default settings for RStudio Projects, you will be asked whether you want to save your workspace image every time you exit RStudio. We suggest that 99.9% of the time you don't want to do this. By starting with a clean RStudio session each time we come back to our analysis we can be sure to avoid any potential conflicts with things we've done in previous sessions.
There are, however, some occasions when saving objects you've created in R is useful. For example, let's say you're creating an object that takes hours (even days) of computational time to generate. It would be extremely inconvenient to have to wait all this time each time you come back to your analysis (although we would suggest exporting this to an external file is a better solution). In this case we can save this object as an external `.RData` file which we can load back into RStudio the next time we want to use it. To save an object to an `.RData` file you can use the `save()`\index{save()} function (notice we don't need to use the assignment operator here)
```{r, echo=TRUE, eval=FALSE, collapse=TRUE}
save(nameOfObject, file = "name_of_file.RData")
```
or if you want to save all of the objects in your workspace into a single `.RData` file use the `save.image()`\index{save.image()} function.
```{r, echo=TRUE, eval=FALSE, collapse=TRUE}
save.image(file = "name_of_file.RData")
```
To load your `.RData` file back into RStudio use the `load()`\index{load()} function.
```{r, echo=TRUE, eval=FALSE, collapse=TRUE}
load(file = "name_of_file.RData")
```
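To see how these functions fit together, here is a small round trip. It is only a sketch: `slow_result` is a hypothetical stand-in for an object that takes a long time to compute, and the file is written to R's temporary directory rather than to your project.

```{r, echo=TRUE, eval=FALSE}
# stand-in for an object that would normally take hours to create
slow_result <- cumsum(1:10)

# save it to an .RData file in the temporary directory
rdata_file <- file.path(tempdir(), "slow_result.RData")
save(slow_result, file = rdata_file)

# remove the object to mimic starting a fresh session
rm(slow_result)

# load() restores the object under its original name, which is why
# no assignment is needed; it invisibly returns the names it restored
loaded_names <- load(file = rdata_file)
loaded_names
slow_result
```

Because `load()` restores objects under their original names, it will silently overwrite any existing object with the same name, so it is worth checking what a file contains before loading it into a busy workspace.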
## Exercise 2
```{block2, note-text2, type='rmdtip'}
Congratulations, you've reached the end of Chapter 2! Perhaps now's a good time to practise some of what you've learned. You can find an exercise we've prepared for you (and our solutions) on the course website.
```
```{r links, child="links.md"}
```