---
title: "Getting the right tools for the job"
subtitle: "R, RStudio, git, and GitHub"
author:
  - "Ernest Guevarra"
date: '11 October 2024'
output:
  xaringan::moon_reader:
    css: xaringan-themer.css
    nature:
      slideNumberFormat: "%current%"
      highlightStyle: github
      highlightLines: true
      ratio: 16:9
      countIncrementalSlides: true
---
```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(
  fig.width = 9, fig.height = 3.5, fig.retina = 3,
  out.width = "100%",
  cache = FALSE,
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  hiline = TRUE
)
if (!require(remotes)) install.packages("remotes")
if (!require(fontawesome)) remotes::install_github("rstudio/fontawesome")
```
```{r xaringan-themer, include=FALSE, warning=FALSE}
library(xaringanthemer)
style_mono_light(
  base_color = "#002147",
  title_slide_background_image = "",
  title_slide_background_size = "cover",
  header_font_google = google_font("Fira Sans"),
  text_font_google = google_font("Fira Sans Condensed"),
  text_font_size = "1.2em",
  link_color = "#214700",
  header_h1_font_size = "50px",
  header_h2_font_size = "40px",
  header_h3_font_size = "30px",
  code_font_google = google_font("Fira Mono"),
  text_slide_number_font_size = "0.5em",
  footnote_font_size = "0.5em"
)
```
# Outline
1. What is R?
2. Why use R?
3. What is RStudio?
4. Why use RStudio?
5. What are git and GitHub?
6. Why use git and GitHub?
---
# What is R?
.pull-left[
* `R` is a simple but powerful *programming language*
* `R` is a system for *data manipulation*, *calculation*, and *graphics*. It provides:
  * Facilities/functions for data handling and storage;
  * A large collection of tools for data analysis; and,
  * Graphical facilities for data analysis and display.
* `R` provides its statistical functions as part of a broader, general-purpose programming tool.
]
.pull-right[
.center[![](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/724px-R_logo.svg.png)]
]
???
R is not a statistical package in the same way that Stata or SPSS are.

R is a general-purpose programming language with a sophisticated, well-developed, and well-maintained library of statistical tools and functions.
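---
# What is R?

A minimal sketch of R's three roles (data handling, calculation, and graphics), using only base R and its built-in `mtcars` dataset; the dataset, variables, and 20 mpg threshold are illustrative choices only.

```r
# Data handling: keep cars that do more than 20 miles per gallon
efficient <- subset(mtcars, mpg > 20, select = c(mpg, wt, cyl))

# Calculation: mean fuel efficiency by number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = efficient, FUN = mean)
mpg_by_cyl

# Graphics: scatter plot of weight against fuel efficiency for all cars
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```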
---
# Why use R?
.pull-left[
.center[![](https://user2021.r-project.org/img/artwork/user-logo-color.png)]
]
.pull-right[
* It is an ***open source system*** and is available for ***free***. Despite being free, R is ***more powerful than most commercial packages***.
* It is considerably ***more flexible than statistical packages*** that rely on menus, buttons, and boxes.
* Every stage of your data management and analysis can be recorded, edited, and re-run at a later date.
* It has a huge user and developer community.
* It has a robust set of user- and community-developed packages that support statistical methods and techniques.
]
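---
# Why use R?

Because every step is written down as code, an analysis can be re-run later and give the same answer. A minimal sketch, using a hypothetical `run_analysis()` function on simulated data; `set.seed()` makes the random draw repeatable.

```r
# Hypothetical analysis wrapped in a function so it can be re-run at any time
run_analysis <- function(seed = 2024) {
  set.seed(seed)                          # fix the random number generator
  x <- rnorm(n = 50, mean = 10, sd = 2)   # simulated measurements
  c(mean = mean(x), sd = sd(x))           # summary statistics
}

first_run  <- run_analysis()
second_run <- run_analysis()   # re-run later, e.g. after reopening the script
identical(first_run, second_run)
```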
---
# What is RStudio?
.pull-left[
* An ***integrated development environment (IDE)*** for R. RStudio is not R. RStudio is a tool for interfacing with R.
* Includes a ***console***, a ***syntax-highlighting editor*** that supports direct code execution, and ***tools for plotting, history, debugging, and workspace management***.
]
.pull-right[
.center[![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/RStudio_logo_flat.svg/1920px-RStudio_logo_flat.svg.png?20190314020554)]
]
---
# Why use RStudio?
* RStudio is designed to make it easier to work with R
* RStudio facilitates creation of project-orientated workflows
* RStudio makes it convenient to view and interact with the objects in your environment
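
RStudio's Environment pane shows, in point-and-click form, roughly what base R reports from the console with `ls()` and `str()`. A minimal sketch with a made-up `patients` data frame:

```r
# Create two objects in the global environment
patients <- data.frame(id = 1:3, age = c(34, 51, 27))
mean_age <- mean(patients$age)

ls()            # names of the objects in the environment
str(patients)   # structure of one object (type, size, columns)
```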
---
# What is git?
.pull-left[
* Free and open source distributed **version control system**
* Originally built so that groups of software developers could work collaboratively and manage the evolution of a set of files
  - like *"Track Changes"* in Microsoft Word on steroids!
* Has been re-purposed to manage the collection of files that make up a typical data analysis project: data, figures, reports, and source code
]
.pull-right[
[.center[![](https://git-scm.com/images/logos/downloads/Git-Logo-2Color.png)]](https://git-scm.com)
]
???
Git is a version control system. Its original purpose was to help groups of developers work collaboratively on big software projects. Git manages the evolution of a set of files, called a repository, in a sane, highly structured way. Think of it as the "Track Changes" feature from Microsoft Word on steroids.

Git has been re-purposed by the data science community. In addition to managing source code, it is used to manage the motley collection of files that make up typical data analysis projects: data, figures, reports, and source code.
---
# Why use git?
.pull-left[
.center[![](images/phdcomics-filenames.gif)]
]
.pull-right[
### Version control
* The only reasonable and sane way to keep track of changes in source code, manuscripts, presentations, and data analysis projects
* Documentation of differences between versions
* Exploration of differences between versions
]
???
Version control is the only reasonable way to keep track of changes in code, manuscripts, presentations, and data analysis projects. We are all familiar with giving numbered filenames to successive versions of a project, but exploring the differences between those versions is difficult, to say the least. git is, in its very essence, a powerful version control system. When used properly, git makes it easier to document each small change across all your files, and this documentation makes exploring the differences between versions easier and more intelligible.
---
# Why use git?
.pull-left[
.center[![](images/why_repro_research1.gif)]
]
.pull-right[
### Communication and collaboration
* **Communicating** one's research project with other people is part of the scientific process - not just results but the whole process
* **Collaborating** with others on each other's research projects allows us to build on past work, apply it to a different context/problem, or re-purpose it to come up with a new approach/solution
* Communication and collaboration on various aspects of the scientific process is facilitated by using git
]
???
git makes merging collaborators' changes easy. Have you ever had to deal with a collaborator sending you modifications distributed across many files, or with two people making changes to the same file at the same time? Painful. `git merge` is the answer.
---
# What is GitHub and why use it?
.pull-left[
[.center[![](https://avatars.githubusercontent.com/u/583231?v=4)]](https://github.com)
]
.pull-right[
* A hosting service for software development and version control using git
* Offers the distributed version control and source code management functionality of git, plus its own features such as bug tracking, feature requests, task management, continuous integration, and wikis for every project
* Like *Facebook* but for programmers
* Facilitates *"openness"* of **Open Source**
* Lowers the barriers to collaboration
]
???
GitHub is like Facebook for programmers. Everyone's on there. You can look at what they're working on, easily peruse their code, and make suggestions or changes.

It's really open source. "Open source" is not so open if you can't easily study it. With GitHub, all of the code is easily inspected, as is its entire history.

GitHub lowers the barriers to collaboration. It's easy to offer suggested changes to others' code through GitHub.

You don't have to set up a git server. It's surprisingly easy to get things set up.
---
background-color: #FFFFFF
background-image: url(images/pcbi.1004947.g001.jpg)
background-size: 50%
# git and GitHub
.footnote[Taken from Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Computational Biology, 12(7), e1004947. https://doi.org/10.1371/journal.pcbi.1004947]
---
class: inverse, center, middle
# Questions?
---
class: inverse, center, middle
# Practical session
## Basic git operations with RStudio for retrieving and submitting assignments via GitHub Classroom
---
class: inverse, center, middle
# Thank you!
Slides can be viewed at https://oxford-ihtm.io/open-reproducible-science/session1.html

PDF version of slides can be downloaded at https://oxford-ihtm.io/open-reproducible-science/pdf/session1-getting-the-right-tools.pdf

R Markdown source for the slides is available [here](https://github.com/OxfordIHTM/open-reproducible-science/blob/main/session1.Rmd)