forked from laser-institute/laser-orientation
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlaser-orientation-badge.rmd
executable file
·471 lines (351 loc) · 16.4 KB
/
laser-orientation-badge.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
---
title: "LASER Badge"
subtitle: "LASER Institute Orientation"
author: "Mark LaVenia"
date: "`r format(Sys.Date(),'%B %e, %Y')`"
output:
html_document:
toc: yes
toc_depth: 4
toc_float: yes
code_folding: show
code_download: TRUE
editor_options:
markdown:
wrap: 72
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE)
```
![](img/LASER_Hx.png){width="40%"}
## Introduction
Welcome to your first LASER badge! This LASER Orientation Badge is
really a warm-up activity to introduce you to R Markdown and the coding
case studies that we will be using in the machine learning, network
analysis, and text mining labs. It is a chance to become familiar with
how RStudio and R Markdown works.
You may have used R before-or you may not have! Either is fine as this
task will be designed with the assumption that you have not used R
before. It includes "reaches" for anyone who may want to do a bit more.
In the context of doing so, we'll focus on the following tasks:
1. Reading data into R (in the **Prepare** section)
2. Preparing and "wrangling" data in table (think spreadsheet!) format
(in the **Wrangle** section)
3. Creating some plots (in the **Explore** section)
4. Running a model - specifically, a regression model (in the **Model**
section)
5. Finally, creating a reproducible report of your work you can share
with others (in the **Communicate** section)
### The LASER Cycle
You may be wondering what these bolded terms above refer to; what's so
special about preparing, wrangling, exploring, and modeling data - and
communicating results? We're using these terms as a part of a framework,
or model, for what we mean by doing learning in STEM education research.
The particular framework we are using comes from the work of Krumm et
al.'s [*Learning Analytics Goes to
School*](https://github.com/laser-institute/essential-readings/blob/main/laser-orientation/Learning%20Analytics%20Goes%20to%20School.pdf)*.*
You can check that out, but don't feel any need to dive deep for now -
we'll be spending more time on this in first day of the summer
institute. For now, know that this document is organized around three of
the five components of what we're referring to as the **LASER cycle**.
Click the green arrow to the right of the "code chunk" below to view the
image (more on that process of clicking the green arrow and what it
does, too, in a moment)!
```{r}
knitr::include_graphics("img/laser-cycle.png")
```
### How to use this R Markdown document
This is an R Markdown file as indicated by the .rmd extension at the end
of the file name. R Markdown documents are fully reproducible and use a
productive [notebook
interface](https://bookdown.org/yihui/rmarkdown/notebook.html) to
combine narrative text and "chunks" of code to produce a range of
formatted outputs including: formats
including [HTML](https://bookdown.org/yihui/rmarkdown/html-document.html), [PDF](https://bookdown.org/yihui/rmarkdown/pdf-document.html), [MS
Word](https://bookdown.org/yihui/rmarkdown/word-document.html), [Beamer](https://bookdown.org/yihui/rmarkdown/beamer-presentation.html), [HTML5
slides](https://bookdown.org/yihui/rmarkdown/ioslides-presentation.html), [Tufte-style
handouts](https://bookdown.org/yihui/rmarkdown/tufte-handouts.html), [books](https://bookdown.org/), [dashboards](https://rmarkdown.rstudio.com/flexdashboard/), [shiny
applications](https://bookdown.org/yihui/rmarkdown/shiny-documents.html), [scientific
articles](https://github.com/rstudio/rticles), [websites](https://bookdown.org/yihui/rmarkdown/rmarkdown-site.html),
and more.
There are two keys to your use of R Markdown for this activity:
1. First, be sure that you are viewing the document in the "Visual
Editor" mode. You can use this mode by clicking the word "Visual" on
the left side of the toolbar above.
2. Second, click "Knit" next to the yarn ball at the top of this screen
to preview the document as you work through it. This will allow you
to see your code and the input in a rendered - easy-to-read -
document, just as others will see this document when shared. Try
knitting the document now and see what happens.
Let's get started! We are glad you are here and to begin this exciting
(and challenging) journey together.
## 1. PREPARE
By preparing, we refer to developing a question or purpose for the
analysis, which you likely know from your research can be difficult!
This part of the process also involves developing an understanding of
the data and what you may need to analyze the data. This often involves
looking at the data and its documentation. For now, we'll focus on just
a few parts of this process, diving in much more deeply over the coming
weeks.
### Packages 📦
R uses "packages," add-ons that enhance its functionality. One package
that we'll be using is the tidyverse. To load the tidyverse, click the
green arrow in the right corner of the block-or "chunk"-of code that
follows.
```{r}
library(tidyverse)
```
Please do not worry if you saw a number of messages: those probably mean
that the tidyverse loaded just fine. If you see an error, though, try to
interpret or search via your search engine the contents of the error, or
reach out to us for assistance.
### Loading (or reading in) data
Next, we'll load data-specifically, a CSV file, the kind that you can
export from Microsoft Excel or Google Sheets - into R, using the
`read_csv()` function in the next chunk.
Clicking the green arrow runs the code; do that next to read the
`sci-online-classes.csv` file stored in your data folder into your R
environment:
```{r}
d <- read_csv("data/sci-online-classes.csv")
```
Nice work! You should now see a new data "object" named `d` saved in
your Environment pane. Try clicking on it and see what happens.
#### Viewing or inspecting data
Now let's learn another way to inspect our data. Run the next chunk and
look at the results, tabbing left or right with the arrows, or scanning
through the rows by clicking the numbers at the bottom of the pane with
the print-out of the data you loaded:
```{r}
d
```
#### [**Your Turn**]{style="color: green;"} **⤵**
What do you notice about this data set? What do you wonder? Add one-two
thoughts following the dashes next (you can add additional dashes if you
like!):
- 603 obs. of 30 variables
- Student-level data, including student id, academic performance at the item level and summary info, context info, and student characteristics
There are other ways to inspect your data; the `glimpse()` function
provides one such way. Run the code below to take a glimpse at your
data.
```{r}
glimpse(d)
```
We have one more question to pose to you: What do rows and columns
typically represent in your area of work and/or research?
Generally, rows typically represent "cases," the units that we measure,
or the units on which we collect data. This is not a trick question!
What counts as a "case" (and therefore what is represented as a row)
varies by (and within) fields. There may be multiple types or levels of
units studied in your field; listing more than one is fine! Also, please
consider what columns - which usually represent variables - represent in
your area of work and/or research.
#### [**Your Turn**]{style="color: green;"} **⤵**
What rows typically (or you think may) represent:
- rows typically represent individual people, though it can be structured long, with information on an individual person (such as items on a test or points in time) represented as individual rows.
What columns typically (or you think may) represent:
- Columns typically represent information on an individual person, including context info, such as organization and geographic area to which they are associated.
Next, we'll use a few functions that are handy for preparing data in
table form.
## 2. WRANGLE
By wrangle, we refer to the process of cleaning and processing data,
and, in cases, merging (or joining) data from multiple sources. Often,
this part of the process is very (surprisingly) time-intensive.
Wrangling your data into shape can itself be an important
accomplishment! There are great tools in R to do this, especially
through the use of the {dplyr} R package.
### Selecting variables
Let's select only a few variables.
```{r}
d |>
select(student_id, total_points_possible, total_points_earned)
```
Notice how the number of columns (variables) is now different.
Let's *include one additional variable* in your select function.
First, we need to figure out what variables exist in our dataset (or be
reminded of this - it's very common in R to be continually checking and
inspecting your data)!
You can use a function named glimpse() to do this.
```{r}
glimpse(d)
```
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, add a new variable to the code below, being
careful to type the new variable name as it appears in the data. We've
added some code to get you started. Consider how the names of the other
variables are separated as you think about how to add an additional
variable to this code.
```{r}
d |>
select(student_id, course_id, total_points_possible, total_points_earned)
```
Once added, the output should be different than in the code above -
there should now be an additional variable included in the print-out.
### Filtering variables
Next, let's explore filtering variables. Check out and run the next
chunk of code, imagining that we wish to filter our data to view only
the rows associated with students who earned a final grade (as a
percentage) of 70 - 70% - or higher.
```{r}
d |>
filter(FinalGradeCEMS > 70)
```
##### [**Your Turn**]{style="color: green;"} **⤵**
In the next code chunk, change the cut-off from 70% to some other
value - larger or smaller (maybe much larger or smaller - feel free to
play around with the code a bit!).
```{r}
d |>
filter(FinalGradeCEMS > 99)
```
What happens when you change the cut-off from 70 to something else? Add
a thought (or more):
- I changed the cutoff to > 99, which drastically reduced the number of cases, to now only 5 obs.
### Arrange
The last function we'll use for preparing tables is arrange.
We'll combine this arrange() function with a function we used already -
select(). We do this so we can view only the student ID and their final
grade.
```{r}
d |>
select(student_id, FinalGradeCEMS) |>
arrange(FinalGradeCEMS)
```
Note that arrange works by sorting values in ascending order (from
lowest to highest); you can change this by using the desc() function
with arrange, like the following:
```{r}
d |>
select(student_id, FinalGradeCEMS) |>
arrange(desc(FinalGradeCEMS))
```
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, replace FinalGradeCEMS that is used with both
the select() and arrange() functions with a different variable in the
data set. Consider returning to the code chunk above in which you
glimpsed at the names of all of the variables.
```{r}
d |>
select(student_id, FinalGradeCEMS, Points_Earned) |>
arrange(desc(Points_Earned))
```
### Reach 1 🎉
Can you compose a series of functions that include the select(),
filter(), and arrange functions? Recall that you can "pipe" the output
from one function to the next as when we used select() and arrange()
together in the code chunk above.
*This reach is not required/necessary to complete; it's just for those
who wish to do a bit more with these functions at this time (we'll do
more in class, too!)*
```{r}
glimpse(d)
d |>
select(student_id, TimeSpent, Points_Earned) |>
arrange(desc(Points_Earned), desc(TimeSpent))
```
## 3. EXPLORE
Exploratory data analysis, or exploring your data, involves processes of
*describing* your data (such as by calculating the means and standard
deviations of numeric variables, or counting the frequency of
categorical variables) and, often, visualizing your data prior. In this
section, we'll create a few plots to explore our data.
### Histogram
The code below creates a histogram, or a distribution of the values, in
this case for students' final grades.
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
geom_histogram()
```
You can change the color of the histogram bars by specifying a color as
follows:
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
geom_histogram(fill = "blue")
```
### Changing colors
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, change the color to one of your choosing;
consider this list of valid color names here:
<http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf>
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
geom_histogram(fill = "darkviolet")
```
Finally, we'll make one more change; visualize the distribution of
another variable in the data - one other than FinalGradeCEMS. You can do
so by swapping out the name for another variable with FinalGradeCEMS.
Also, change the color to one other than blue.
```{r}
glimpse(d)
ggplot(d, aes(x = Points_Earned)) +
geom_histogram(fill = "darkviolet")
```
### Reach 2 🎉
Completed the above? Nice job! Try for a "reach" by creating a scatter
plot for the relationship between two variables. You will need to pass
the names of two variables to the code below for what is now simply XXX
(a placeholder).
```{r}
ggplot(d, aes(x = TimeSpent, y = Points_Earned)) +
geom_point()
```
## 4. MODEL
"Model" is one of those terms that has many different meanings. For our
purpose, we refer to the process of simplifying and summarizing our
data. Thus, models can take many forms; calculating means represents a
legitimate form of modeling data, as does estimating more complex
models, including linear regressions, and models and algorithms
associated with machine learning tasks. For now, we'll run a linear
regression to predict students' final grades.
Below, we predict students' final grades (`FinaGradeCEMS`, which is on a
0-100 point scale) on the basis of the time they spent on the course
(measured through their learning management system in minutes,
`TimeSpent`, and the subject (one of five) of their specific course.
```{r}
m1 <- lm(FinalGradeCEMS ~ TimeSpent + subject, data = d)
summary(m1)
```
#### [**Your Turn**]{style="color: green;"} **⤵**
Notice how above the variables are separated by a + symbol. Below, add
*another* - a third - variable to the regression model. Specifically,
add a variable students' initial, self-reported interest in science,
`int` - and any other variable(s) you like! What do you notice about the
results?
- I added semester to the model. With S116 as the reference category, there was statistically signficiant ( p < .05) difference in final grades for students in Semester S216, at an estimated -4 points.
We're going to dive into this *much* more: if you have many
questions now, you're in the right spot!
```{r}
table(d$semester, useNA = "ifany")
m2 <- lm(FinalGradeCEMS ~ TimeSpent + subject + semester, data = d)
summary(m2)
```
## 5. COMMUNICATE
Great job! Once you've finished your work, Upon doing so, you should see
a new `laser-orientation-badge.html`.
Congratulations, you've completed your Models & Inference Badge!
Complete the following steps to submit your work for review by
1. Change the name of the `author:` in the [YAML
header](https://monashdatafluency.github.io/r-rep-res/yaml-header.html)
at the very top of this document to your name. As noted in
[Reproducible Research in
R](https://monashdatafluency.github.io/r-rep-res/index.html), The
YAML header controls the style and feel for knitted document but
doesn't actually display in the final output.
2. Click the yarn icon above to "knit" your data product to a
[HTML](https://bookdown.org/yihui/rmarkdown/html-document.html) file
that will be saved in your R Project folder.
3. Commit your changes in GitHub Desktop and push them to your online
GitHub repository.
4. Publish your HTML page the web using one of the following
[publishing methods](https://rpubs.com/cathydatascience/518692):
- Publish on [RPubs](https://rpubs.com) by clicking the "Publish"
button located in the Viewer Pane when you knit your document.
Note, you will need to quickly create a RPubs account.
- Publishing on GitHub using either [GitHub
Pages](https://pages.github.com) or the [HTML
previewer](http://htmlpreview.github.io).
5. Post a new discussion on GitHub to our [Foundations Badges
forum](https://github.com/orgs/laser-institute/teams/foundations/discussions/2).
In your post, include a link to your published web page and a short
reflection highlighting one thing you learned from this lab and one
thing you'd like to explore further.