---
title: "Exploring Mass Shootings in America"
author: "Benjamin Soltoff"
output: github_document
---
## Get the data
```{r, echo = TRUE}
library(tidyverse) # load tidyverse packages, including ggplot2
library(knitr) # load functions for formatting tables
# get data from the rcfss package
# install it from GitHub only if it is not already installed
if (!requireNamespace("rcfss", quietly = TRUE)) {
  devtools::install_github("uc-cfss/rcfss")
}
library(rcfss)
# load the data
data("mass_shootings")
mass_shootings
```
## Generate a data frame that summarizes the number of mass shootings per year. Print the data frame as a formatted `kable()` table.
```{r}
case_by_year <- count(mass_shootings, year)
kable(case_by_year, col.names = c("Year", "# of Shootings"))
```
## Generate a bar chart that identifies the number of mass shooters associated with each race category. The bars should be sorted from highest to lowest.
```{r}
# fct_infreq() orders the race categories from most to least frequent
ggplot(mass_shootings, aes(x = fct_infreq(race))) +
  geom_bar() +
  labs(
    title = "Number of Mass Shooters per Race",
    x = "Race",
    y = "Number of Mass Shooters"
  )
```
## Generate a boxplot visualizing the number of total victims, by type of location. Redraw the same plot, but remove the Las Vegas Strip massacre from the dataset.
```{r}
ggplot(mass_shootings, aes(x = location_type, y = total_victims)) +
  geom_boxplot() +
  labs(
    title = "Total # of victims by location, with Las Vegas Strip outlier",
    x = "Type of Location",
    y = "Total Number of Victims"
  )

# drop the Las Vegas Strip massacre; note that this filter removes
# every case whose location is "Las Vegas, NV"
shootings_noLVS <- filter(mass_shootings, location != "Las Vegas, NV")

ggplot(shootings_noLVS, aes(x = location_type, y = total_victims)) +
  geom_boxplot() +
  labs(
    title = "Total # of victims by location, without outlier",
    x = "Type of Location",
    y = "Total Number of Victims"
  )
```
## How many white males with prior signs of mental illness initiated a mass shooting after 2000?
```{r}
mass_shootings %>%
  filter(male, prior_mental_illness == "Yes", race == "White", year > 2000) %>%
  count()
```
There were 20 white males with prior signs of mental illness who initiated a mass shooting after 2000.
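
The count in the sentence above is typed by hand, so it can drift out of date if the data changes. A minimal sketch of storing the count in a variable instead is shown here. It runs on a small made-up tibble (`toy_shootings` is hypothetical and only mimics the columns used above), not on the real `mass_shootings` data.

```r
library(dplyr)

# Hypothetical toy rows that mimic the columns used in the filter above;
# these are NOT real cases from mass_shootings
toy_shootings <- tibble(
  male = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  prior_mental_illness = c("Yes", "No", "Yes", "Yes", "Yes"),
  race = c("White", "White", "White", "Black", "White"),
  year = c(2005, 2010, 2012, 1999, 1998)
)

# Same conditions as the chunk above, but keep the result as a plain number
n_cases <- toy_shootings %>%
  filter(male, prior_mental_illness == "Yes", race == "White", year > 2000) %>%
  nrow()

n_cases
```

With the real data, the stored value could then be referenced from the text through an inline R expression, so the reported number always matches the filter.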
## Which month of the year has the most mass shootings? Generate a bar chart sorted in chronological order to provide evidence of your answer.
```{r}
# month abbreviations in calendar order, used as factor levels
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
  "Oct", "Nov", "Dec"
)

mass_shootings %>%
  mutate(month = factor(month, levels = month_levels)) %>%
  ggplot(aes(x = month)) +
  geom_bar() +
  labs(
    title = "# of Mass Shootings per Month of the Year",
    x = "Month",
    y = "Number of Mass Shootings"
  )
```
The months of the year with the most mass shootings are February and June.
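
Reading the tallest bars off a chart is easy to get wrong when two months are tied. As a cross-check, the sketch below tabulates the counts and keeps every month that ties for the maximum. It uses a made-up tibble (`toy_months` is hypothetical), so the output illustrates the technique rather than the real result.

```r
library(dplyr)

# Hypothetical month values, NOT taken from mass_shootings
toy_months <- tibble(month = c("Feb", "Jun", "Feb", "Jan", "Jun", "Dec"))

# Count rows per month, then keep all months tied for the highest count
top_months <- toy_months %>%
  count(month) %>%
  filter(n == max(n))

top_months
```

Filtering on `n == max(n)` rather than taking the first row of a sorted table keeps ties, which matters when two months share the top count.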
## How does the distribution of mass shooting fatalities differ between white and black shooters? What about white and latino shooters?
```{r}
# white vs. black shooters
mass_shootings %>%
  filter(race %in% c("White", "Black")) %>%
  ggplot(aes(x = race, y = fatalities)) +
  geom_boxplot() +
  labs(
    title = "Number of Mass Shooting Fatalities for White and Black Shooters",
    x = "Race",
    y = "# of Fatalities"
  )

# white vs. latino shooters
mass_shootings %>%
  filter(race %in% c("White", "Latino")) %>%
  ggplot(aes(x = race, y = fatalities)) +
  geom_boxplot() +
  labs(
    title = "Number of Mass Shooting Fatalities for White and Latino Shooters",
    x = "Race",
    y = "# of Fatalities"
  )
```
Fatalities in shootings by White shooters are spread over a much wider range than fatalities in shootings by Black or Latino shooters.
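
A claim about spread can be backed with numbers as well as boxplots. The sketch below computes the median, interquartile range, and maximum of fatalities per race. The tibble `toy_fatalities` is hypothetical and its values are invented for illustration; the same pipeline could be pointed at `mass_shootings`, assuming the `race` and `fatalities` columns used in the plots above.

```r
library(dplyr)

# Hypothetical fatality counts, NOT real data
toy_fatalities <- tibble(
  race = c("White", "White", "White", "White",
           "Black", "Black", "Black",
           "Latino", "Latino", "Latino"),
  fatalities = c(4, 6, 9, 30, 4, 5, 7, 3, 5, 6)
)

# One row per race with the group size and simple measures of center and spread
spread_by_race <- toy_fatalities %>%
  group_by(race) %>%
  summarize(
    n_cases = n(),
    median_fatalities = median(fatalities),
    iqr_fatalities = IQR(fatalities),
    max_fatalities = max(fatalities)
  )

spread_by_race
```

Reporting `n_cases` alongside the spread is a deliberate choice: a wider range in one group can simply reflect that the group contains more cases.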
## Are mass shootings with shooters suffering from mental illness different from mass shootings with no signs of mental illness in the shooter? Assess the relationship between mental illness and total victims, mental illness and race, and the intersection of all three variables.
```{r}
# mental illness and total victims
mass_shootings %>%
  filter(!is.na(prior_mental_illness)) %>%
  ggplot(aes(x = prior_mental_illness, y = total_victims)) +
  geom_boxplot() +
  labs(
    title = "Number of Total Victims if Prior Mental Illness",
    x = "Prior Mental Illness?",
    y = "# of Total Victims"
  )

# mental illness and race
# labs() follows the aesthetics, so x is still race after coord_flip()
mass_shootings %>%
  filter(!is.na(prior_mental_illness), !is.na(race)) %>%
  ggplot(aes(x = race)) +
  geom_bar() +
  facet_grid(~prior_mental_illness) +
  coord_flip() +
  labs(
    title = "# Mass shooting incidents per race, divided by prior mental illness",
    x = "Race",
    y = "# of Mass Shooting Incidents"
  )

# mental illness, race, and total victims
mass_shootings %>%
  filter(!is.na(prior_mental_illness), !is.na(race)) %>%
  ggplot(aes(x = race, y = total_victims)) +
  geom_boxplot() +
  facet_grid(~prior_mental_illness) +
  coord_flip() +
  labs(
    title = "Distribution of total # of victims per race, divided by prior mental illness",
    x = "Race",
    y = "# Total Victims"
  )
```
The first graph, a box plot, shows that shootings in which the perpetrator had a prior history of mental illness had both a higher number and a wider spread of total victims. The second graph, a bar chart, counts the shootings per race, split by whether or not the perpetrator had a prior history of mental illness. For all races, shootings by perpetrators with a prior history of mental illness were perceptibly more numerous. For white perpetrators, the difference is even more noticeable. The third graph, another box plot, shows that the number of total victims is higher for all races when there is a history of prior mental illness. Again, the difference is more noticeable among white perpetrators.
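
The comparisons described above can also be put in a small summary table, which makes group sizes explicit. The sketch below counts incidents and takes the median number of victims for each combination of race and prior mental illness. It runs on a hypothetical tibble (`toy_cases`, with invented values) that only mimics the three columns used in the plots.

```r
library(dplyr)

# Hypothetical rows, NOT real data; NA marks a missing value
toy_cases <- tibble(
  race = c("White", "White", "White", "Black", "Black", NA),
  prior_mental_illness = c("Yes", "Yes", "No", "Yes", NA, "No"),
  total_victims = c(10, 20, 6, 8, 5, 7)
)

# Drop rows with missing race or mental-illness status, as in the plots above,
# then summarize each race / mental-illness combination
victims_summary <- toy_cases %>%
  filter(!is.na(prior_mental_illness), !is.na(race)) %>%
  group_by(race, prior_mental_illness) %>%
  summarize(
    incidents = n(),
    median_victims = median(total_victims),
    .groups = "drop"
  )

victims_summary
```

A table like this shows at a glance which cells contain only a handful of incidents, where a boxplot comparison is least reliable.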
## Session info
```{r, echo = TRUE}
devtools::session_info()
```