---
title: "Natural events with the greatest consequences across the United States"
author: "Luis Terán"
date: "23/6/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Synopsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. The basic goal of this project is to explore the NOAA Storm Database in order to answer the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The database for the following analysis was obtained from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database at <https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2>. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Documentation is available at <https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf>.
## Data Processing
### Data loading
The data is downloaded and loaded into the data frame "db".
```{r load}
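library(dplyr)
library(ggplot2)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the compressed archive intact on Windows
download.file(fileUrl, destfile = 'repdata%2Fdata%2FStormData.csv.bz2', mode = "wb")
# read.csv() reads the .bz2 file directly; as_tibble() replaces the deprecated tbl_df()
db <- as_tibble(read.csv("repdata%2Fdata%2FStormData.csv.bz2"))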
```
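Downloading the archive on every run is slow. A minimal sketch of one way to avoid it, assuming a hypothetical helper named `download_once` (not part of the original analysis) that skips the download when a local copy already exists:

```r
# Hypothetical helper (not part of the original analysis): download a file
# only when no local copy exists, and return the local path.
download_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

# Demonstration with a file that already exists, so nothing is downloaded.
existing <- tempfile(fileext = ".csv")
writeLines("EVTYPE,INJURIES", existing)
path <- download_once("https://example.invalid/unused.csv", existing)
```

In this report it could be used as `read.csv(download_once(fileUrl, "StormData.csv.bz2"))`, where the local file name is an assumption.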
### Injuries and fatalities
The total number of injuries for each event type is calculated and stored in "dfInj". The data is then ordered from the highest to the lowest number of injuries.
```{r}
dfInj<-summarise(group_by(db, EVTYPE), totalInj = sum(INJURIES))
dfInj<-dfInj[order(dfInj$totalInj, decreasing = T),]
head(dfInj)
```
The same process is repeated for the fatalities caused by each event type, and the results are stored in "dfFat".
```{r}
dfFat<-summarise(group_by(db, EVTYPE), totalFat = sum(FATALITIES))
dfFat<-dfFat[order(dfFat$totalFat, decreasing = T),]
head(dfFat)
```
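The two aggregations above follow the same pattern, so they can also be computed in one pass. A small sketch on made-up data (the `toy` data frame and its values are illustrative, not taken from the storm database), using base R's `aggregate`:

```r
# Toy data standing in for the storm database (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 4, 1)
)

# Sum both outcome columns per event type in a single call.
harm <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Order from most to fewest injuries, as in the analysis above.
harm <- harm[order(harm$INJURIES, decreasing = TRUE), ]
```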
### Crop and property damage in dollars
Exploring the damage multipliers for both property and crops shows that some values are not meaningful exponent codes (-, ?, +) and that the codes appear in both upper and lower case.
For property damage:
```{r}
table(db$PROPDMGEXP)
```
For crop damage:
```{r}
table(db$CROPDMGEXP)
```
In order to avoid confusion, all these values are converted to lower case.
```{r}
db$PROPDMGEXP <- tolower(db$PROPDMGEXP)
db$CROPDMGEXP <- tolower(db$CROPDMGEXP)
```
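Before discarding the unrecognized codes, it can help to check how common they are. A minimal sketch on a made-up vector (`exp_raw` and its contents are assumptions chosen to mimic the mix of symbols seen in the tables above):

```r
# Made-up exponent column mimicking the mix of symbols in PROPDMGEXP.
exp_raw <- c("K", "k", "M", "", "?", "B", "+", "h", "5", "m")

# Lower-case first so "K" and "k" count as the same code.
exp_low <- tolower(exp_raw)

# Codes the analysis treats as meaningful multipliers.
valid <- c("h", "k", "m", "b")

# Share of entries that carry a recognized multiplier.
share_valid <- mean(exp_low %in% valid)

# Frequency of the remaining, unrecognized codes.
other_counts <- table(exp_low[!(exp_low %in% valid)])
```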
A function is created that receives a damage exponent code and returns its equivalent numeric value.
```{r}
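multiplier <- function(x) {
    # Map a single exponent code to its numeric multiplier.
    # Any other value is returned unchanged.
    if (x == "h") {
        x <- 100            # hundreds
    } else if (x == "k") {
        x <- 1000           # thousands
    } else if (x == "m") {
        x <- 1000000        # millions
    } else if (x == "b") {
        x <- 1000000000     # billions
    }
    x
}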
```
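The same mapping can also be written as a lookup table that handles a whole column at once. This is a sketch of an alternative, not the function used in this analysis; `exp_values` and `multiplier_vec` are hypothetical names:

```r
# Named vector mapping each exponent code to its numeric multiplier.
exp_values <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical vectorized alternative to multiplier(): index the lookup
# table by code; unrecognized codes become NA instead of staying as text.
multiplier_vec <- function(x) {
  unname(exp_values[tolower(x)])
}

mult <- multiplier_vec(c("k", "M", "b", "?"))
```

Returning `NA` for unknown codes makes them easy to spot with `is.na()`, at the cost of behaving differently from `multiplier`, which returns unknown input unchanged.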
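A new data frame is created that keeps only the rows whose property and crop damage exponents are both meaningful values (h, k, m, b). Rows with a valid exponent in only one of the two columns are therefore excluded.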
```{r}
validExp <- c("h", "k", "m", "b")
damage <- filter(db, PROPDMGEXP %in% validExp, CROPDMGEXP %in% validExp)
damage <- select(damage, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
```
Then, the function "multiplier" is applied to convert all the damage exponents into numbers. These numbers are multiplied by the reported damage figures to get the actual damage for each row: Total Property Damage (tpd) and Total Crop Damage (tcd).
```{r}
damage$PROPDMGEXP <- sapply(damage$PROPDMGEXP, multiplier)
damage$CROPDMGEXP <- sapply(damage$CROPDMGEXP, multiplier)
damage <- mutate(damage, tpd=PROPDMG*PROPDMGEXP)
damage <- mutate(damage, tcd=CROPDMG*CROPDMGEXP)
```
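As a worked example of this arithmetic, a reported figure of 25 with exponent "k" corresponds to 25 × 1,000 = 25,000 dollars. The sketch below applies the same rule to a few made-up rows (the `toy` data frame and `exp_values` lookup are illustrative assumptions):

```r
# Made-up rows: damage figure plus exponent code, as stored in the database.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("k", "m", "b"),
  stringsAsFactors = FALSE
)

# Code-to-number mapping matching the values in multiplier().
exp_values <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Dollar amount = reported figure times its multiplier.
toy$tpd <- toy$PROPDMG * unname(exp_values[toy$PROPDMGEXP])
```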
As before, the data is grouped by event type and the total damage is calculated separately for crops and for property. The two results are then merged into one data frame, and the crop damage is added to the property damage to get the total damage. Finally, the data is ordered by total damage.
```{r}
dfCrop<-summarise(group_by(damage, EVTYPE), totalDmg = sum(tcd))
dfProp<-summarise(group_by(damage, EVTYPE), totalDmg = sum(tpd))
dfDamage <- merge(dfCrop,dfProp, by="EVTYPE", suffixes = c("Crop", "Property"))
dfDamage <- mutate(dfDamage, totalDmg=totalDmgCrop+totalDmgProperty)
dfDamage<-dfDamage[order(dfDamage$totalDmg, decreasing = T),]
```
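By default, `merge` keeps only the keys present in both tables. That is harmless here because both summaries come from the same "damage" data frame, but it matters if the two tables are ever built from different subsets. A sketch on made-up totals (the `crop`, `prop`, `inner`, and `full` objects are illustrative):

```r
# Made-up per-event totals; "HAIL" appears only in the crop table.
crop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), totalDmg = c(5, 3))
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), totalDmg = c(100, 40))

# Default merge keeps only event types present in both tables.
inner <- merge(crop, prop, by = "EVTYPE", suffixes = c("Crop", "Property"))

# all = TRUE keeps every event type; missing totals appear as NA.
full <- merge(crop, prop, by = "EVTYPE", all = TRUE,
              suffixes = c("Crop", "Property"))

# Treat a missing total as zero before adding the two columns.
full[is.na(full)] <- 0
full$totalDmg <- full$totalDmgCrop + full$totalDmgProperty
```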
## Results
The event types with the highest number of injuries are:
```{r}
ggplot(head(dfInj,10), aes(x=EVTYPE, y=totalInj))+geom_bar(stat="identity",
fill="steelblue")+coord_flip()+ylab("Total number of injuries")+
xlab("Event type")+ labs(title="Events with the highest number of injuries")
```
The event types with the highest number of fatalities are:
```{r}
ggplot(head(dfFat,10), aes(x=EVTYPE, y=totalFat))+geom_bar(stat="identity",
fill="steelblue")+coord_flip()+ylab("Total number of fatalities")+
xlab("Event type")+ labs(title="Events with the highest number of fatalities")
```
The event types with the highest damages are:
```{r}
ggplot(head(dfDamage,10), aes(x=EVTYPE, y=totalDmg))+geom_bar(stat="identity",
fill="steelblue")+coord_flip()+ylab("Damage caused in dollars")+
xlab("Event type")+ labs(title="Events with the greatest economic consequences")
```
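In the three bar charts above, ggplot2 places the event types in alphabetical order along the axis, not in order of magnitude. A sketch of one way to sort the bars, using `reorder()` on made-up totals (the `toy` data frame stands in for `head(dfInj, 10)`):

```r
library(ggplot2)

# Made-up totals standing in for head(dfInj, 10).
toy <- data.frame(
  EVTYPE   = c("FLOOD", "TORNADO", "HEAT"),
  totalInj = c(30, 900, 200)
)

# reorder() sorts the factor levels by totalInj, so after coord_flip()
# the largest bar ends up at the top of the chart.
p <- ggplot(toy, aes(x = reorder(EVTYPE, totalInj), y = totalInj)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  xlab("Event type") +
  ylab("Total number of injuries")

# Level order that the axis will follow, from bottom to top.
ordered_levels <- levels(reorder(toy$EVTYPE, toy$totalInj))
```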