-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathvisualizeArealData-tutorial.Rmd
187 lines (132 loc) · 9.68 KB
/
visualizeArealData-tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
# Thematic Mapping {#thematic-maps}
## Overview {#TM-research-question}
Once we have downloaded the contextual data and generated the access metrics, we can start visualizing them to identify any spatial patterns. This can help identify whether a variable is homogeneously distributed across space or do we see clustering & spatial heterogeneity. In this tutorial we will cover methods to plot data variables spatially i.e. create thematic maps, technically known as **choropleth maps**. We will cover the most commonly used types of choropleth mapping techniques employed in R. Please note the methods covered here are an introduction to spatial plotting. In this tutorial our objectives are to:
* Generate a choropleth map
* Visualize spatial distributions of data
* Test sensitivty of various thresholds
## Environment Setup {#TM-setup}
To replicate the codes & functions illustrated in this tutorial, you’ll need to have R and RStudio downloaded and installed on your system. This tutorial assumes some familiarity with the R programming language.
### Input/Output {#TM-i-o}
We will using the per capita income data for the City of Chicago downloaded & saved as a shapefile in the [Census Data Wrangling](#contextual-data). These files can also be found [here](https://github.com/GeoDaCenter/opioid-environment-toolkit/tree/master/data). Our input is:
- Chicago Zip Codes with per capita income, `chizips_18_pci.shp`
Our output will be three thematic maps highlighting the distribution of per capita income at a zip code level across the city of Chicago.
### Load the packages {#TM-packages2}
We will use the following packages in this tutorial:
- `tidyverse`: to manipulate data
- `tmap`: to visualize and create maps
- `sf`: to read/write and manipulate spatial data
Load the libraries required.
```{r load, warning = FALSE, messages=FALSE}
library(tidyverse)
library(tmap)
library(sf)
```
### Load data {#TM-load-variables}
We will read in the shapefile with the per capita income at the zipcode level for the city of Chicago for year 2018.
``` {r}
chicagoZips_18_PCI <- st_read("data/chizips_18_pci.shp")
head(chicagoZips_18_PCI)
```
Lets review the dataset structure. In the R `sf` data object, the 'geometry' column provides the geographic information/boundaries that we can map. This is unique to simple features data structures, and a pretty phenomenal concept.
We can do a quick plot using:
```{r quick plot, fig.align='center'}
plot(chicagoZips_18_PCI$geometry)
```
## Thematic Plotting {#TM-mapping}
We will be using `tmap` package for plotting spatial data distributions. The package syntax has similarities with `ggplot2` and follows the same idea of *A Layered Grammar of Graphics*.
+ for each input data layer use `tm_shape()`,
+ followed by the method to plot it, e.g `tm_fill() or tm_dots() or tm_line() or tm_borders()` etc.
Similar to ggplot2, aesthetics can be provided for each layer and plot layout can be manipulated using `tm_layout()`. For more details on `tmap` usage & functionality, check [tmap documentation](https://cran.r-project.org/web/packages/tmap/tmap.pdf). The previous map we plotted using `plot` can be mapped using `tmap` as in the code below.
```{r quick plot tmap, fig.align='center'}
tmap_mode('plot')
tm_shape(chicagoZips_18_PCI) + tm_borders() + tm_layout(frame = FALSE)
```
In tmap, the classification scheme is set by the `style` option in `tm_fill()` and the default style is `pretty`. Lets plot the distribution of per capita income by zipcode across the city of Chicago with default style using the code below. We can also change the color palette used to depict the spatial distribution. See [Set Color Palette](#TM-set-palette) in Appendix for more details on that.
```{r plot pretty, fig.align='center'}
tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Pretty') +
tm_borders() +
tm_layout(frame = FALSE, legend.outside = TRUE, legend.outside.position = 'right',
legend.title.size = 0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
```
We will be plotting the spatial distribution of variable perCapIncome for the city of Chicago using three methods.
1. Quantile
2. Natural Breaks
3. Standard Deviation
For a more detailed overview of choropleth mapping and methods, check out the related [GeoDa Center Documentation](https://geodacenter.github.io/workbook/3a_mapping/lab3a.html#thematic-maps-overview).
### Quantile {#TM-quantile-plot}
A quantile map is based on sorted values for the variable that are then grouped into bins such that each bin has the same number of observations. It is obtained by setting `style = 'quantile'` and `n = no of bins` arguments in tm_fill().
```{r plot quantile, message=FALSE, fig.align='center'}
p1 <- tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Quantile',
style = 'quantile', n = 5) + tm_borders() +
tm_layout(frame = FALSE,legend.outside = TRUE,
legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
#tmap_save(p1, 'PctHisp_18_Quantile.png') # save the map in a .png file
p1
```
### Natural Breaks {#TM-jenks-plot}
Natural breaks or jenks distribution uses a nonlinear algorithm to cluster data into groups such that the intra-bin similarity is maximized and inter-bin dissimilarity is minimized. It is obtained by setting `style = 'jenks'` and `n = no. of bins` in the tm_fill().
As we can see, jenks method better classifies the dataset in review than the quantile distribution. There is no correct method to use and the choice of classification method is dependent on the problem & dataset used.
```{r plot jenks, message=FALSE, fig.align='center'}
p2 <- tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Jenks',
style = 'jenks', n = 5) + tm_borders() +
tm_layout(frame = FALSE,legend.outside = TRUE,
legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
#tmap_save(p2, 'PctHisp_18_Jenks.png')# save the map in a .png file
p2
```
### Standard Deviation {#TM-stdev-plot}
A standard deviation map normalizes the dataset (mean = 0, stdev = 1) and transforms it into units of stdev (given mean =0). It helps identify outliers in the dataset. It is obtained by setting `style = 'sd'` in the tm_fill().
The normalization process can create bins with negative values, which in this case don't necessarily make sense for the dataset, but it still helps identify the outliers.
```{r plot stdev, message=FALSE, fig.align='center'}
p3 <- tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Stdev',
style = 'sd') + tm_borders() +
tm_layout(frame = FALSE, legend.outside = TRUE,
legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
#tmap_save(p3, 'PctHisp_18_Stdev.png')# save the map in a .png file
p3
```
## Appendix {#TM-appendix}
### Set Color Palette {-#TM-set-palette}
The range of colors used to depict the distribution in the map can be set by modifying the `palette` argument in tm_fill(). For example, we can use `Blues` palette to create the map below.
```{r plot palette, fig.align='center'}
tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Jenks',
style = 'jenks', n = 5, palette = 'Blues') + tm_borders() +
tm_layout(frame = FALSE,legend.outside = TRUE,
legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
```
### Use ColorBrewer {-#TM-colorbrewer}
To build aesthetically pleasing and easy-to-read maps, we recommend using color palette schemes recommended in [ColorBrewer 2.0](https://colorbrewer2.org/#) developed by Cynthia Brewer. The website distinguishes between sequential(ordered), diverging(spread around a center) & qualitative(categorical) data. Information on these palettes cab be displayed in R using RColorBrewer package.
We can get the hex values for the colors used in a specific palette with n bins & plot the corresponding colors using code below.
```{r colorbrewer, message = FALSE}
require(RColorBrewer)
RColorBrewer::brewer.pal(5,"PuBuGn")
RColorBrewer::display.brewer.pal(5,"PuBuGn")
```
We can update the jenks map by using this sequential color scheme and changing the transparency using `alpha = 0.8` as below.
```{r plot jenks improved, fig.align='center'}
tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Jenks',
style = 'jenks', n = 5, palette = 'PuBuGn') + tm_borders() +
tm_layout(frame = FALSE,
legend.outside = TRUE, legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
```
We can also update the stdev map by using a diverging color scheme as below.
```{r plot stdev improved, fig.align='center'}
tm_shape(chicagoZips_18_PCI) + tm_fill('prCptIn', title = 'Per Capita Income - Stdev',
style = 'sd', palette = '-RdBu', alpha = 0.9) + tm_borders() +
tm_layout(frame = FALSE,
legend.outside = TRUE, legend.outside.position = 'right', legend.title.size =0.9,
main.title = 'Per Capita Income, by Zipcode, Chicago 2018',
main.title.size = 0.9)
```