<h1>Census data workflow:</h1>
<h1>Downloading census data using the API (census_data_scrape_final)</h1>
<h2>Inputs:</h2>
<pre><code>1. Tahoe_Geometry feature class: A feature class containing the census tracts, block groups, and blocks that fall within the Tahoe Basin for each census geometry year since 1990.
Each feature has a TRPAID, which is the census-generated GEOID (unique within a census geometry year) combined with the census geometry year.
The TRPAID is unique across years. It is used to join to values in census_demographics and to determine which values should be downloaded from the Census API.
2. Census Variable Lists: These lists are created manually by reviewing the variables available for a given census dataset (https://www.census.gov/data/developers/data-sets.html)
and then adding the variables of interest to a csv in the Demographics\Census_Variable_Lists folder.
Census variable names and categories are assigned manually because the Census names tend to be overly complicated and misleading.
The Census API limits the number of calls that can be placed, so these lists should generally contain no more than about 20-30 variables.
</code></pre>
<h2>Output:</h2>
<pre><code>1. Census value dataframe/xlsx workbook containing the Tahoe Basin values for the variables in the input csv.
</code></pre>
<h2>Download process:</h2>
<pre><code>1. For each variable in the census variable list, data is downloaded for each county within the Tahoe Basin. Downloading by county is necessary because of the way the geography hierarchy is structured in the API call. The results are all combined into one dataframe.
2. Some data wrangling takes place on the downloaded values, including the creation of a TRPAID from the GEOID and the census geometry year.
3. The dataframe is filtered down to Tahoe data only by doing an inner merge to Tahoe_Geometry on TRPAID.
4. The output dataframe is then written to an xlsx workbook in Demographics\Census_Data_Downloads, where it can be reviewed and then manually appended to census_demographics.
</code></pre>
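<p>Steps 2 and 3 above can be sketched in pandas as follows. This is a minimal illustration, not the project's script: the column names (GEOID, YEAR, VALUE), the underscore-joined TRPAID format, the helper names, and the sample GEOIDs are all assumptions made for the example.</p>

```python
import pandas as pd


def add_trpaid(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a TRPAID column built from GEOID and YEAR.

    Assumption: TRPAID is the GEOID and the geometry year joined with an
    underscore. The real format may differ.
    """
    out = df.copy()
    out["TRPAID"] = out["GEOID"].astype(str) + "_" + out["YEAR"].astype(str)
    return out


def filter_to_tahoe(downloaded: pd.DataFrame, tahoe_geometry: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose TRPAID appears in the Tahoe geometry table."""
    keys = tahoe_geometry[["TRPAID"]].drop_duplicates()
    # An inner merge drops every downloaded row without a matching TRPAID.
    return downloaded.merge(keys, on="TRPAID", how="inner")


# Hypothetical county-level download: two tracts, only one inside the basin.
downloaded = add_trpaid(
    pd.DataFrame(
        {
            "GEOID": ["06017030200", "06017031000"],
            "YEAR": [2020, 2020],
            "VALUE": [1200, 3400],
        }
    )
)
# Hypothetical stand-in for the Tahoe_Geometry feature class attribute table.
tahoe_geometry = pd.DataFrame({"TRPAID": ["06017030200_2020"]})

tahoe_values = filter_to_tahoe(downloaded, tahoe_geometry)
```

<p>Merging on a de-duplicated key column keeps the filter from multiplying rows if the geometry table lists a TRPAID more than once.</p>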
<h1>Calculating medians/sums (Census_Data_Summary)</h1>
<pre><code>1. Sums: Values are read into a dataframe from https://maps.trpa.org/server/rest/services/Demographics/FeatureServer/28 and filtered down to the variable of interest. An additional field identifying North or South lake is calculated from the county. Values are then summed at the basin, county, and north/south levels and written to an xlsx for review.
2. Medians: Values are read in from https://maps.trpa.org/server/rest/services/Demographics/FeatureServer/28 and filtered down to the variable category of interest.
    1. Median categories (e.g. 10,000 to 15,000) are converted to lower and upper bin values using a regex function. The open-ended lowest and highest bins are given bounds of 0 and inf, respectively.
    2. The dataframe is then sorted by the desired grouping levels and then by variable code (e.g. county and then variable code if county-level medians are desired).
    3. Cumulative and total sums are calculated for the desired grouping level.
    4. The median bin for each grouping is identified by finding the first row where the cumulative sum is greater than half the total sum for that grouping.
    5. An interpolation ratio is calculated from the difference between the cumulative sum for the median bin and that of the previous bin, and the ratio is used to calculate a median value within the bin.
</code></pre>
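<p>The median steps above can be sketched as follows. This is a hedged illustration rather than the code in Census_Data_Summary: the column names (county, variable_code, label, value), the label wording handled by the regex, and the linear interpolation formula are assumptions, as is the choice to report the lower bound when the median falls in the open-ended top bin.</p>

```python
import math
import re

import pandas as pd


def parse_bin(label: str) -> tuple[float, float]:
    """Turn a label such as '10,000 to 15,000' into (lower, upper) bounds."""
    numbers = [float(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", label)]
    text = label.lower()
    if "less than" in text:
        return 0.0, numbers[0]        # open-ended lowest bin
    if "or more" in text:
        return numbers[0], math.inf   # open-ended highest bin
    return numbers[0], numbers[1]


def binned_median(df: pd.DataFrame, group_cols: list[str]) -> pd.Series:
    """Estimate one median per group from binned counts."""
    bounds = [parse_bin(label) for label in df["label"]]
    work = df.assign(lower=[b[0] for b in bounds], upper=[b[1] for b in bounds])
    # Assumes variable codes sort in the same order as the bins.
    work = work.sort_values(group_cols + ["variable_code"]).reset_index(drop=True)

    grouped = work.groupby(group_cols)["value"]
    cum = grouped.cumsum()
    total = grouped.transform("sum")
    work = work.assign(cum=cum, total=total, prev_cum=cum - work["value"])

    # Median bin: first row per group whose cumulative sum exceeds half the total.
    hits = work[work["cum"] > work["total"] / 2]
    rows = hits.groupby(group_cols).head(1).set_index(group_cols)

    # Share of the median bin's count needed to reach the halfway point.
    ratio = (rows["total"] / 2 - rows["prev_cum"]) / rows["value"]
    width = rows["upper"] - rows["lower"]
    estimate = rows["lower"] + ratio * width
    # Assumption: an open-ended top bin cannot be interpolated, so use its lower bound.
    return estimate.where(width != math.inf, rows["lower"])


# Hypothetical binned counts for two counties.
labels = ["Less than 10,000", "10,000 to 15,000", "15,000 to 25,000", "25,000 or more"]
codes = ["V002", "V003", "V004", "V005"]
sample = pd.DataFrame(
    {
        "county": ["A"] * 4 + ["B"] * 4,
        "variable_code": codes * 2,
        "label": labels * 2,
        "value": [10, 30, 40, 20, 5, 5, 10, 60],
    }
)

medians = binned_median(sample, ["county"])
```

<p>For county A the total is 100, the cumulative sums are 10, 40, 80, and 100, so the median bin is 15,000 to 25,000 and the interpolation ratio is (50 - 40) / 40 = 0.25. For county B the median falls in the open-ended top bin.</p>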
<h2>Assigning different groupings/categories:</h2>
<p>Census variables are sometimes available only in very granular categories (e.g., age groupings are also broken down by sex), so for convenience and consistency with the census report there is a process for regrouping downloaded data into some commonly used categories.</p>
<pre><code>1. A lookup list pairing each census variable code with its new category is created manually and placed in Demographics\Census_Category_Lists.
2. Values are read in from https://maps.trpa.org/server/rest/services/Demographics/FeatureServer/28 and filtered down to the variable category of interest.
3. Those values are joined to the lookup list on census variable code.
4. The resulting dataframe is grouped by the new category name from the lookup list, and each group is given a new variable code name that includes all the census variable codes that were combined to produce it.
5. The result is then manually appended to census_demographics.
</code></pre>
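<p>The join and regrouping steps above might look like the following in pandas. This is only a sketch under stated assumptions: the column names (variable_code, value, new_category), the sample values, and the "+" separator used to build the combined variable code are hypothetical and are not taken from the project's scripts.</p>

```python
import pandas as pd

# Hypothetical downloaded values: age counts that are also split by sex.
values = pd.DataFrame(
    {
        "variable_code": ["B01001_003E", "B01001_027E", "B01001_004E", "B01001_028E"],
        "value": [50, 45, 60, 55],
    }
)

# Hypothetical lookup list mapping each census variable code to a broader category.
lookup = pd.DataFrame(
    {
        "variable_code": ["B01001_003E", "B01001_027E", "B01001_004E", "B01001_028E"],
        "new_category": ["Under 5", "Under 5", "5 to 9", "5 to 9"],
    }
)

# Join the values to the lookup list on census variable code.
joined = values.merge(lookup, on="variable_code", how="inner")

# Sum within each new category and record which codes were combined.
regrouped = (
    joined.groupby("new_category", sort=False)
    .agg(
        variable_code=("variable_code", lambda codes: "+".join(sorted(codes))),
        value=("value", "sum"),
    )
    .reset_index()
)
```

<p>In practice the grouping would presumably also include the geography key (e.g. TRPAID) so that values are combined within each feature rather than across the whole table.</p>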