-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathweather.html
450 lines (411 loc) · 18.4 KB
/
weather.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
<!DOCTYPE HTML>
<!--
Alpha by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-156016776-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() { dataLayer.push(arguments); }
gtag('js', new Date());
gtag('config', 'UA-156016776-1');
</script>
<title>Weather Impact Study</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
<link rel="shortcut icon" type="image/png" href="images\flavicon\changa32x32royalpurple-iceflow.png" />
</head>
<body class="is-preload">
<div id="page-wrapper">
<!-- Header -->
<header id="header">
<h1><a href="index.html">Portfolio Website</a> by Michael Davis</h1>
<nav id="nav">
<ul>
<li><a href="index.html" class="button">Home</a></li>
<li>
<a href="#" class="icon solid fa-angle-down">Additional Information</a>
<ul>
<li><a href="aboutme.html">About Me</a></li>
<li><a href="resume.html">Resume</a></li>
<li><a href="contact.html">Contact</a></li>
<li>
<a href="#">Projects</a>
<ul>
<li><a href="politicsandcovid19.html">Politics & COVID-19 Tableau Viz</a></li>
<li><a href="2020election.html">2020 Election Tableau Viz</a></li>
<li><a href="honeybeeviz.html">Honey Bee Tableau Viz</a></li>
<li><a href="earthquake.html">Earthquake Data Viz</a></li>
<li><a href="belly_button.html">Biodiversity Dashboard</a></li>
<li><a href="weather.html">Weather Impact Study</a></li>
</ul>
</li>
</ul>
</li>
<!-- <li><a href="#" class="button">Sign Up</a></li> -->
</ul>
</nav>
</header>
<!-- Main -->
<section id="main" class="container">
<header>
<h2>Does Weather in Indiana Impact
<br>If & When a Home Is Sold?
</h2>
<p> A team project that explores the changes in Hoosier weather and its impact
on the residential real estate market
</p>
</header>
<div class="box">
<span class="image featured"><img src="images/weather/ETL_image.png" alt="an image" /></span>
<hr class="divider">
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Aims</h3>
<ul>
<li style="margin-left:2em">AIM 1: Evaluate existing data from the Indiana
Property Disclosure datasets</li>
<li style="margin-left:2em">AIM 2: Analyze data for weather, home sales, and
Census data to find correlations within each dataset</li>
<li style="margin-left:2em">AIM 3: Design tables and utilize Open Weather Map
(API), Kaggle (CSV), U.S. Census (API)</li>
<li style="margin-left:2em">AIM 4: Utilize Analysis tools such as: Json |
Python | CSV | pgAdmin 4</li>
<li style="margin-left:2em">AIM 5: Provide data on if weather impacts the
home sales in Indiana</li>
</ul>
</p>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Impact and Intended Results</h3>
<p> The Hoosier State enjoys all four seasons of the year. No
matter what time of year it is, you’ll find something fun to do outdoors
thanks to our Midwestern location. In recent years, Indiana has been listed as one of the top
growth states in terms of USA population. Since Indiana experiences all four
seasons, the question to explore is how weather impacts homesales.
<br><br>
Our goal is to provide the impact of home sales in Indiana based on weather
utilizing the CSV from Kaggle.
<br><br>
We will look at dates, temperature changes, conditions, buyer location per county,
sale prices, population count, household income, and median home value.
<br><br>
Our goal is to provide the impact of home sales in Indiana based on weather.
We will look at dates, temperature changes, conditions, buyer location per county,
sale prices, population count, household income and median home value.
<p>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Average Temperatures</h3>
<p> As you might guess, January is typically the coldest month and July the hottest.
Here are average temperatures for each month.
<br><br>
<span class="image featured_60"><img src="images/weather/Average_Temperature.png"
alt="centered image" /></span>
<p>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Monthly Precipitation</h3>
<p> While kids do get to enjoy snow days in Indiana, the summer months bring the most
precipitation. Check out the monthly averages.
<br><br>
<span class="image featured_60"><img src="images/weather/Monthly_Precipitaion.png"
alt="centered image" /></span>
<p>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>The Plan</h3>
<h3>Steps to Extract and Transform data from the U.S. Census</h3>
<p> The extracted housing related data source was the U.S. Census and was acquired
using an API. The specific data set selected was from the 2018 American Community
Survey 5-Year Survey (U.S. Census ACS) and the data points we selected came from the
following table: U.S. Census Variables
<br><br>
The five-year ACS was selected since it has the most complete housing data. From
the table containing approximately 20,000 variables, we selected:
<ul>
<li style="margin-left:2em">Median Home Value</li>
<li style="margin-left:2em">Household Income</li>
<li style="margin-left:2em">Population</li>
<li style="margin-left:2em">Median Age</li>
<li style="margin-left:2em">Per Capita Income</li>
<li style="margin-left:2em">State</li>
<li style="margin-left:2em">County</li>
</ul>
The state and county fields were useful in joining the Census table to the other
tables. Once the data was called utilizing the Census API, then the data was put
into a pandas dataframe.
<br><br>
<span class="image featured_60"><img src="images/weather/ACS_Screen_Shot.png"
alt="centered image" /></span>
<br><br>
<ul>
<li style="margin-left:2em">Transforming the U.S. Census housing related data
</li>
<li style="margin-left:2em">Fields were converted to integer using .astype.</li>
<li style="margin-left:2em">The first dataframe, us_census_df, called pulled
data from all 50 states. We created a second dataframe, IN_census, and verified
the data contained 92 rows - one for each of Indiana's 92 counties.</li>
<li style="margin-left:2em">The county and state fields were pulled as FIPS
codes and the "us" dependency, the FIPS codes were converted to state
and county names.</li>
<li style="margin-left:2em">Columns were re-sorted for a more natural looking
arrangement.</li>
<li style="margin-left:2em">The final dataframe, IN_census_3, was also set to a
csv file called IN_census_file.csv.</li>
</ul>
<p>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Steps to Extract and Transform the Home Price Data:</h3>
<h3>Extract</h3>
<ul>
<li style="margin-left:2em">The first step was to read in the original CSV
file. This data set came from Kaggle and was s titled: Indiana - Property
Sales Disclosure. This file is 2GB.</li>
<li style="margin-left:2em">The file was downloaded directly from the website.</li>
</ul>
<h3>Transform</h3>
<ul>
<li style="margin-left:2em">This dataset contains 87 columns, including data
on the buyer, seller, sales price, lot description, tax information, and sales date.</li>
<li style="margin-left:2em">The following columns were selected:</li>
<ul>
<li style="margin-left:3em">Parcel1_Acreage</li>
<li style="margin-left:3em">PropStreet</li>
<li style="margin-left:3em">PropCity</li>
<li style="margin-left:3em">PropState</li>
<li style="margin-left:3em">PropZip</li>
<li style="margin-left:3em">C6_Sales_price_Assessor</li>
<li style="margin-left:3em">Buyer1Street</li>
<li style="margin-left:3em">Buyer1City</li>
<li style="margin-left:3em">Buyer1State</li>
<li style="margin-left:3em">Buyer1ZIP</li>
<li style="margin-left:3em">Conveyance_Date</li>
</ul>
<li style="margin-left:2em">A .dropna() was performed on the data to drop
any rows that contain an NaN.</li>
<li style="margin-left:2em">There were many inconsistencies in the Property
City column, including misspellings and differences in the cases of the letters.</li>
<li style="margin-left:2em">We obtained weather data for 107 Indiana cities.
Only data for those cities was included in the final home price dataset.
To do this, the city list was copied over from the weather Jupyter notebook.
Then the cities needed to be converted to upper case letters to match up
with the city entries in the home price dataset. A for loop was setup to
include only those 107 cities.</li>
<li style="margin-left:2em">Columns were renamed to be more descriptive.
New column names: - Acreage - Property Street - Property City - Property
State - Property Zip - Sale Price - Buyer Street - Buyer City - Buyer State -
Buyer Zip - Conveyance Date</li>
<li style="margin-left:2em">We discovered 1,934 duplicates on home_prices.csv file.</li>
<li style="margin-left:2em">Duplicate data was dropped using: .drop_duplicates().</li>
<li style="margin-left:2em">Finally, the cleaned file was saved as a csv.
</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Steps to Extract and Transform the Weather Data:</h3>
<h3> Extract</h3>
<ul>
<li style="margin-left:2em">Our original plan was to get data for the
specific location and date of each home sale using an API call from
Open Weather Map. Historical data was unavailable without a paid
subscription. This was true of many other weather websites as well.</li>
<li style="margin-left:2em">Weather data was scraped from the following
website:
<a
href="https://www.usclimatedata.com/climate/indiana/united-states/3184">U.S.
Climate Data</a>
</li>
<li style="margin-left:2em">ChromeDriver was used to collect html data
from the website and BeautifulSoup was used to extract the parts of the
data that we needed.
</li>
<li style="margin-left:2em">The first piece of information needed was the URL endpoint for
each city to obtain weather data from the tables. This was done by scraping the endpoints.</li>
<li style="margin-left:2em">An empty dataframe titled average_monthly
was created to hold the scraped data. A for loop was created to cycle through the URLs and
the pandas function .read_html() was used to collect the data from the tables on the
page. A column was added to include the name of the city. This was done by splitting the URL
string using the pandas function .split(“/”), and then indexing the component containing the
name of the city.</li>
</ul>
<span class="image featured_60"><img
src="images/weather/US_Climate_Data.png"
alt="centered image" /></span>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Transform</h3>
<ul>
<li style="margin-left:2em">The columns were transposed and then the
files merged back together so the averaged weather data became columns.</li>
<li style="margin-left:2em">A County column was added so that the
census data and the weather table were joined. The home price data was
joined via the city name.
</li>
<li style="margin-left:2em">Please note that in this dataframe some
“NaN” values were intentionally
left in. Data is present for all cities for the average high and
low temperatures, but snowfall
data is only available for 64 of the 107 cities.</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Notes</h3>
<ul>
<li style="margin-left:2em">The final csv file for the home
price data: home_price_final.csv</li>
<li style="margin-left:2em">The final csv file for the weather
data: average_indiana_weather.csv</li>
<li style="margin-left:2em">All work to extract and transform
the home price data is in: Home_Price_Data.ipynb</li>
<li style="margin-left:2em">All work to extract and transform
the weather data is in: weather_data.ipynb</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Load</h3>
<h3>All three CSV files were read and loaded into a Dataframe
</h3>
<ul>
<li style="margin-left:2em">Source for home_price_final.csv:
Kaggle: Indiana - Property Sales Disclosure</li>
<li style="margin-left:2em">Source for
average_indiana_weather.csv:
<a
href="https://www.usclimatedata.com/climate/indiana/united-states/3184">U.S.
Climate Data</a></li>
<li style="margin-left:2em">Source for IN_census_file.csv:
<a
href="https://github.com/Davis1776/Weather_Impact_on_Home_Sales_ETL_Project/blob/master/www.census.gov">
U.S. Census Data</a></li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>For ERD</h3>
<ul>
<li style="margin-left:2em">CSV files had some random
empty columns and gave error “Column violates ‘Not Null’ constraint” while
trying to create ERD diagram and create Tables.</li>
<li style="margin-left:2em">CSV files after filling
empty columns to replace with new address where
address was null, etc.</li>
<li style="margin-left:2em">The home_price_final.csv was
renamed to home_price_final_ERD.csv</li>
<li style="margin-left:2em">The
average_indiana_weather.csv was renamed to
average_indiana_weather_ERD.csv</li>
<li style="margin-left:2em">The IN_census_file.csv was
renamed to IN_census_file_ERD.csv</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>ERD Notes</h3>
<ul>
<li style="margin-left:2em">ERD diagram is saved as
a PNG file QuickDBD-export.png</li>
<li style="margin-left:2em">DB Table creation script
is stored in HomeSales_DB.sql</li>
</ul>
<span class="image featured_60"><img
src="images/weather/ERD_Schema.png"
alt="centered image" /></span>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>For IPYNB</h3>
<ul>
<li style="margin-left:2em">Column names were
made pretty by removing the space and
brackets by using .str.replace(' ', '_') ('(', '') (')',
'') function</li>
<li style="margin-left:2em">Columns with NaN
were replaced with key value pair</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Loading Data Into the Database</h3>
<ul>
<li style="margin-left:2em">Created
connection to PostgreSQL</li>
<li style="margin-left:2em">Final DB named
HomeSales_DB is loaded to PostgreSQL
</li>
<li style="margin-left:2em">All three
DFs were converted to SQL</li>
<li style="margin-left:2em">Finally Queried
to check if the DB had all the tables
</li>
<li style="margin-left:2em">Final Extract
Transform Load work shown in:
Final_ETL.ipynb</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Potential Future Questions to Research:
</h3>
<ol>
<li style="margin-left:2em">Does Indiana
monthly temperature impact monthly growth?</li>
<li style="margin-left:2em">Where in the
Unites States are most people moving to Indiana from?</li>
<li style="margin-left:2em">What counties in Indiana experience
the most growth per year?</li>
<li style="margin-left:2em">Which season of the year provides
a decline in yearly population growth?</li>
</ol>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3>Tools Used</h3>
<ul>
<li style="margin-left:2em">pgAdmin4</li>
<li style="margin-left:2em">PostgreSQL</li>
<li style="margin-left:2em">Python</li>
<li style="margin-left:2em">Beautiful Soup</li>
<li style="margin-left:2em">ChromeDriver</li>
<li style="margin-left:2em">Multiple APIs</li>
<li style="margin-left:2em">Plot.ly</li>
<li style="margin-left:2em">ERD Diagram</li>
<li style="margin-left:2em">Jupyter Notebook/Lab</li>
<li style="margin-left:2em">JSON</li>
<li style="margin-left:2em">CSV</li>
<li style="margin-left:2em">GitHub & GitHub Pages</li>
</ul>
<div class="row-6 row-12-mobilep">
<hr class="divider">
<h3> </h3>
<p>
<p>
</div>
</section>
<!-- Footer -->
<footer id="footer">
<ul class="icons">
<li><a href="https://github.com/Davis1776" class="icon brands fa-github" rel="noopener noreferrer"
target="_blank"><span class="label">Github</span></a></li>
<li><a href="https://www.linkedin.com/in/mrdavis/" class="icon brands fa-linkedin"
rel="noopener noreferrer" target="_blank"><span class="label">LinkedIn</span></a></li>
<!-- <li><a href="#" class="icon brands fa-dribbble"><span class="label">Dribbble</span></a></li> -->
<li><a href="mailto:MDavis@GaugeResearch.com" class="icon solid fa-envelope"><span
class="label">Email</span></a></li>
</ul>
<ul class="copyright">
<li>© 2023 Michael R. Davis.</li>
<li>All rights reserved</a></li>
</ul>
</footer>
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.dropotron.min.js"></script>
<script src="assets/js/jquery.scrollex.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>