Anshu Project Repository
https://github.com/Anshuboom/Anshuboom_SAFP_repo

This Readme file has the following main objective:
###################################################################################
If you want to run the model, how to load the assets and which notebooks to run
###################################################################################
In the myCode.py file there are 2 places where you will need to insert a Google API key, since this project uses geolocation. Search for "googlemaps.Client(key=" and enter your personal key. (I got a warning from Google when I made the repository public.)
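One way to keep your key out of the repository is to read it from an environment variable instead of typing it into the file. The following is only a sketch under assumptions, not what myCode.py currently does: the variable name GOOGLE_MAPS_API_KEY and both helper functions are hypothetical; only the googlemaps.Client(key=...) call comes from this project.

```python
import os


def load_google_api_key(env_var="GOOGLE_MAPS_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice; use whatever name you export.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your personal Google API key first.")
    return key


def make_gmaps_client():
    """Build the client without a hard-coded key (hypothetical helper)."""
    import googlemaps  # imported lazily so the key helper works without the package

    return googlemaps.Client(key=load_google_api_key())
```

With this in place, each "googlemaps.Client(key=" occurrence could be replaced by a call to make_gmaps_client(), and the key never appears in a commit.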
-
Data folder contains all the necessary, latest (at commit time) datafiles
- Go through the myCode.py or myCode.ipynb and find the load section at the top and at the very bottom of the file.
Based on the make file I suggest: py files in /src, notebook files in /Notebooks, data files in /Data, model files in /Models
NOTE: as it stands, the easiest way to run the notebooks is to copy the notebooks, the datafiles and all .pkl files into the same folder, because that is how the paths are currently defined. However, if you want the files organised and are willing to edit the paths, use the convention above.
- Change the directory path according to where you download and save the data assets (of the Data folder) on your local PC
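If you do edit the paths, defining the base folder in one place makes the change easier to manage. This is a minimal sketch, not the code in myCode.ipynb: BASE_DIR and asset_path are hypothetical names, and treating every .pkl file as a model file is an assumption based on the convention above.

```python
from pathlib import Path

# Hypothetical base folder: point this at wherever you saved the assets.
BASE_DIR = Path(".")


def asset_path(filename, base_dir=BASE_DIR, organised=False):
    """Return the path of a data or model file.

    organised=False mirrors the current flat layout (everything in one folder).
    organised=True follows the suggested convention; .pkl files are assumed
    to be model files and go to Models, everything else goes to Data.
    """
    base_dir = Path(base_dir)
    if not organised:
        return base_dir / filename
    subfolder = "Models" if filename.endswith(".pkl") else "Data"
    return base_dir / subfolder / filename
```

For example, asset_path("dfAttacks.csv") resolves to the flat layout, while asset_path("dfAttacks.csv", organised=True) points into the Data folder.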
-
The main results notebooks are:
- ChartCompilation.ipynb found in notebooks
- mapchecker.ipynb found in notebooks
As long as the datafile paths have been corrected in myCode.ipynb, each of the above loads the data for you.
-
Make sure you run each cell of the ChartCompilation and mapChecker (especially %run myCode.ipynb)
####################################################################################
ChartCompilation: the point of this file is to acquaint the user with the data, as it describes the shark data found in dfAttacks.csv
It describes the sharks and their individual implications in the attacks
It describes the countries and their individual implications in the attacks
It also shows the risk of each shark and country as well as the oceans where attacks have occurred
The last integer argument in each of the bar charts is the number of x-axis entries to keep. Since all charts are pre-sorted from highest to lowest y values, subsetting the x axis this way makes the chart more presentable and readable.
EXAMPLE:
topProbabilityofFatalities(dfCountry, 'Country', 'FatalityProbability', 40)
This charts the top 40. To chart the top 4, you only need to set that last integer to 4.
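The effect of that last argument can be sketched without any plotting. This is an illustration of the idea only, not the implementation of topProbabilityofFatalities: top_n is a hypothetical helper, and the sample values are made up rather than taken from dfAttacks.csv.

```python
def top_n(rows, x_col, y_col, n):
    """Sort rows by y_col from highest to lowest and keep the first n (x, y) pairs."""
    ordered = sorted(rows, key=lambda row: row[y_col], reverse=True)
    return [(row[x_col], row[y_col]) for row in ordered[:n]]


# Made-up illustrative values, not figures from dfAttacks.csv.
sample = [
    {"Country": "A", "FatalityProbability": 0.10},
    {"Country": "B", "FatalityProbability": 0.40},
    {"Country": "C", "FatalityProbability": 0.25},
]

# Keep only the two highest bars.
top_two = top_n(sample, "Country", "FatalityProbability", 2)
```

Passing a number larger than the number of rows simply keeps every row.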
#####################################################################################
mapChecker is the prediction notebook
- It loads the assets
- It presents a form that requires the input of "Gender", "Location", "TimeSlot" and "Activity" (which are used to compose a valid X record)
- Location is geolocated using Googlemaps Geolocator
- Activity is NLP'd, so you can type full sentences such as "Snorkeling near the Barrier Reef"
- When you click submit, it checks the inputs, creates an X row and feeds it into the trained model
- It receives the predictions and probabilities and displays a blurb with the results
- The next cell uses the geolocation, finds the k nearest attacks around the location and plots them; each neighbor has a proximity/range circle with a radius of 20 km
- The idea is to show you your location in proximity to known ATTACKS (not fatalities) in the area
- The next cell then determines which sharks would be the likely culprits if a FATAL attack were to occur, based on their stats, habitat and history according to the ISAF
- What you get is either the KNOWN sharks in the area, identified by the records to have attacked/killed, OR, if the attacks were unidentified, the main sharks that could be responsible in that area based on temperature conditions, together with their fatality probability. These are shown on a bar chart.
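The nearest-attacks step described above can be sketched as follows. This is a sketch under stated assumptions, not the code in mapChecker: the great-circle (haversine) distance, the "lat"/"lon" field names, the default k and both function names are hypothetical; only the 20 km radius comes from the description.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest_attacks(location, attacks, k=5, radius_km=20.0):
    """Return the k closest attacks as (distance_km, attack, within_radius) tuples.

    within_radius is True when the location falls inside that attack's
    proximity circle of radius_km.
    """
    lat, lon = location
    ranked = sorted(
        ((haversine_km(lat, lon, a["lat"], a["lon"]), a) for a in attacks),
        key=lambda pair: pair[0],
    )
    return [(dist, attack, dist <= radius_km) for dist, attack in ranked[:k]]
```

Each returned tuple carries the distance, so a plotting cell could draw the neighbor, its 20 km circle, and mark whether your location lies inside it.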
######################################################################################
My Sources: Much of the data enrichment came from reading up about the sharks, their habitat preferences, etc. on Wikipedia. Example: https://en.wikipedia.org/wiki/Wobbegong#:~:text=They%20are%20found%20in%20shallow,as%20far%20north%20as%20Japan.
The actual data source was downloaded using the API provided by: https://public.opendatasoft.com/explore/dataset/global-shark-attack/api/?disjunctive.country&disjunctive.area&disjunctive.activity
After the operations of wrangler.ipynb and then wrangler2.ipynb, this data was eventually converted to dfAttacksX.csv, which was used in the modeling. I did not do ANY scraping in this project and I am QUITE happy about that.
It is important to note that wrangler.ipynb and wrangler2.ipynb are no longer in use, since they have already created the cleaned input file, dfAttacks.csv, which myCode.ipynb will load for all the relevant scripts. They have been left in the repo for future use if needed.
I tend to binge watch all shark attack videos on YouTube and started watching "Sharks Happen" (http://sharkshappen.com), which got me interested and inspired me to adopt this project topic. What got me going was the "Sharks Happen Stats" xlsx file, which the author generously provides. I looked at it and got the idea. The file itself was too messy to clean up and contains only 474 records, but individual records were used to enrich the dfAttacks.csv file above, matched on victim names. I am very grateful to the author for this, and I appreciate his interest, motivation and efforts.
To look up water temperatures of the seas around given locations when compiling my sharks dictionary, I also used https://www.seatemperature.org/.