https://github.com/onionpork/coronavirus_challenge
For this challenge, the Kaggle 'Novel Corona Virus 2019 Dataset' was used. Only data from the USA was utilized; COVID-19 observations from other regions were not included.
Additionally, accessible datasets can be found in the Johns Hopkins GitHub repo: https://github.com/CSSEGISandData/COVID-19
- numpy
- pandas
- seaborn
- matplotlib.pyplot
- sklearn
- plotly
- dash
- scipy
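For reference, the corresponding imports (note that sklearn is installed as scikit-learn on PyPI; the specific sklearn submodule shown is illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import dash
import scipy
from sklearn.linear_model import LinearRegression  # illustrative sklearn import
```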
- Two spreadsheets are used in this small project; they are stored in the 'data' folder
- Output figures are saved in the 'Figure' folder
- Three complete Jupyter notebooks
- DRAFT__draft_coronavirus_challenge.ipynb
- covid19_draft_data_visual.ipynb
- covid19_draft_data_SIR.ipynb
- One incomplete Jupyter notebook
- covid19_draft_model.ipynb
The process starts with business understanding.
- Posing questions - for details, please see my Medium post
- Preparing data - from the Kaggle coronavirus challenge, which provides five fields:
- date
- states (location)
- confirmed cases
- death cases
- population size in each location
- Data processing - data wrangling to get the data into good shape (see the sketch after this list)
- Analysing, modelling, and visualising
- Evaluation - TBD. Since the COVID-19 crisis is still unfolding, I will update this monthly.
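A minimal sketch of the preparation and wrangling steps, assuming the spreadsheets in 'data' follow the standard layout of the Kaggle dataset (the file name `covid_19_data.csv` and column labels are assumptions; the notebooks may use different ones):

```python
import pandas as pd

# Hypothetical file name; the actual spreadsheets live in the 'data' folder.
df = pd.read_csv("data/covid_19_data.csv", parse_dates=["ObservationDate"])

# Keep only US observations, per the scope of this challenge.
us = df[df["Country/Region"] == "US"]

# Aggregate to one row per state per day, with tidy column names.
us = (us.groupby(["ObservationDate", "Province/State"], as_index=False)
        [["Confirmed", "Deaths"]].sum()
        .rename(columns={"ObservationDate": "date",
                         "Province/State": "state",
                         "Confirmed": "confirmed",
                         "Deaths": "deaths"}))
```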
- It's not safe to conclude that we are past the peak.
- Nearby states should act together, given the strong correlation between neighboring states.
- Population size does not appear to influence the spread.
- Whether we have enough kits to test for coronavirus is in doubt.
- The simulated recovery rate is around 0 based on the SIR model, which may indicate we are still in the spreading phase (a minimal SIR sketch follows).
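Below is a minimal SIR simulation sketch, not the notebook's exact code. Here beta is the transmission rate and gamma the recovery rate; the parameter values are illustrative, with gamma set near 0 to mirror the fitted recovery rate mentioned above.

```python
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    # Classic SIR dynamics: S -> I at rate beta*S*I/N, I -> R at rate gamma*I.
    S, I, R = y
    N = S + I + R
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return dS, dI, dR

N = 1_000_000            # illustrative population size
I0, R0 = 100, 0          # initial infected / recovered counts (illustrative)
S0 = N - I0 - R0
beta, gamma = 0.3, 0.0   # gamma ~ 0 matches the near-zero fitted recovery rate

t = np.linspace(0, 120, 121)  # days
S, I, R = odeint(sir, (S0, I0, R0), t, args=(beta, gamma)).T
```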