Built a supervised learning model that predicts the likelihood that a given employed California resident would be a potential donor to CharityML, a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning.
After sending nearly 32,000 letters to people in the community, CharityML determined that every donation they received came from someone who was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but only to those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm that best identifies potential donors and reduces the overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.
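The $50,000 threshold turns donor prediction into a binary classification problem (1 = earns more than $50,000, 0 = otherwise). Because every letter costs money, precision (the share of mailed people who actually earn over $50,000) arguably matters more than recall. The minimal sketch below assumes scikit-learn and uses an F-beta score with beta = 0.5 as the yardstick, which is an illustrative choice not stated above; the labels are hypothetical toy values, not CharityML data.

```python
from sklearn.metrics import accuracy_score, fbeta_score, precision_score, recall_score


def score(y_true, y_pred, beta=0.5):
    """Return accuracy, precision, recall and F-beta for 0/1 labels (1 = income > $50K)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        # beta < 1 weights precision more heavily than recall
        "f_beta": fbeta_score(y_true, y_pred, beta=beta),
    }


# Hypothetical toy labels (assumption), not drawn from census.csv.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
cautious = [1, 0, 0, 1, 0, 0, 0, 0]  # mails only 2 people, both real donors
naive = [1] * len(y_true)            # mails everyone: the naive baseline

cautious_scores = score(y_true, cautious)
naive_scores = score(y_true, naive)
print("cautious:", cautious_scores)
print("naive:   ", naive_scores)
```

Both strategies reach 0.75 vs 0.5 accuracy here, but the cautious mailer wins clearly on F0.5 because it sends no wasted letters, which matches the goal of reducing the number of letters sent.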
We first investigate the factors that affect the likelihood of a charity donation being made. Next, we use a training and predicting pipeline to evaluate the accuracy and efficiency of three supervised machine learning algorithms (SVM, Random Forests, Gaussian Naive Bayes). Then, we tune the parameters of the algorithm that provides the maximum donation rate at the minimum cost. Finally, we examine the effect of feature extraction on the data.
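The train/predict/tune workflow described above can be sketched with scikit-learn as follows. This is a hedged illustration, not the project's notebook code: the data come from make_classification as a synthetic stand-in for the preprocessed census features, the helper name train_predict, the parameter grid, and the choice to tune the random forest are all hypothetical, and F0.5 is assumed as the selection metric.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def train_predict(learner, X_train, y_train, X_test, y_test):
    """Fit one learner, then report fit/predict time, accuracy and F0.5 on the test set."""
    start = perf_counter()
    learner.fit(X_train, y_train)
    train_time = perf_counter() - start

    start = perf_counter()
    predictions = learner.predict(X_test)
    pred_time = perf_counter() - start

    return {
        "train_time": train_time,
        "pred_time": pred_time,
        "accuracy": accuracy_score(y_test, predictions),
        "f_0.5": fbeta_score(y_test, predictions, beta=0.5),
    }


# Synthetic, imbalanced stand-in for the census features (assumption, not census.csv).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# The three learners named in the text.
learners = {
    "SVC": SVC(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
}
results = {name: train_predict(clf, X_train, y_train, X_test, y_test)
           for name, clf in learners.items()}
for name, res in results.items():
    print(f"{name}: accuracy={res['accuracy']:.3f}, F0.5={res['f_0.5']:.3f}")

# Tune one learner (random forest, picked only for illustration) with a
# hypothetical grid, using F0.5 as the cross-validation score.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    scoring=make_scorer(fbeta_score, beta=0.5),
    cv=3,
)
grid.fit(X_train, y_train)
best = grid.best_estimator_
tuned_f = fbeta_score(y_test, best.predict(X_test), beta=0.5)
print("best params:", grid.best_params_, f"tuned F0.5={tuned_f:.3f}")

# Rank features by the tuned forest's impurity-based importances
# (indices refer to the synthetic columns above).
top = sorted(enumerate(best.feature_importances_), key=lambda p: p[1], reverse=True)[:5]
print("top features:", top)
```

Timing each fit and predict call separately reflects the "accuracy and efficiency" comparison; on real data, the learner to tune would be whichever scores best in that comparison rather than a fixed choice.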
This project requires Python 3.x and the following Python libraries installed:
You will also need software installed to run and execute an iPython Notebook.
We recommend students install Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
Template code is provided in the finding_donors.ipynb notebook file. You will also be required to use the included visuals.py Python file and the census.csv dataset file to complete your work. While some code has already been implemented to get you started, you will need to implement additional functionality when requested to successfully complete the project. Note that the code included in visuals.py is meant to be used out-of-the-box and is not intended for students to manipulate. If you are interested in how the visualizations are created in the notebook, please feel free to explore this Python file.
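Before any learner can be trained, the raw census.csv records have to be split into numeric features and a 0/1 label. The sketch below is a minimal illustration with pandas, assuming the target column is named income and the positive class is the string ">50K"; both are assumptions about census.csv, and the helper name prepare and the four sample rows are hypothetical.

```python
import pandas as pd


def prepare(data: pd.DataFrame, target: str = "income", positive: str = ">50K"):
    """Split a census-style frame into one-hot encoded features and a 0/1 label.

    The column name "income" and the label string ">50K" are assumptions about census.csv.
    """
    labels = (data[target] == positive).astype(int)            # 1 = earns > $50K
    features = pd.get_dummies(data.drop(columns=[target]))    # one-hot encode categoricals
    return features, labels


# Hypothetical in-memory rows; with the real file you would start from
# pd.read_csv("census.csv") instead.
sample = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private", "Private"],
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})
features, labels = prepare(sample)
print(features.columns.tolist())
print(labels.tolist())
```

One-hot encoding is used here because learners such as SVM and Gaussian Naive Bayes in scikit-learn expect numeric inputs rather than category strings.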