Gaussian Naive Bayes (GNB) classifier trained to predict an individual's income
The data set used in this implementation is the Adult Data Set from the UCI Machine Learning Repository; visit the repository website for more information.
- Ignore unknown attribute values, which are marked as "?" in the data set, and normalize the probabilities of each discrete attribute's values so that they sum to 1.
- Assume that log 0 evaluates to negative infinity.
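The two conventions above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the function names `discrete_log_probs` and `safe_log` and the toy `workclass` column are assumptions made up for the example.

```python
import numpy as np

def discrete_log_probs(values, missing="?"):
    """Estimate log P(value) for one discrete attribute, skipping missing entries."""
    # Drop entries marked as missing before counting (hypothetical helper)
    known = np.array([v for v in values if v != missing])
    categories, counts = np.unique(known, return_counts=True)
    # Relative frequencies over the known entries only, so they sum to 1
    probs = counts / counts.sum()
    return dict(zip(categories.tolist(), np.log(probs).tolist()))

def safe_log(p):
    """Return log(p), with log(0) evaluating to -inf and no runtime warning."""
    with np.errstate(divide="ignore"):
        return np.log(p)

# Toy column invented for illustration, not taken from the real data set
workclass = ["Private", "?", "Private", "State-gov", "?", "Private"]
log_probs = discrete_log_probs(workclass)
unseen = safe_log(0.0)  # log-probability of a value with zero estimated probability
```

With this convention, a value that never occurs for a class contributes negative infinity to that class's log-likelihood, which effectively rules the class out for that sample.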
The prediction error obtained with this implementation is 16.90%, which is higher than the 16.12% reported on the website. This is probably because I did not use smoothing in this implementation.
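For reference, one common form of smoothing is additive (Laplace) smoothing. The sketch below is hypothetical and is not part of this implementation; the function name `smoothed_log_probs`, the parameter `alpha`, and the toy data are assumptions, and whether it would close the gap to 16.12% has not been tested here.

```python
import numpy as np

def smoothed_log_probs(values, categories, alpha=1.0, missing="?"):
    """Additive smoothing: add alpha pseudo-counts to every category."""
    # Missing entries are still ignored, as in the unsmoothed estimate
    known = [v for v in values if v != missing]
    counts = np.array([known.count(c) for c in categories], dtype=float)
    # Every category gets a nonzero probability, so log never sees 0
    probs = (counts + alpha) / (counts.sum() + alpha * len(categories))
    return dict(zip(categories, np.log(probs).tolist()))

# Toy data invented for illustration; "Never-worked" has no occurrences
categories = ["Private", "State-gov", "Never-worked"]
workclass = ["Private", "?", "Private", "State-gov", "?", "Private"]
log_probs = smoothed_log_probs(workclass, categories)
```

In this toy case the unseen category gets probability 1/7 instead of 0, so its log-probability stays finite and a single unseen value no longer rules out a class.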
- Use NumPy for scientific computing.