An automated tool for binary and multi-class classification with hyper-parameter optimization on stationary (batch) and streaming datasets. It trains a range of models for traditional batch datasets (KNN, DT, Random Forests, SVM, Bagging, Boosting, etc.) and for streaming datasets (Hoeffding Tree classifier, SAM-KNN, Adaptive Hoeffding Trees, Adaptive Random Forests, OzaBag, OzaBoost, etc.), then generates metric dumps and performance evaluation graphs (ROC curves) comparing the best models. Hypothesis testing with the Friedman test and the Nemenyi post-hoc test is also supported for statistical comparison of algorithms.
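To illustrate the kind of comparison the Friedman and Nemenyi tests provide, here is a minimal pure-Python sketch. It is not the tool's own implementation: the `scores` table holds placeholder numbers, the function names are hypothetical, and the Friedman statistic is computed without a tie correction. The critical values used (5.991 for chi-square with 2 degrees of freedom, and q = 2.343 for three classifiers) are the standard tabulated values at alpha = 0.05.

```python
import math

# Placeholder accuracy scores for illustration only (not results of this tool).
# Rows are datasets, columns are classifiers.
scores = [
    [0.80, 0.85, 0.90],
    [0.70, 0.75, 0.72],
    [0.65, 0.60, 0.75],
    [0.88, 0.90, 0.95],
    [0.77, 0.79, 0.81],
]


def rank_row(row):
    """Rank one row so the highest score gets rank 1; ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied scores.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (chi-square statistic, average rank per classifier), no tie correction."""
    n, k = len(table), len(table[0])
    ranks = [rank_row(row) for row in table]
    avg_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, avg_ranks


def nemenyi_cd(k, n, q_alpha):
    """Critical difference in average rank for the Nemenyi post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


chi2, avg_ranks = friedman_statistic(scores)
cd = nemenyi_cd(k=3, n=len(scores), q_alpha=2.343)

print("Friedman chi-square:", round(chi2, 3), "reject H0:", chi2 > 5.991)
for a in range(3):
    for b in range(a + 1, 3):
        diff = abs(avg_ranks[a] - avg_ranks[b])
        print(f"classifiers {a} vs {b}: rank diff {diff:.2f}, significant: {diff > cd}")
```

Two classifiers are considered significantly different when their average ranks differ by more than the critical difference, and the post-hoc test is normally run only after the Friedman test rejects the null hypothesis.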
├───configs
│ │───model_hparams.py
│
├───data
│ │───drug_consumption.data
│
├───dataset
│ │───dataset_base.py
│ │───feature_select.py
│
├───driver
│ │───driver.py
│
├───models
│ │───models.py
│
├───output
│ ├───run_20221001-124242
│ ...
│ ...
│ ...
└───utils
  │───plot_results.py
  │───scoring.py
model_hparams.py
: Hyperparameter combinations for each model can be specified here.
drug_consumption.data
: The dataset (all datasets are stored under the data folder).
driver.py
: Default starting point for execution of the program.
driver_online.py
: Starting point for execution of the program for online models.
models.py
: Model classes and definitions.
plot_results.py
: Utility to plot ROC curves.
scoring.py
: Utility to compute metrics such as GMean, F-score, AUC, etc.
dataset.py
: Dataset class, used for preparing train/test splits and pre-processing data.
feature_select.py
: Feature selection algorithms used for feature reduction based on statistical tests.
output
: Directory where run dumps are generated, with evaluation of models and visualization of performance through ROC plots and confusion matrices.
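As a rough illustration of two of the metrics named above, the sketch below computes GMean and F-score for the binary case from raw labels. It is an assumption about how such metrics are typically defined, not a copy of `scoring.py`: the function names, the `positive` and `beta` parameters, and the sample labels are all hypothetical, and a multi-class variant would need per-class averaging.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_score(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score; beta=1 gives the usual F1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

print("GMean:", round(g_mean(y_true, y_pred), 4))
print("F1:", round(f_score(y_true, y_pred), 4))
```

GMean is often preferred over plain accuracy on imbalanced data because it drops to zero whenever one class is never predicted correctly.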
- Batch-based Models
# Navigate to the root directory
>> python ./driver/driver.py
- Online Models
# Navigate to the root directory
>> python ./driver/driver_online.py