L-Thirat/Journal-ade

Autonomous Data Science Engine

Abstract

We develop an autonomous data science tool that lets artificial intelligence solve data science problems. Data exploration and feature extraction are the most time-consuming steps in the data science process. We develop an event-based feature synthesis algorithm that automatically recognizes relationships between the entities and events present in the data, extracts features using statistical and mathematical functions, and keeps only the features of high importance. The algorithm generates features for data science problems with single or multiple data tables and uses them to fit a random forest classifier. To test the robustness of our autonomous data science engine (ADE) framework against the well-established Deep Feature Synthesis (DFS) framework, we entered our data science bot in public data science challenges and assessed the usefulness of its feature sets. ADE achieves high accuracy in several competitions: for example, it predicts targets with an accuracy of 89.5%, beating 74% of human participants in the Employee Access Challenge (Kaggle, 2013). In MOOC dropout prediction (KDD 2015), features from ADE augment features from the DFS framework and improve accuracy from 85.3% to 86.3%.
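
The idea of synthesizing features from events and then filtering them can be illustrated with a minimal sketch. This is not the ADE implementation: the event log, the target labels, the set of aggregate functions, and the names `synthesize_features` and `select_features` are all hypothetical. A plain Pearson-correlation threshold stands in for the importance-based filtering described above, and the random forest fitting step is omitted so the example needs only the standard library.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical event log of (entity_id, event_value) pairs; not data from the paper.
EVENTS = [
    ("u1", 3.0), ("u1", 5.0), ("u1", 4.0),
    ("u2", 1.0), ("u2", 1.0),
    ("u3", 9.0), ("u3", 7.0), ("u3", 8.0), ("u3", 8.0),
    ("u4", 2.0),
]
# Hypothetical binary target per entity.
TARGET = {"u1": 0, "u2": 0, "u3": 1, "u4": 0}

# Statistical functions applied to each entity's events (an assumed, illustrative set).
AGGREGATES = {"count": len, "mean": mean, "max": max, "min": min, "std": pstdev}


def synthesize_features(events):
    """Group events by entity and compute one feature per aggregate function."""
    grouped = defaultdict(list)
    for entity, value in events:
        grouped[entity].append(value)
    return {
        entity: {name: float(fn(values)) for name, fn in AGGREGATES.items()}
        for entity, values in grouped.items()
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold.

    This is a stand-in for importance-based filtering, not the paper's method.
    """
    entities = sorted(features)
    ys = [target[e] for e in entities]
    kept = []
    for name in AGGREGATES:
        xs = [features[e][name] for e in entities]
        if abs(pearson(xs, ys)) >= threshold:
            kept.append(name)
    return kept


features = synthesize_features(EVENTS)
selected = select_features(features, TARGET)
```

In a fuller pipeline the selected columns would be passed to a classifier such as a random forest; the threshold value of 0.5 here is arbitrary and chosen only for illustration.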

Journal

L. Thirat, C. Warasinee, "Event-based Feature Synthesis: Autonomous Data Science Engine," Journal of Computers, Vol. 30, No. 2, 2019, pp. 55-67.
http://www.csroc.org.tw/journal/E_Published%20Vol_30_No_2.htm

Foundation

Asahi Glass Scholarship Foundation (AGSF)
