DayF (Decision at your Fingertips) is an open source AutoML development framework, licensed under GPL3, that lets developers work with Machine Learning models without prior AI knowledge, simply by providing a .csv dataset and the objective column.
The gDayF Framework performs all transformations (normalization, cleaning, etc.) and chooses the best model and parametrization for you, storing all dataset and model execution parameters in a .json file.
Clone Git repository: https://github.com/e2its/gdayf-core.git
- python (3.7)
- activate gdayf-core
- pip install h2o==3.30.0.1
- pip install pyspark==2.4.5
- pip install pandas
- pip install hdfs
- pip install pymongo
- e2its/ubuntu-spark:2.4.5
- e2its/ububtu-h2o:3.30.0.1
- e2its/mongodb:latest
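Once the Python packages listed above are installed, a quick check that they can be found in the active environment may save some debugging later. The following is a minimal sketch, not part of gDayF: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the pip package names shown in the list.

```python
import importlib.util

# Import names assumed to match the pip packages listed above.
REQUIRED_MODULES = ["h2o", "pyspark", "pandas", "hdfs", "pymongo"]


def missing_modules(names):
    """Return the names from `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only verifies that each module can be located; it does not check the pinned versions of h2o or pyspark.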
MongoDB: installed on 0.0.0.0:27017:
- "mongoDB": { "value": "gdayf-v1", "url": "0.0.0.0", "port": "27017", "type":"mongoDB", "hash_value": null, "hash_type":"MD5" }
HDFS:
- "hdfs": {"value": "/gdayf-v1/experiments" , "type":"hdfs", "url":"http://0.0.0.0:50070", "uri":"hdfs://<<namenode_ip>>:8020", "hash_value": null, "hash_type":"MD5" }
LocalFS:
- "localfs": {"value": "/Data/gdayf-v1/experiments" , "type":"localfs", "hash_value": null, "hash_type":"MD5" }
Define primary path to be used:
- "primary_path": "localfs"
Establish the different levels of storage based on the storage engines configured:
- "load_path": [ {"value": "models" , "type":"localfs", "hash_value": null, "hash_type":"MD5" } ]
- "log_path" : [ {"value": "log" , "type":"localfs", "hash_value": null, "hash_type":"MD5" } ]
- "json_path" : [ {"value": "json" , "type":"mongoDB", "hash_value": null, "hash_type":"MD5" } ]
- "prediction_path" : [ {"value": "prediction" , "type":"mongoDB", "hash_value": null, "hash_type":"MD5" } ]
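The storage settings above can be sanity-checked before running the framework. The sketch below is an illustration under stated assumptions, not gDayF's own loader: it wraps the `mongoDB`, `localfs`, `primary_path` and `*_path` fragments in a single flat JSON object (the real nesting inside the configuration file may differ, and the `hdfs` entry is left out for brevity), and the helpers `check_storage` and `localfs_locations` are hypothetical names. Joining the `localfs` base directory with each path value is also an assumption about how the levels are combined.

```python
import json
import posixpath

# Fragments copied from the settings above; the flat layout is an assumption.
CONFIG_TEXT = """
{
  "mongoDB": {"value": "gdayf-v1", "url": "0.0.0.0", "port": "27017",
              "type": "mongoDB", "hash_value": null, "hash_type": "MD5"},
  "localfs": {"value": "/Data/gdayf-v1/experiments", "type": "localfs",
              "hash_value": null, "hash_type": "MD5"},
  "primary_path": "localfs",
  "load_path": [{"value": "models", "type": "localfs",
                 "hash_value": null, "hash_type": "MD5"}],
  "log_path": [{"value": "log", "type": "localfs",
                "hash_value": null, "hash_type": "MD5"}],
  "json_path": [{"value": "json", "type": "mongoDB",
                 "hash_value": null, "hash_type": "MD5"}],
  "prediction_path": [{"value": "prediction", "type": "mongoDB",
                       "hash_value": null, "hash_type": "MD5"}]
}
"""

ENGINE_KEYS = ("mongoDB", "hdfs", "localfs")
PATH_KEYS = ("load_path", "log_path", "json_path", "prediction_path")


def check_storage(config):
    """Return a list of messages for settings that point at an unconfigured engine."""
    configured = {name for name in ENGINE_KEYS if name in config}
    problems = []
    if config.get("primary_path") not in configured:
        problems.append("primary_path: engine is not configured")
    for key in PATH_KEYS:
        for entry in config.get(key, []):
            if entry["type"] not in configured:
                problems.append(f"{key}: engine '{entry['type']}' is not configured")
    return problems


def localfs_locations(config):
    """Join the localfs base directory with every localfs-typed path value."""
    base = config["localfs"]["value"]
    return {
        key: posixpath.join(base, entry["value"])
        for key in PATH_KEYS
        for entry in config.get(key, [])
        if entry["type"] == "localfs"
    }


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    print(check_storage(config))
    print(localfs_locations(config))
```

With the settings shown, the check finds no problems; removing the `mongoDB` entry would flag `json_path` and `prediction_path`, since both are declared with `"type":"mongoDB"`.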
Technical documentation generated with Doxygen and Graphviz is available in the doc folder of the project.
Test.py scripts can be found in the test/src folder of the project.
- H2O.ai - a Machine Learning engine working on Hadoop/Yarn, Spark, or your laptop.
- Apache Spark MLlib - the machine learning library of Apache Spark, a fast and general-purpose cluster computing system.
- MongoDB - a NoSQL, JSON-based database.
- Apache HDFS - a distributed file system designed to run on commodity hardware.
- Pandas - an open source Python data analysis library providing high-performance, easy-to-use data structures and data analysis tools.
- Jose L. Sanchez del Coso - e2its - Linkedin
This project is licensed under the GPL3 License. See LICENSE.md for details.