This repository shows how to compute the correlation between feature pairs of the Iris dataset.

RoyAmitabh/Machine_Learning_Correlation

Machine_Learning_Correlation

This repository shows how to compute the correlation between feature pairs of the Iris dataset. I obtained the dataset from this URL: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

POC steps: I uploaded the dataset to ADLS (Azure Data Lake Storage) and ran all the Python code in a Databricks notebook, using Spark ML through the PySpark interface. The first result is the correlation computed on the whole dataset; refer to the code in the file corr_distro.

(image: correlation on the whole dataset)
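To make the quantity being computed concrete, here is a minimal sketch of a pairwise Pearson correlation matrix in plain Python. It is not the code from corr_distro, which uses Spark ML; it only illustrates the calculation on a few rows laid out like iris.data. The column names in FEATURES and the helper names parse_rows, pearson and correlation_matrix are assumptions made for this illustration, since the data file has no header row.

```python
import math

# Column names are assumed for illustration; iris.data has no header row.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# A few rows in the iris.data layout: four measurements, then the class label.
SAMPLE_ROWS = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def parse_rows(text):
    """Return the four numeric features of every well-formed line."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        rows.append([float(value) for value in parts[:4]])
    return rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Correlation between every pair of feature columns."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


rows = parse_rows(SAMPLE_ROWS)
matrix = correlation_matrix(rows)
for name, row in zip(FEATURES, matrix):
    print(name.ljust(13), " ".join(f"{value:6.3f}" for value in row))
```

The matrix is symmetric with ones on the diagonal, so only the entries above the diagonal carry information about distinct feature pairs.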

Datasets are often very large. In that situation, bringing the whole dataset onto the master node to calculate the correlation is not a good idea, so bootstrap sampling (drawing rows at random with replacement) can be used instead. Refer to the code in the file corr_with_bootstrap_sample.

The second result is the correlation obtained on the bootstrap sample. It is very close to the result obtained on the whole dataset.

(image: correlation on the bootstrap sample)
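The idea that a bootstrap sample gives nearly the same correlation as the full data can be sketched in plain Python as follows. This is not the code from corr_with_bootstrap_sample, and it does not use Spark. The data is synthetic (two correlated columns generated with a fixed seed), and the sample size of 1,000, the 20 resamples and the helper names are all assumptions chosen for the illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def column_correlation(rows):
    """Correlation between the two columns of a list of (x, y) rows."""
    xs, ys = zip(*rows)
    return pearson(xs, ys)


def bootstrap_sample(rows, size, rng):
    """Draw `size` rows uniformly at random, with replacement."""
    return rng.choices(rows, k=size)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic stand-in for a large dataset: y depends linearly on x plus noise.
data = []
for _ in range(10_000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

full_corr = column_correlation(data)

# Correlation on several bootstrap samples, each a tenth of the full size.
sample_corrs = [
    column_correlation(bootstrap_sample(data, 1_000, rng)) for _ in range(20)
]
mean_sample_corr = sum(sample_corrs) / len(sample_corrs)

print(f"whole dataset:          {full_corr:.3f}")
print(f"mean over 20 resamples: {mean_sample_corr:.3f}")
print(f"range over resamples:   {min(sample_corrs):.3f} to {max(sample_corrs):.3f}")
```

Repeating the resampling shows how much the estimate varies from sample to sample, which is a useful check before trusting a single sample in place of the full dataset.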
