Compute pairwise categorical correlations (with an optional heatmap) across all columns of a Pandas DataFrame in just one line of code.

Test in Colab


Automatically detects categorical features and ignores numerical ones. Columns can also be added to or removed from the comparison manually.

Run:

git clone https://github.com/ayanatherate/dfcorrs.git
cd dfcorrs 
pip install -r requirements.txt

If using ipynb notebooks:

!git clone https://github.com/ayanatherate/dfcorrs.git

Open any Python Notebook/IDE:

Cramér's V correlation for categorical features

from dfcorrs.cramersvcorr import Cramers
import pandas as pd

cramers=Cramers()
data=pd.read_csv(r'../adatasetwithlotsofcategoricalandcontinuousfeatures.csv')


cramers.corr(data)

"""
 Cramér's V correlation between all pairs of categorical features
 returns a Pandas DataFrame similar to .corr()
"""


cramers.corr(data, plot_htmp=True)

"""
plots a correlation heatmap using plotly
"""

cramers.corr(data)['feature_name']

"""
single out one categorical feature and observe its correlations with the others; returns a Pandas Series
"""

At times, Pandas may by default interpret a sparse/categorical feature as a continuous one (for example, 'City Code' or 'Candidate ID'), and vice versa. The options below address this:


To manually add categorical columns to the Cramér's V comparison, use:

cramers.corr(data, add_cols=['feature_name'])

"""
 the added column must be present in the dataset provided
 use .astype('str') to force-convert columns falsely identified as continuous (if any) before calling
"""
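The dtype issue described above can be reproduced in a few lines. The sketch below uses a small made-up DataFrame (the column names and values are hypothetical) and plain Pandas only. It does not show how dfcorrs detects categorical columns internally; it only illustrates why an integer-coded column looks numeric until it is converted with .astype('str').

```python
import pandas as pd

# Hypothetical toy data: 'City Code' is categorical in meaning,
# but Pandas infers an integer dtype for it.
df = pd.DataFrame({
    "City Code": [101, 102, 101, 103],
    "Gender": ["F", "M", "F", "M"],
    "Salary": [52000.0, 61000.0, 58000.0, 49500.0],
})

# Columns Pandas treats as non-numeric before any conversion
before = df.select_dtypes(exclude="number").columns.tolist()

# Force-convert the falsely numeric column to strings
df["City Code"] = df["City Code"].astype(str)

# After conversion, 'City Code' is no longer treated as numeric
after = df.select_dtypes(exclude="number").columns.tolist()

print(before)
print(after)
```

After the conversion, the column can be passed through add_cols as shown above.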

To manually remove categorical (or redundant) columns from the Cramér's V comparison, use:

cramers.corr(data, rem_cols=['feature_name'])

To use the wrapper for a single-shot Cramér's V correlation on two Python lists or two separate Pandas DataFrame columns:

"""
single-shot operation, does not remap
after applying the operation on the entire dataframe
"""
cramers.cramers_v(data['feature_name1'], data['feature_name2'])

cramers.cramers_v(list(some_classes1), list(some_classes2))  # say we have two Python lists instead
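For readers who want to see what the statistic measures, here is a minimal pure-Python sketch of the textbook (uncorrected) Cramér's V, computed as sqrt(chi2 / (n * (min(rows, cols) - 1))). The function name cramers_v_sketch is hypothetical and is not part of dfcorrs. Whether dfcorrs applies a bias or continuity correction is not stated here, so its results may differ from this sketch.

```python
from collections import Counter
from math import sqrt


def cramers_v_sketch(x, y):
    """Uncorrected Cramér's V for two equal-length sequences of labels.

    Hypothetical helper for illustration only; not the dfcorrs implementation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    n = len(x)
    row_totals = Counter(x)        # marginal counts of each label in x
    col_totals = Counter(y)        # marginal counts of each label in y
    observed = Counter(zip(x, y))  # joint counts (the contingency table)

    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        # The statistic is undefined when a variable has a single level;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0

    # Pearson chi-squared: sum over all cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    return sqrt(chi2 / (n * (k - 1)))


perfect = cramers_v_sketch(list("abab"), list("uvuv"))      # labels move together
independent = cramers_v_sketch(list("aabb"), list("uvuv"))  # labels are unrelated
print(perfect, independent)
```

A value near 1 indicates that one feature almost determines the other, while a value near 0 indicates little association.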