Compute pairwise categorical correlations (with an optional heatmap) for all columns of a pandas DataFrame in just one line of code.
Categorical features are detected automatically and numerical features are ignored. Columns can also be added or removed manually.
Inspiration: the Wikipedia page on Cramér's V.
git clone https://github.com/ayanatherate/dfcorrs.git
cd dfcorrs
pip install -r requirements.txt
!git clone https://github.com/ayanatherate/dfcorrs.git
from dfcorrs.cramersvcorr import Cramers
import pandas as pd
cramers=Cramers()
data=pd.read_csv(r'../adatasetwithlotsofcategoricalandcontinuousfeatures.csv')
cramers.corr(data)
"""
Cramér's V correlation between all pairs of categorical features
returns a pandas DataFrame similar to .corr()
"""
cramers.corr(data, plot_htmp=True)
"""
plots a correlation heatmap using plotly
"""
cramers.corr(data)['feature_name']
"""
single out a categorical feature and observe correlations, returns Pandas Series
"""
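Once a single column has been pulled out of the correlation matrix, ranking the other features by strength of association is a common next step. The sketch below uses a small stand-in matrix with made-up values in place of the output of cramers.corr(data), and assumes that output is indexed by feature name on both axes like .corr(). The feature names are hypothetical.

```python
import pandas as pd

# Stand-in for the matrix returned by cramers.corr(data).
# Values and feature names are made up for illustration only.
features = ["city", "segment", "gender"]
corr = pd.DataFrame(
    [[1.00, 0.42, 0.08],
     [0.42, 1.00, 0.17],
     [0.08, 0.17, 1.00]],
    index=features,
    columns=features,
)

target = "city"
# Select the column (a pandas Series), drop the self-correlation entry,
# and sort the remaining features from strongest to weakest association.
ranked = corr[target].drop(labels=target).sort_values(ascending=False)
print(ranked)
```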
At times, pandas may by default interpret a sparse or categorical feature as continuous (for example 'City Code' or 'Candidate ID'), and vice versa. To handle such cases:
cramers.corr(data, add_cols=['feature_name'])
"""
the added column must be present in the dataset provided
use .astype('str') to force-convert falsely identified continuous columns (if any) before calling corr()
"""
cramers.corr(data, rem_cols=['feature_name'])
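To illustrate why an ID-like column can be missed, and what the .astype('str') conversion changes, here is a minimal sketch using plain pandas. It assumes a dtype-based heuristic (non-numeric columns are treated as categorical), which may differ from the detection logic dfcorrs actually uses. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'city_code' is categorical in meaning but stored as integers.
df = pd.DataFrame({
    "city_code": [101, 102, 101, 103],
    "gender": ["F", "M", "F", "M"],
    "salary": [52000.0, 48000.0, 61000.0, 58000.0],
})

# Assumed heuristic: every non-numeric column counts as categorical.
before = df.select_dtypes(exclude="number").columns.tolist()

# Force-convert the falsely numeric column to strings.
df["city_code"] = df["city_code"].astype(str)
after = df.select_dtypes(exclude="number").columns.tolist()

print(before)  # 'city_code' is missing here
print(after)   # 'city_code' is now picked up
```

After the conversion, the column can be passed through add_cols as shown above, or may simply be picked up automatically.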
If you want to use the wrapper for a single-shot Cramér's V correlation on two Python lists or two separate pandas DataFrame columns:
"""
single-shot operation, does not remap
after applying the operation on the entire dataframe
"""
cramers.cramers_v(data['feature_name1'], data['feature_name2'])
cramers.cramers_v(list(some_classes1), list(some_classes2))  # say, we have two Python lists instead
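For reference, the statistic behind the single-shot call can be sketched directly with pandas and NumPy. This is the textbook, uncorrected form V = sqrt(chi2 / (n * (min(rows, cols) - 1))). Whether dfcorrs applies a bias correction or other adjustments is not stated here, so results may differ from cramers.cramers_v. The function name cramers_v_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def cramers_v_sketch(x, y):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    # Contingency table of observed counts for every (x, y) label pair.
    observed = pd.crosstab(
        pd.Series(list(x), name="x"),
        pd.Series(list(y), name="y"),
    ).to_numpy(dtype=float)
    n = observed.sum()

    # Expected counts under independence: row_total * col_total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()

    # Normalise by n * (min(rows, cols) - 1); undefined if either
    # variable takes only one distinct value.
    k = min(observed.shape) - 1
    if k == 0:
        return float("nan")
    return float(np.sqrt(chi2 / (n * k)))


# Perfectly associated labels give 1, independent labels give 0.
perfect = cramers_v_sketch(["a", "b", "a", "b"], ["u", "v", "u", "v"])
independent = cramers_v_sketch(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```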