GitHub - ayanatherate/dfcorrs: A Python utility for Cramer's V Correlation Analysis for Categorical Features in Pandas Dataframes.

Compute pairwise categorical correlations (with an optional heatmap) for all columns of a Pandas DataFrame in just one line of code.

Categorical features are detected automatically and numerical features are ignored. Columns can also be added to or removed from the comparison manually.

Inspiration: the Wikipedia page on Cramér's V
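
For reference, the standard textbook definition of the statistic is given below. This is the usual uncorrected form; whether dfcorrs applies a bias correction on top of it is not stated here, so treat it as background rather than a description of the library's internals.

```latex
V = \sqrt{\frac{\chi^2 / n}{\min(k - 1,\; r - 1)}}
```

Here \chi^2 is Pearson's chi-squared statistic computed from the r-by-k contingency table of the two features, and n is the total number of observations. V ranges from 0 (no association) to 1 (perfect association).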

Run:

git clone https://github.com/ayanatherate/dfcorrs.git
cd dfcorrs 
pip install -r requirements.txt

If using ipynb notebooks:

!git clone https://github.com/ayanatherate/dfcorrs.git

Open any Python Notebook/IDE:

Cramér's V correlation for categorical features

from dfcorrs.cramersvcorr import Cramers
import pandas as pd

cramers = Cramers()
data = pd.read_csv(r'../adatasetwithlotsofcategoricalandcontinuousfeatures.csv')

cramers.corr(data)

"""
 Cramér's V correlation between all pairs of categorical features
 returns a Pandas DataFrame similar to .corr()
"""


cramers.corr(data, plot_htmp=True)

"""
plots a correlation heatmap using plotly
"""

cramers.corr(data)['feature_name']

"""
single out one categorical feature and inspect its correlations with the others; returns a Pandas Series
"""

At times, Pandas may by default interpret a sparse/categorical feature as a continuous one (for example, 'City Code' or 'Candidate ID'), or the other way round. To work around this:

To add columns to the Cramér's V comparison manually, use:

cramers.corr(data, add_cols=['feature_name'])

"""
 the added column must be present in the dataset provided
 use .astype('str') to force-convert any columns wrongly identified as continuous before passing them in
"""
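
A minimal sketch of the conversion step described above, using only pandas. The DataFrame and its column names are made up for illustration; the point is that an integer-coded identifier is read as numeric until it is cast to string.

```python
import pandas as pd

# Hypothetical data: 'City Code' holds category labels stored as integers.
df = pd.DataFrame({
    "City Code": [101, 102, 101, 103],
    "Gender": ["F", "M", "F", "M"],
    "Salary": [52000.0, 61000.0, 58000.0, 49500.0],
})

# Pandas infers an integer dtype, so the column looks continuous.
was_numeric = pd.api.types.is_numeric_dtype(df["City Code"])

# Force-convert to string so the column is treated as categorical.
df["City Code"] = df["City Code"].astype("str")
is_numeric_now = pd.api.types.is_numeric_dtype(df["City Code"])

print(was_numeric, is_numeric_now)
```

After the cast, the column can be passed through add_cols as shown above.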

To remove categorical (or redundant) columns from the Cramér's V comparison, use:

cramers.corr(data, rem_cols=['feature_name'])

To use the wrapper for a single-shot Cramér's V computation on two Python lists or two separate Pandas DataFrame columns:

"""
single-shot operation, does not remap
after applying the operation on the entire dataframe
"""
cramers.cramers_v(data['feature_name1'], data['feature_name2'])

cramers.cramers_v(list(some_classes1), list(some_classes2))  # say we have two Python lists instead
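
To make the single-shot computation concrete, here is a minimal standalone sketch of the uncorrected Cramér's V using only pandas and NumPy. The function name cramers_v_sketch is hypothetical and this is not taken from the dfcorrs source; the library's own cramers_v may differ (for example, by applying a bias correction).

```python
import numpy as np
import pandas as pd


def cramers_v_sketch(x, y):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    # Contingency table of observed counts (rows: x labels, columns: y labels).
    table = pd.crosstab(
        pd.Series(list(x), name="x"), pd.Series(list(y), name="y")
    ).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins over n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    if k == 0:
        # Undefined when either input has a single category.
        return float("nan")
    return float(np.sqrt(chi2 / (n * k)))


perfect = cramers_v_sketch(["a", "b", "a", "b"], ["u", "v", "u", "v"])
independent = cramers_v_sketch(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

The two calls cover the extremes: labels that determine each other exactly, and labels whose contingency table matches the independence expectation cell for cell.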

