-
Notifications
You must be signed in to change notification settings - Fork 15
Fix bad column names postgres
pangres.fix_psycopg2_bad_cols(df: pandas.core.frame.DataFrame, replacements: dict = {'%': '', '(': '', ')': ''}) -> pandas.core.frame.DataFrame
Replaces '%', '(' and ')' (characters that won't play nicely or even at all with psycopg2) in column and index names in a deep copy of df. This is a workaround for the unresolved issue described here: https://github.com/psycopg/psycopg2/issues/167
IMPORTANT: You will need to apply the same changes in the database as well if the SQL table already exists for a given DataFrame. Otherwise you will for instance end up with a column "total_%" and "total_" in your SQL table.
-
df : pd.DataFrame
-
replacements : dict {'%':str, '(':str, ')':str}, default {'%':'', '(':'', ')':''}
The keys '%', '(' and ')' are mandatory. There cannot be any extra keys.
- new_df : pd.DataFrame
-
pangres.exceptions.UnnamedIndexLevelsException
When you pass a df where not all index levels are named
-
pangres.exceptions.DuplicateLabelsException
When you pass a df with duplicated labels accross index/columns or when after cleaning we end up with duplicated labels e.g. "test(" and "test)" would by default both be renamed to "test"
-
TypeError
When
replacements
is not of the expected type or has wrong keys or has non string values
- fix bad col/index names with default replacements (empty string for '(', ')' and '%')
from pangres import fix_psycopg2_bad_cols
import pandas as pd
df = pd.DataFrame({'test()':[0],
'foo()%':[0]}).set_index('test()')
print(df.to_markdown())
test() | foo()% |
---|---|
0 | 0 |
df_fixed = fix_psycopg2_bad_cols(df)
print(df_fixed.to_markdown())
test | foo |
---|---|
0 | 0 |
- fix bad col/index names with custom replacements - you MUST provide replacements for '(', ')' and '%'!
import pandas as pd
df = pd.DataFrame({'test()':[0],
'foo()%':[0]}).set_index('test()')
print(df.to_markdown())
test() | foo()% |
---|---|
0 | 0 |
df_fixed = fix_psycopg2_bad_cols(df, replacements={'%':'percent', '(':'', ')':''})
print(df_fixed.to_markdown())
test | foopercent |
---|---|
0 | 0 |