Skip to content

Fix bad column names postgres

Thibault Bétrémieux edited this page May 31, 2022 · 10 revisions

pangres.fix_psycopg2_bad_cols(df: pandas.core.frame.DataFrame, replacements: dict = {'%': '', '(': '', ')': ''}) -> pandas.core.frame.DataFrame

Replaces '%', '(' and ')' (characters that won't play nicely or even at all with psycopg2) in column and index names in a deep copy of df. This is a workaround for the unresolved issue described here: https://github.com/psycopg/psycopg2/issues/167

IMPORTANT: You will need to apply the same changes in the database as well if the SQL table already exists for a given DataFrame. Otherwise you will for instance end up with a column "total_%" and "total_" in your SQL table.

Parameters

  • df : pd.DataFrame

  • replacements : dict {'%':str, '(':str, ')':str}, default {'%':'', '(':'', ')':''}

    The keys '%', '(' and ')' are mandatory. There cannot be any extra keys.

Returns

  • new_df : pd.DataFrame

Raises

  • pangres.exceptions.UnnamedIndexLevelsException

    When you pass a df where not all index levels are named

  • pangres.exceptions.DuplicateLabelsException

    When you pass a df with duplicated labels accross index/columns or when after cleaning we end up with duplicated labels e.g. "test(" and "test)" would by default both be renamed to "test"

  • TypeError

    When replacements is not of the expected type or has wrong keys or has non string values

Examples

  • fix bad col/index names with default replacements (empty string for '(', ')' and '%')
from pangres import fix_psycopg2_bad_cols
import pandas as pd
df = pd.DataFrame({'test()':[0],
                   'foo()%':[0]}).set_index('test()')
print(df.to_markdown())
test() foo()%
0 0
df_fixed = fix_psycopg2_bad_cols(df)
print(df_fixed.to_markdown())
test foo
0 0
  • fix bad col/index names with custom replacements - you MUST provide replacements for '(', ')' and '%'!
import pandas as pd
df = pd.DataFrame({'test()':[0],
                   'foo()%':[0]}).set_index('test()')
print(df.to_markdown())
test() foo()%
0 0
df_fixed = fix_psycopg2_bad_cols(df, replacements={'%':'percent', '(':'', ')':''})
print(df_fixed.to_markdown())
test foopercent
0 0