Fix error value_counts result with pyarrow categorical columns (#60949)

* BUG: Fix PyArrow array access in Categorical constructor for Index objects (#60563)

* TST: Add test for value_counts with Arrow dictionary dtype (#60563)

* DOC: Add changelog entry for PyArrow array access fix in Categorical (#60563)

* Update doc/source/whatsnew/v3.0.0.rst

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
chilin0525 and mroeschke authored Feb 19, 2025
1 parent 4c3b573 commit e2e3791
Showing 3 changed files with 24 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -787,6 +787,7 @@ Sparse
 
 ExtensionArray
 ^^^^^^^^^^^^^^
+- Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
7 changes: 6 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -447,7 +447,12 @@ def __init__(
 if isinstance(values.dtype, ArrowDtype) and issubclass(
     values.dtype.type, CategoricalDtypeType
 ):
-    arr = values._pa_array.combine_chunks()
+    from pandas import Index
+
+    if isinstance(values, Index):
+        arr = values._data._pa_array.combine_chunks()
+    else:
+        arr = values._pa_array.combine_chunks()
     categories = arr.dictionary.to_pandas(types_mapper=ArrowDtype)
     codes = arr.indices.to_numpy()
     dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
17 changes: 17 additions & 0 deletions pandas/tests/extension/test_arrow.py
@@ -3511,3 +3511,20 @@ def test_map_numeric_na_action():
     result = ser.map(lambda x: 42, na_action="ignore")
     expected = pd.Series([42.0, 42.0, np.nan], dtype="float64")
     tm.assert_series_equal(result, expected)
+
+
+def test_categorical_from_arrow_dictionary():
+    # GH 60563
+    df = pd.DataFrame(
+        {"A": ["a1", "a2"]}, dtype=ArrowDtype(pa.dictionary(pa.int32(), pa.utf8()))
+    )
+    result = df.value_counts(dropna=False)
+    expected = pd.Series(
+        [1, 1],
+        index=pd.MultiIndex.from_arrays(
+            [pd.Index(["a1", "a2"], dtype=ArrowDtype(pa.string()), name="A")]
+        ),
+        name="count",
+        dtype="int64",
+    )
+    tm.assert_series_equal(result, expected)
