Merge branch 'main' of https://github.com/pandas-dev/pandas into bug#60695
Anurag-Varma committed Feb 25, 2025
2 parents 1cf61a6 + 10762c6 commit 2a1f226
Showing 5 changed files with 25 additions and 2 deletions.
2 changes: 1 addition & 1 deletion doc/source/development/contributing_codebase.rst
@@ -198,7 +198,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
            obj = cast(str, obj)  # Mypy complains without this!
            return obj.upper()

-The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable

.. code-block:: python
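The paragraph touched by this hunk recommends refactoring over ``cast``. A minimal sketch of such a refactor follows, assuming the only non-numeric input is ``str``. The function name ``upper_if_str`` is hypothetical and not part of pandas.

```python
from typing import Union


def upper_if_str(obj: Union[str, int, float]) -> Union[str, int, float]:
    # isinstance() is a check mypy understands, so inside this branch
    # obj is narrowed to str and no cast() is needed.
    if isinstance(obj, str):
        return obj.upper()
    # Here mypy has narrowed obj to int or float.
    return obj
```

Testing for the ``str`` case directly, instead of ruling out the numeric cases through a helper, is what lets the type checker follow the narrowing.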
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -790,6 +790,7 @@ ExtensionArray
 ^^^^^^^^^^^^^^
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
+- Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False(:issue:`60567`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
7 changes: 6 additions & 1 deletion pandas/core/arrays/arrow/array.py
@@ -1208,7 +1208,12 @@ def factorize(
             data = data.cast(pa.int64())

         if pa.types.is_dictionary(data.type):
-            encoded = data
+            if null_encoding == "encode":
+                # dictionary encode does nothing if an already encoded array is given
+                data = data.cast(data.type.value_type)
+                encoded = data.dictionary_encode(null_encoding=null_encoding)
+            else:
+                encoded = data
         else:
             encoded = data.dictionary_encode(null_encoding=null_encoding)
         if encoded.length() == 0:
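The branch added in this hunk can be illustrated with a toy, plain-Python model of dictionary encoding. This is a sketch under stated assumptions, not PyArrow's implementation: the helper names ``dictionary_encode``, ``decode`` and ``encode_for_factorize`` are hypothetical, and only the ``"mask"`` / ``"encode"`` option values are taken from the hunk.

```python
from typing import List, Optional, Tuple

# (indices, dictionary); a None index models a masked null.
Encoded = Tuple[List[Optional[int]], List[Optional[str]]]


def dictionary_encode(
    values: List[Optional[str]], null_encoding: str = "mask"
) -> Encoded:
    """Toy model of dictionary encoding (not PyArrow's implementation)."""
    dictionary: List[Optional[str]] = []
    indices: List[Optional[int]] = []
    for value in values:
        if value is None and null_encoding == "mask":
            # "mask": the null stays a null index and gets no dictionary slot.
            indices.append(None)
            continue
        if value not in dictionary:
            # "encode": a null reaches this point and gets its own slot.
            dictionary.append(value)
        indices.append(dictionary.index(value))
    return indices, dictionary


def decode(encoded: Encoded) -> List[Optional[str]]:
    """Expand back to plain values, analogous to casting to the value type."""
    indices, dictionary = encoded
    return [None if i is None else dictionary[i] for i in indices]


def encode_for_factorize(encoded: Encoded, null_encoding: str) -> Encoded:
    """Mirror the hunk's branch for input that is already encoded."""
    if null_encoding == "encode":
        # Re-encode from decoded values; otherwise masked nulls would
        # never receive a dictionary entry.
        return dictionary_encode(decode(encoded), null_encoding="encode")
    return encoded
```

In this model, passing already-encoded input through unchanged keeps nulls masked, which corresponds to the dropped NA values described in the whatsnew entry. Decoding first gives the encoder a chance to assign the null its own dictionary slot.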
5 changes: 5 additions & 0 deletions pandas/core/generic.py
@@ -6267,6 +6267,11 @@ def astype(
         """
         Cast a pandas object to a specified dtype ``dtype``.

+        This method allows the conversion of the data types of pandas objects,
+        including DataFrames and Series, to the specified dtype. It supports casting
+        entire objects to a single data type or applying different data types to
+        individual columns using a mapping.
+
         Parameters
         ----------
         dtype : str, data type, Series or Mapping of column name -> data type
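The two modes described in the added docstring text can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the frame and its column names ``a`` and ``b`` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.5]})

# Cast the entire object to a single dtype.
all_float = df.astype("float64")

# Cast selected columns via a mapping of column name -> dtype;
# columns not named in the mapping keep their current dtype.
mixed = df.astype({"a": "int32"})
```

Here ``all_float`` has two ``float64`` columns, while ``mixed`` has ``a`` as ``int32`` and leaves ``b`` as ``float64``.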
12 changes: 12 additions & 0 deletions pandas/tests/extension/test_arrow.py
@@ -3329,6 +3329,18 @@ def test_factorize_chunked_dictionary():
     tm.assert_index_equal(res_uniques, exp_uniques)


+def test_factorize_dictionary_with_na():
+    # GH#60567
+    arr = pd.array(
+        ["a1", pd.NA], dtype=ArrowDtype(pa.dictionary(pa.int32(), pa.utf8()))
+    )
+    indices, uniques = arr.factorize(use_na_sentinel=False)
+    expected_indices = np.array([0, 1], dtype=np.intp)
+    expected_uniques = pd.array(["a1", None], dtype=ArrowDtype(pa.string()))
+    tm.assert_numpy_array_equal(indices, expected_indices)
+    tm.assert_extension_array_equal(uniques, expected_uniques)
+
+
 def test_dictionary_astype_categorical():
     # GH#56672
     arrs = [