Commit

BUG: Fixed TypeError for Series.isin() when large series and values contains NA (#60678) (#60736)

* BUG: Fixed TypeError for Series.isin() when large series and values contains NA (#60678)

* Add entry to whatsnew/v3.0.0.rst for bug fixing

* Replaced np.vectorize() with any() for minor performance improvement and add new test cases

* Fixed failed pre-commit.ci hooks : Formatting errors in algorithms.py, inconsistent-namespace-usage in test_isin.py, sorted whatsnew entry

* Combined redundant if-statements to improve readability and performance
akj2018 authored Jan 22, 2025
1 parent fef01c5 commit 1d33e4c
Showing 3 changed files with 31 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -804,6 +804,7 @@ Other
- Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
- Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
- Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
- Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
- Bug in :meth:`Series.rank` that doesn't preserve missing values for nullable integers when ``na_option='keep'``. (:issue:`56976`)
- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently replacing matching instances when ``regex=True`` and missing values are present. (:issue:`56599`)
- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` throwing ``ValueError`` when ``regex=True`` and all NA values. (:issue:`60688`)
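The `Series.isin` entry above concerns nullable dtypes where `values` itself contains `pd.NA`. As a minimal sketch of the intended semantics, the snippet below mirrors the `Int64` case from this commit's tests on a three-element series. It assumes a pandas version with nullable dtypes; a series this short never reaches the large-series code path, so it only illustrates the result the fix is meant to preserve for long series, not the bug itself.

```python
import pandas as pd

# Nullable integer series with one missing entry (same data as the Int64 test case).
ser = pd.Series([pd.NA, 2, 1], dtype="Int64")

# pd.NA in `values` is expected to match the missing entry in the series.
result = ser.isin([1, pd.NA])

print(result.tolist())
print(result.dtype)
```

The commit's parametrized test expects `[True, False, True]` with `boolean` dtype for this input once the series is treated as large.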
6 changes: 6 additions & 0 deletions pandas/core/algorithms.py
@@ -23,6 +23,7 @@
    iNaT,
    lib,
)
from pandas._libs.missing import NA
from pandas._typing import (
    AnyArrayLike,
    ArrayLike,
@@ -544,10 +545,15 @@ def isin(comps: ListLike, values: ListLike) -> npt.NDArray[np.bool_]:
    # Ensure np.isin doesn't get object types or it *may* throw an exception
    # Albeit hashmap has O(1) look-up (vs. O(logn) in sorted array),
    # isin is faster for small sizes

    # GH60678
    # Ensure values don't contain <NA>, otherwise it throws exception with np.in1d

    if (
        len(comps_array) > _MINIMUM_COMP_ARR_LEN
        and len(values) <= 26
        and comps_array.dtype != object
        and not any(v is NA for v in values)
    ):
        # If the values include nan we need to check for nan explicitly
        # since np.nan it not equal to np.nan
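The four-part condition in this hunk can be read in isolation. The sketch below is illustrative only and is not the pandas implementation: `would_use_numpy_path` and `MIN_COMP_ARR_LEN` are hypothetical names, and the threshold value is taken from the whatsnew wording (more than 10**6 elements). It also shows why the new clause tests identity (`v is NA`) rather than equality: comparing against `pd.NA` yields `pd.NA`, whose truth value is ambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the private _MINIMUM_COMP_ARR_LEN threshold.
MIN_COMP_ARR_LEN = 1_000_000


def would_use_numpy_path(comps, values, min_len=MIN_COMP_ARR_LEN):
    """Mirror the guard: large comps, few values, non-object dtype, no pd.NA."""
    return (
        len(comps) > min_len
        and len(values) <= 26
        and comps.dtype != object
        and not any(v is pd.NA for v in values)
    )


comps = np.arange(5)

# Lower the threshold so a 5-element array counts as "large" for the demo.
without_na = would_use_numpy_path(comps, [1, 2], min_len=2)
with_na = would_use_numpy_path(comps, [1, pd.NA], min_len=2)
print(without_na, with_na)

# An equality-based check would not work: bool(pd.NA == 1) raises TypeError.
try:
    bool(pd.NA == 1)
    equality_is_ambiguous = False
except TypeError:
    equality_is_ambiguous = True
print(equality_is_ambiguous)
```

When the guard evaluates to false, `isin` skips this branch and uses the hash-table lookup mentioned in the comments above.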
24 changes: 24 additions & 0 deletions pandas/tests/series/methods/test_isin.py
@@ -211,6 +211,30 @@ def test_isin_large_series_mixed_dtypes_and_nan(monkeypatch):
    tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
    "dtype, data, values, expected",
    [
        ("boolean", [pd.NA, False, True], [False, pd.NA], [True, True, False]),
        ("Int64", [pd.NA, 2, 1], [1, pd.NA], [True, False, True]),
        ("boolean", [pd.NA, False, True], [pd.NA, True, "a", 20], [True, False, True]),
        ("boolean", [pd.NA, False, True], [], [False, False, False]),
        ("Float64", [20.0, 30.0, pd.NA], [pd.NA], [False, False, True]),
    ],
)
def test_isin_large_series_and_pdNA(dtype, data, values, expected, monkeypatch):
    # https://github.com/pandas-dev/pandas/issues/60678
    # combination of large series (> _MINIMUM_COMP_ARR_LEN elements) and
    # values contains pdNA
    min_isin_comp = 2
    ser = Series(data, dtype=dtype)
    expected = Series(expected, dtype="boolean")

    with monkeypatch.context() as m:
        m.setattr(algorithms, "_MINIMUM_COMP_ARR_LEN", min_isin_comp)
        result = ser.isin(values)
    tm.assert_series_equal(result, expected)
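The test above avoids building a million-element series by temporarily lowering `_MINIMUM_COMP_ARR_LEN` to 2, so that a three-element series takes the large-series branch. The sketch below shows the same patch-and-restore pattern outside pytest. It swaps in the standard library's `unittest.mock.patch.object` for pytest's `monkeypatch`, and `fake_algorithms` and `is_large` are hypothetical stand-ins, not pandas objects.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module that holds a size threshold.
fake_algorithms = types.SimpleNamespace(_MINIMUM_COMP_ARR_LEN=1_000_000)


def is_large(n_elements):
    """Return True when the element count exceeds the current threshold."""
    return n_elements > fake_algorithms._MINIMUM_COMP_ARR_LEN


before = is_large(3)

# Inside the context the threshold is 2, so three elements count as large.
with mock.patch.object(fake_algorithms, "_MINIMUM_COMP_ARR_LEN", 2):
    during = is_large(3)

# On exit the original value is restored automatically.
after = is_large(3)

print(before, during, after)  # False True False
```

Scoping the patch to a context manager keeps the lowered threshold from leaking into other tests.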


def test_isin_complex_numbers():
    # GH 17927
    array = [0, 1j, 1j, 1, 1 + 1j, 1 + 2j, 1 + 1j]
