Merge branch 'main' into feat/add-back-jupyterlite-repl
agriyakhetarpal committed Feb 20, 2025
2 parents 9470bf4 + fe494c9 commit 75aeb9e
Showing 61 changed files with 521 additions and 247 deletions.
2 changes: 1 addition & 1 deletion doc/source/development/contributing_codebase.rst
@@ -344,7 +344,7 @@ be located.
   - tests.scalar
   - tests.tseries.offsets

-2. Does your test depend only on code in pd._libs?
+2. Does your test depend only on code in ``pd._libs``?
   This test likely belongs in one of:

   - tests.libs
2 changes: 1 addition & 1 deletion doc/source/development/contributing_gitpod.rst
@@ -109,7 +109,7 @@ development experience:

* `VSCode rst extension <https://marketplace.visualstudio.com/items?itemName=lextudio.restructuredtext>`_
* `Markdown All in One <https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one>`_
-* `VSCode Gitlens extension <https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens>`_
+* `VSCode GitLens extension <https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens>`_
* `VSCode Git Graph extension <https://marketplace.visualstudio.com/items?itemName=mhutchie.git-graph>`_

Development workflow with Gitpod
2 changes: 1 addition & 1 deletion doc/source/development/developer.rst
@@ -99,7 +99,7 @@ Column metadata
* Boolean: ``'bool'``
* Integers: ``'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64'``
* Floats: ``'float16', 'float32', 'float64'``
-* Date and Time Types: ``'datetime', 'datetimetz'``, ``'timedelta'``
+* Date and Time Types: ``'datetime', 'datetimetz', 'timedelta'``
* String: ``'unicode', 'bytes'``
* Categorical: ``'categorical'``
* Other Python objects: ``'object'``
1 change: 1 addition & 0 deletions doc/source/reference/series.rst
@@ -25,6 +25,7 @@ Attributes
   Series.array
   Series.values
   Series.dtype
+   Series.info
   Series.shape
   Series.nbytes
   Series.ndim
2 changes: 1 addition & 1 deletion doc/source/user_guide/merging.rst
@@ -906,7 +906,7 @@ resetting indexes.
Joining multiple :class:`DataFrame`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A list or tuple of ``:class:`DataFrame``` can also be passed to :meth:`~DataFrame.join`
+A list or tuple of :class:`DataFrame` can also be passed to :meth:`~DataFrame.join`
to join them together on their indexes.

.. ipython:: python
13 changes: 12 additions & 1 deletion doc/source/whatsnew/v2.3.0.rst
@@ -37,7 +37,8 @@ Other enhancements
updated to work correctly with NumPy >= 2 (:issue:`57739`)
- :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`)
- :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`)
-- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns when backed by PyArrow (:issue:`60633`)
+- The :meth:`Series.str.decode` has gained the argument ``dtype`` to control the dtype of the result (:issue:`60940`)
+- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns (:issue:`60633`)
- The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)
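
As a hedged illustration of the ``StringDtype`` entries above (a quick sketch assuming a pandas 2.3 build with these changes; the commented outputs are inferred from the release-note text):

import pandas as pd

s = pd.Series(["a", "b", "c"], dtype="string")
print(s.sum())              # "abc": the sum reduction is now implemented
print(s.cummax().tolist())  # ["a", "b", "c"]: running maximum of strings

# the new `dtype` argument of Series.str.decode controls the result dtype
raw = pd.Series([b"x", b"y"])
print(raw.str.decode("utf-8", dtype="string").dtype)  # string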

.. ---------------------------------------------------------------------------
@@ -53,6 +54,16 @@ These are bug fixes that might have notable behavior changes.
notable_bug_fix1
^^^^^^^^^^^^^^^^

+.. _whatsnew_230.api_changes:
+
+API changes
+~~~~~~~~~~~
+
+- When enabling the ``future.infer_string`` option: Index set operations (like
+  union or intersection) will now ignore the dtype of an empty ``RangeIndex`` or
+  empty ``Index`` with object dtype when determining the dtype of the resulting
+  Index (:issue:`60797`)
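
A minimal sketch of the set-operation change described above (assuming a build in which the ``future.infer_string`` option is available):

import pandas as pd

pd.set_option("future.infer_string", True)
strings = pd.Index(["a", "b"])      # inferred as the new string dtype
empty = pd.Index([], dtype=object)  # empty, object dtype
# the empty index's object dtype is now ignored, so the union keeps
# the string dtype instead of falling back to object
print(strings.union(empty).dtype)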

.. ---------------------------------------------------------------------------
.. _whatsnew_230.deprecations:

7 changes: 7 additions & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -32,6 +32,7 @@ Other enhancements
- :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
- :meth:`pandas.api.interchange.from_dataframe` now uses the `PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_ if available, only falling back to the Dataframe Interchange Protocol if that fails (:issue:`60739`)
- Added :meth:`.Styler.to_typst` to write Styler objects to file, buffer or string in Typst format (:issue:`57617`)
+- Added missing :meth:`pandas.Series.info` to API reference (:issue:`60926`)
- :class:`pandas.api.typing.NoDefault` is available for typing ``no_default``
- :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
- :func:`pandas.merge` now validates the ``how`` parameter input (merge type) (:issue:`59435`)
@@ -70,6 +71,7 @@ Other enhancements
- :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
- :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
+- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
- Implemented :meth:`Series.str.isascii` and :meth:`Index.str.isascii` (:issue:`59091`)
- Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
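
A hedged sketch of the new ``"delete_rows"`` option noted above (assuming a pandas 3.0 build with this change; sqlite3 is used purely for illustration):

import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
pd.DataFrame({"a": [1, 2]}).to_sql("t", con, index=False)
# "delete_rows" clears the existing records but keeps the table schema,
# then inserts the new data
pd.DataFrame({"a": [3]}).to_sql("t", con, if_exists="delete_rows", index=False)
print(pd.read_sql("SELECT a FROM t", con))  # only the new row remains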
@@ -360,6 +362,9 @@ Other API changes
- pickle and HDF (``.h5``) files created with Python 2 are no longer explicitly supported (:issue:`57387`)
- pickled objects from pandas version less than ``1.0.0`` are no longer supported (:issue:`57155`)
- when comparing the indexes in :func:`testing.assert_series_equal`, check_exact defaults to True if an :class:`Index` is of integer dtypes. (:issue:`57386`)
+- Index set operations (like union or intersection) will now ignore the dtype of
+  an empty ``RangeIndex`` or empty ``Index`` with object dtype when determining
+  the dtype of the resulting Index (:issue:`60797`)

.. ---------------------------------------------------------------------------
.. _whatsnew_300.deprecations:
@@ -666,6 +671,7 @@ Conversion
- Bug in :meth:`DataFrame.astype` not casting ``values`` for Arrow-based dictionary dtype correctly (:issue:`58479`)
- Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
- Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
+- Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` removing timezone information for objects with :class:`ArrowDtype` (:issue:`60237`)
- Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)

Strings
@@ -781,6 +787,7 @@ Sparse

ExtensionArray
^^^^^^^^^^^^^^
+- Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
- Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
- Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
- Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
28 changes: 1 addition & 27 deletions pandas/_libs/algos.pyx
@@ -818,33 +818,7 @@ def is_monotonic(const numeric_object_t[:] arr, bint timelike):
    if timelike and <int64_t>arr[0] == NPY_NAT:
        return False, False, False

-    if numeric_object_t is not object:
-        with nogil:
-            prev = arr[0]
-            for i in range(1, n):
-                cur = arr[i]
-                if timelike and <int64_t>cur == NPY_NAT:
-                    is_monotonic_inc = 0
-                    is_monotonic_dec = 0
-                    break
-                if cur < prev:
-                    is_monotonic_inc = 0
-                elif cur > prev:
-                    is_monotonic_dec = 0
-                elif cur == prev:
-                    is_unique = 0
-                else:
-                    # cur or prev is NaN
-                    is_monotonic_inc = 0
-                    is_monotonic_dec = 0
-                    break
-                if not is_monotonic_inc and not is_monotonic_dec:
-                    is_monotonic_inc = 0
-                    is_monotonic_dec = 0
-                    break
-                prev = cur
-    else:
-        # object-dtype, identical to above except we cannot use `with nogil`
+    with nogil(numeric_object_t is not object):
        prev = arr[0]
        for i in range(1, n):
            cur = arr[i]
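
The replacement above leans on Cython 3's conditional GIL release: ``with nogil(<compile-time condition>)`` holds the GIL for the ``object`` specialization of a fused type and releases it for the others, so the two previously duplicated loop bodies collapse into one. A minimal, hedged Cython sketch of the pattern (the fused type is narrowed to two members, and ``count_items`` is a hypothetical name):

ctypedef fused numeric_object_t:
    double
    object

def count_items(numeric_object_t[:] arr):
    # the nogil condition is evaluated at compile time for each
    # specialization: the GIL is released for double, kept for object
    cdef Py_ssize_t i, total = 0
    with nogil(numeric_object_t is not object):
        for i in range(arr.shape[0]):
            total += 1
    return total

The same consolidation is applied to ``mode`` in ``hashtable_func_helper.pxi.in`` and ``unstack`` in ``reshape.pyx`` below.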
15 changes: 1 addition & 14 deletions pandas/_libs/hashtable_func_helper.pxi.in
@@ -415,20 +415,7 @@ def mode(ndarray[htfunc_t] values, bint dropna, const uint8_t[:] mask=None):

    modes = np.empty(nkeys, dtype=values.dtype)

-    if htfunc_t is not object:
-        with nogil:
-            for k in range(nkeys):
-                count = counts[k]
-                if count == max_count:
-                    j += 1
-                elif count > max_count:
-                    max_count = count
-                    j = 0
-                else:
-                    continue
-
-                modes[j] = keys[k]
-    else:
+    with nogil(htfunc_t is not object):
        for k in range(nkeys):
            count = counts[k]
            if count == max_count:
6 changes: 3 additions & 3 deletions pandas/_libs/internals.pyx
@@ -502,7 +502,7 @@ def get_concat_blkno_indexers(list blknos_list not None):
@cython.boundscheck(False)
@cython.wraparound(False)
def get_blkno_indexers(
int64_t[:] blknos, bint group=True
const int64_t[:] blknos, bint group=True
) -> list[tuple[int, slice | np.ndarray]]:
"""
Enumerate contiguous runs of integers in ndarray.
@@ -596,8 +596,8 @@ def get_blkno_placements(blknos, group: bool = True):
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef update_blklocs_and_blknos(
-    ndarray[intp_t, ndim=1] blklocs,
-    ndarray[intp_t, ndim=1] blknos,
+    const intp_t[:] blklocs,
+    const intp_t[:] blknos,
    Py_ssize_t loc,
    intp_t nblocks,
):
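
This change, like the ``const`` changes in ``join.pyx`` and ``lib.pyx`` below, widens the accepted inputs: a ``const`` typed memoryview can bind to read-only buffers, which a plain ``ndarray[...]`` or non-``const`` memoryview parameter rejects. A small, hedged Cython sketch (``first_blkno`` is a hypothetical name):

from numpy cimport int64_t

def first_blkno(const int64_t[:] blknos):
    # a const memoryview also binds to arrays whose writeable flag is
    # cleared, e.g. arr.setflags(write=False), not just writable ones
    return blknos[0]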
17 changes: 10 additions & 7 deletions pandas/_libs/join.pyx
@@ -225,7 +225,10 @@ def full_outer_join(const intp_t[:] left, const intp_t[:] right,

@cython.wraparound(False)
@cython.boundscheck(False)
-cdef void _get_result_indexer(intp_t[::1] sorter, intp_t[::1] indexer) noexcept nogil:
+cdef void _get_result_indexer(
+    const intp_t[::1] sorter,
+    intp_t[::1] indexer,
+) noexcept nogil:
    """NOTE: overwrites indexer with the result to avoid allocating another array"""
    cdef:
        Py_ssize_t i, n, idx
@@ -681,8 +684,8 @@ def outer_join_indexer(ndarray[numeric_object_t] left, ndarray[numeric_object_t]
from pandas._libs.hashtable cimport Int64HashTable


def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,
ndarray[numeric_t] right_values,
def asof_join_backward_on_X_by_Y(const numeric_t[:] left_values,
const numeric_t[:] right_values,
const int64_t[:] left_by_values,
const int64_t[:] right_by_values,
bint allow_exact_matches=True,
@@ -752,8 +755,8 @@ def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,
return left_indexer, right_indexer


def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,
ndarray[numeric_t] right_values,
def asof_join_forward_on_X_by_Y(const numeric_t[:] left_values,
const numeric_t[:] right_values,
const int64_t[:] left_by_values,
const int64_t[:] right_by_values,
bint allow_exact_matches=1,
@@ -824,8 +827,8 @@ def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,
return left_indexer, right_indexer


def asof_join_nearest_on_X_by_Y(ndarray[numeric_t] left_values,
ndarray[numeric_t] right_values,
def asof_join_nearest_on_X_by_Y(const numeric_t[:] left_values,
const numeric_t[:] right_values,
const int64_t[:] left_by_values,
const int64_t[:] right_by_values,
bint allow_exact_matches=True,
6 changes: 2 additions & 4 deletions pandas/_libs/lib.pyx
@@ -981,16 +981,14 @@ def get_level_sorter(

@cython.boundscheck(False)
@cython.wraparound(False)
def count_level_2d(ndarray[uint8_t, ndim=2, cast=True] mask,
def count_level_2d(const uint8_t[:, :] mask,
const intp_t[:] labels,
Py_ssize_t max_bin,
):
cdef:
Py_ssize_t i, j, k, n
Py_ssize_t i, j, k = mask.shape[1], n = mask.shape[0]
ndarray[int64_t, ndim=2] counts

n, k = (<object>mask).shape

counts = np.zeros((n, max_bin), dtype="i8")
with nogil:
for i in range(n):
22 changes: 1 addition & 21 deletions pandas/_libs/reshape.pyx
@@ -40,27 +40,7 @@ def unstack(const numeric_object_t[:, :] values, const uint8_t[:] mask,
    cdef:
        Py_ssize_t i, j, w, nulls, s, offset

-    if numeric_object_t is not object:
-        # evaluated at compile-time
-        with nogil:
-            for i in range(stride):
-
-                nulls = 0
-                for j in range(length):
-
-                    for w in range(width):
-
-                        offset = j * width + w
-
-                        if mask[offset]:
-                            s = i * width + w
-                            new_values[j, s] = values[offset - nulls, i]
-                            new_mask[j, s] = 1
-                        else:
-                            nulls += 1
-
-    else:
-        # object-dtype, identical to above but we cannot use nogil
+    with nogil(numeric_object_t is not object):
        for i in range(stride):

            nulls = 0
7 changes: 6 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -447,7 +447,12 @@ def __init__(
        if isinstance(values.dtype, ArrowDtype) and issubclass(
            values.dtype.type, CategoricalDtypeType
        ):
-            arr = values._pa_array.combine_chunks()
+            from pandas import Index
+
+            if isinstance(values, Index):
+                arr = values._data._pa_array.combine_chunks()
+            else:
+                arr = values._pa_array.combine_chunks()
            categories = arr.dictionary.to_pandas(types_mapper=ArrowDtype)
            codes = arr.indices.to_numpy()
            dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
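
A hedged sketch of the fixed path (requires pyarrow; before this change, passing an Index here reached for ``_pa_array`` on the Index itself, an attribute that only the underlying ArrowExtensionArray provides):

import pandas as pd
import pyarrow as pa

pa_arr = pa.array(["a", "b", "a"]).dictionary_encode()
idx = pd.Index(pd.arrays.ArrowExtensionArray(pa_arr))
# previously failed because an Index has no _pa_array attribute; the
# fix unwraps the Index to its underlying ArrowExtensionArray first
cat = pd.Categorical(idx)
print(list(cat.categories))  # ['a', 'b']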
83 changes: 83 additions & 0 deletions pandas/core/arrays/string_.py
@@ -49,6 +49,7 @@
)

from pandas.core import (
+    missing,
    nanops,
    ops,
)
@@ -870,6 +871,88 @@ def _reduce(

        raise TypeError(f"Cannot perform reduction '{name}' with string dtype")

+    def _accumulate(self, name: str, *, skipna: bool = True, **kwargs) -> StringArray:
+        """
+        Return an ExtensionArray performing an accumulation operation.
+
+        The underlying data type might change.
+
+        Parameters
+        ----------
+        name : str
+            Name of the function, supported values are:
+            - cummin
+            - cummax
+            - cumsum
+            - cumprod
+        skipna : bool, default True
+            If True, skip NA values.
+        **kwargs
+            Additional keyword arguments passed to the accumulation function.
+            Currently, there is no supported kwarg.
+
+        Returns
+        -------
+        array
+
+        Raises
+        ------
+        NotImplementedError : subclass does not define accumulations
+        """
+        if name == "cumprod":
+            msg = f"operation '{name}' not supported for dtype '{self.dtype}'"
+            raise TypeError(msg)
+
+        # We may need to strip out trailing NA values
+        tail: np.ndarray | None = None
+        na_mask: np.ndarray | None = None
+        ndarray = self._ndarray
+        np_func = {
+            "cumsum": np.cumsum,
+            "cummin": np.minimum.accumulate,
+            "cummax": np.maximum.accumulate,
+        }[name]
+
+        if self._hasna:
+            na_mask = cast("npt.NDArray[np.bool_]", isna(ndarray))
+            if np.all(na_mask):
+                return type(self)(ndarray)
+            if skipna:
+                if name == "cumsum":
+                    ndarray = np.where(na_mask, "", ndarray)
+                else:
+                    # We can retain the running min/max by forward/backward filling.
+                    ndarray = ndarray.copy()
+                    missing.pad_or_backfill_inplace(
+                        ndarray,
+                        method="pad",
+                        axis=0,
+                    )
+                    missing.pad_or_backfill_inplace(
+                        ndarray,
+                        method="backfill",
+                        axis=0,
+                    )
+            else:
+                # When not skipping NA values, the result should be null from
+                # the first NA value onward.
+                idx = np.argmax(na_mask)
+                tail = np.empty(len(ndarray) - idx, dtype="object")
+                tail[:] = self.dtype.na_value
+                ndarray = ndarray[:idx]
+
+        # mypy: Cannot call function of unknown type
+        np_result = np_func(ndarray)  # type: ignore[operator]
+
+        if tail is not None:
+            np_result = np.hstack((np_result, tail))
+        elif na_mask is not None:
+            # Argument 2 to "where" has incompatible type "NAType | float"
+            np_result = np.where(na_mask, self.dtype.na_value, np_result)  # type: ignore[arg-type]
+
+        result = type(self)(np_result)
+        return result

    def _wrap_reduction_result(self, axis: AxisInt | None, result) -> Any:
        if self.dtype.na_value is np.nan and result is libmissing.NA:
            # the masked_reductions use pd.NA -> convert to np.nan
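
The NA handling in ``_accumulate`` above can be summarized with a hedged usage sketch (the expected outputs follow from the two code paths shown: pad/backfill under ``skipna=True``, an NA tail otherwise):

import pandas as pd

s = pd.Series(["b", pd.NA, "a"], dtype="string")
# skipna=True: NA positions are filled for the running minimum,
# then masked back to NA in the result
print(s.cummin(skipna=True).tolist())   # ['b', <NA>, 'a']
# skipna=False: everything from the first NA onward becomes NA
print(s.cummin(skipna=False).tolist())  # ['b', <NA>, <NA>]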
(The remaining changed files in this diff were not loaded.)
