diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index bacca20780191..39f52bb3edd8e 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -4,9 +4,6 @@ # ci ci/ @mroeschke -# web -web/ @datapythonista - # docs doc/cheatsheet @Dr-Irv doc/source/development @noatamir diff --git a/doc/source/development/contributing_gitpod.rst b/doc/source/development/contributing_gitpod.rst index 2ba43a44b87d3..b70981b4d307d 100644 --- a/doc/source/development/contributing_gitpod.rst +++ b/doc/source/development/contributing_gitpod.rst @@ -109,7 +109,7 @@ development experience: * `VSCode rst extension `_ * `Markdown All in One `_ -* `VSCode Gitlens extension `_ +* `VSCode GitLens extension `_ * `VSCode Git Graph extension `_ Development workflow with Gitpod diff --git a/doc/source/user_guide/merging.rst b/doc/source/user_guide/merging.rst index fb707674b4dbf..60a66f5e6f2a8 100644 --- a/doc/source/user_guide/merging.rst +++ b/doc/source/user_guide/merging.rst @@ -906,7 +906,7 @@ resetting indexes. Joining multiple :class:`DataFrame` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A list or tuple of ``:class:`DataFrame``` can also be passed to :meth:`~DataFrame.join` +A list or tuple of :class:`DataFrame` can also be passed to :meth:`~DataFrame.join` to join them together on their indexes. .. ipython:: python diff --git a/doc/source/user_guide/window.rst b/doc/source/user_guide/window.rst index 406d77d5b8caa..5b27442c80bb8 100644 --- a/doc/source/user_guide/window.rst +++ b/doc/source/user_guide/window.rst @@ -70,7 +70,8 @@ which will first group the data by the specified keys and then perform a windowi Some windowing aggregation, ``mean``, ``sum``, ``var`` and ``std`` methods may suffer from numerical imprecision due to the underlying windowing algorithms accumulating sums. When values differ - with magnitude :math:`1/np.finfo(np.double).eps` this results in truncation. It must be + with magnitude ``1/np.finfo(np.double).eps`` (approximately :math:`4.5 \times 10^{15}`), + this results in truncation. It must be noted, that large values may have an impact on windows, which do not include these values. `Kahan summation `__ is used to compute the rolling sums to preserve accuracy as much as possible. diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst index 32d9253326277..09134763977c3 100644 --- a/doc/source/whatsnew/v2.3.0.rst +++ b/doc/source/whatsnew/v2.3.0.rst @@ -37,7 +37,8 @@ Other enhancements updated to work correctly with NumPy >= 2 (:issue:`57739`) - :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`) - :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`) -- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns when backed by PyArrow (:issue:`60633`) +- The :meth:`Series.str.decode` has gained the argument ``dtype`` to control the dtype of the result (:issue:`60940`) +- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns (:issue:`60633`) - The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`) .. 
--------------------------------------------------------------------------- diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst index 6f67275a211e7..fa7dbaa0febed 100644 --- a/doc/source/whatsnew/v3.0.0.rst +++ b/doc/source/whatsnew/v3.0.0.rst @@ -71,6 +71,7 @@ Other enhancements - :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`) - :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`) - :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`) +- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`). - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`) - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`) - Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`) @@ -357,6 +358,7 @@ Other API changes - Made ``dtype`` a required argument in :meth:`ExtensionArray._from_sequence_of_strings` (:issue:`56519`) - Passing a :class:`Series` input to :func:`json_normalize` will now retain the :class:`Series` :class:`Index`, previously output had a new :class:`RangeIndex` (:issue:`51452`) - Removed :meth:`Index.sort` which always raised a ``TypeError``. This attribute is not defined and will raise an ``AttributeError`` (:issue:`59283`) +- Unused ``dtype`` argument has been removed from the :class:`MultiIndex` constructor (:issue:`60962`) - Updated :meth:`DataFrame.to_excel` so that the output spreadsheet has no styling. Custom styling can still be done using :meth:`Styler.to_excel` (:issue:`54154`) - pickle and HDF (``.h5``) files created with Python 2 are no longer explicitly supported (:issue:`57387`) - pickled objects from pandas version less than ``1.0.0`` are no longer supported (:issue:`57155`) @@ -787,6 +789,7 @@ Sparse ExtensionArray ^^^^^^^^^^^^^^ +- Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`) - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`) - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`) - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`) @@ -816,6 +819,7 @@ Other - Bug in :meth:`DataFrame.transform` that was returning the wrong order unless the index was monotonically increasing. (:issue:`57069`) - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`) - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`) +- Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`) - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. 
(:issue:`56607`) - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`) - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`) diff --git a/pandas/_libs/algos.pyx b/pandas/_libs/algos.pyx index f2b4baf508986..60ee73ef6b43f 100644 --- a/pandas/_libs/algos.pyx +++ b/pandas/_libs/algos.pyx @@ -818,33 +818,7 @@ def is_monotonic(const numeric_object_t[:] arr, bint timelike): if timelike and arr[0] == NPY_NAT: return False, False, False - if numeric_object_t is not object: - with nogil: - prev = arr[0] - for i in range(1, n): - cur = arr[i] - if timelike and cur == NPY_NAT: - is_monotonic_inc = 0 - is_monotonic_dec = 0 - break - if cur < prev: - is_monotonic_inc = 0 - elif cur > prev: - is_monotonic_dec = 0 - elif cur == prev: - is_unique = 0 - else: - # cur or prev is NaN - is_monotonic_inc = 0 - is_monotonic_dec = 0 - break - if not is_monotonic_inc and not is_monotonic_dec: - is_monotonic_inc = 0 - is_monotonic_dec = 0 - break - prev = cur - else: - # object-dtype, identical to above except we cannot use `with nogil` + with nogil(numeric_object_t is not object): prev = arr[0] for i in range(1, n): cur = arr[i] diff --git a/pandas/_libs/hashtable_func_helper.pxi.in b/pandas/_libs/hashtable_func_helper.pxi.in index 5500fadb73b6d..f957ebdeaf67a 100644 --- a/pandas/_libs/hashtable_func_helper.pxi.in +++ b/pandas/_libs/hashtable_func_helper.pxi.in @@ -415,20 +415,7 @@ def mode(ndarray[htfunc_t] values, bint dropna, const uint8_t[:] mask=None): modes = np.empty(nkeys, dtype=values.dtype) - if htfunc_t is not object: - with nogil: - for k in range(nkeys): - count = counts[k] - if count == max_count: - j += 1 - elif count > max_count: - max_count = count - j = 0 - else: - continue - - modes[j] = keys[k] - else: + with nogil(htfunc_t is not object): for k in range(nkeys): count = counts[k] if count == max_count: diff --git a/pandas/_libs/internals.pyx b/pandas/_libs/internals.pyx index 99737776ff59f..4f0c2892f5a58 100644 --- a/pandas/_libs/internals.pyx +++ b/pandas/_libs/internals.pyx @@ -502,7 +502,7 @@ def get_concat_blkno_indexers(list blknos_list not None): @cython.boundscheck(False) @cython.wraparound(False) def get_blkno_indexers( - int64_t[:] blknos, bint group=True + const int64_t[:] blknos, bint group=True ) -> list[tuple[int, slice | np.ndarray]]: """ Enumerate contiguous runs of integers in ndarray. 
@@ -596,8 +596,8 @@ def get_blkno_placements(blknos, group: bool = True): @cython.boundscheck(False) @cython.wraparound(False) cpdef update_blklocs_and_blknos( - ndarray[intp_t, ndim=1] blklocs, - ndarray[intp_t, ndim=1] blknos, + const intp_t[:] blklocs, + const intp_t[:] blknos, Py_ssize_t loc, intp_t nblocks, ): diff --git a/pandas/_libs/join.pyx b/pandas/_libs/join.pyx index 368abe60d7237..e6bd5a8de39dc 100644 --- a/pandas/_libs/join.pyx +++ b/pandas/_libs/join.pyx @@ -225,7 +225,10 @@ def full_outer_join(const intp_t[:] left, const intp_t[:] right, @cython.wraparound(False) @cython.boundscheck(False) -cdef void _get_result_indexer(intp_t[::1] sorter, intp_t[::1] indexer) noexcept nogil: +cdef void _get_result_indexer( + const intp_t[::1] sorter, + intp_t[::1] indexer, +) noexcept nogil: """NOTE: overwrites indexer with the result to avoid allocating another array""" cdef: Py_ssize_t i, n, idx @@ -681,8 +684,8 @@ def outer_join_indexer(ndarray[numeric_object_t] left, ndarray[numeric_object_t] from pandas._libs.hashtable cimport Int64HashTable -def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values, - ndarray[numeric_t] right_values, +def asof_join_backward_on_X_by_Y(const numeric_t[:] left_values, + const numeric_t[:] right_values, const int64_t[:] left_by_values, const int64_t[:] right_by_values, bint allow_exact_matches=True, @@ -752,8 +755,8 @@ def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values, return left_indexer, right_indexer -def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values, - ndarray[numeric_t] right_values, +def asof_join_forward_on_X_by_Y(const numeric_t[:] left_values, + const numeric_t[:] right_values, const int64_t[:] left_by_values, const int64_t[:] right_by_values, bint allow_exact_matches=1, @@ -824,8 +827,8 @@ def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values, return left_indexer, right_indexer -def asof_join_nearest_on_X_by_Y(ndarray[numeric_t] left_values, - ndarray[numeric_t] right_values, +def asof_join_nearest_on_X_by_Y(const numeric_t[:] left_values, + const numeric_t[:] right_values, const int64_t[:] left_by_values, const int64_t[:] right_by_values, bint allow_exact_matches=True, diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx index fce51700d623f..3c509a3eae11a 100644 --- a/pandas/_libs/lib.pyx +++ b/pandas/_libs/lib.pyx @@ -981,16 +981,14 @@ def get_level_sorter( @cython.boundscheck(False) @cython.wraparound(False) -def count_level_2d(ndarray[uint8_t, ndim=2, cast=True] mask, +def count_level_2d(const uint8_t[:, :] mask, const intp_t[:] labels, Py_ssize_t max_bin, ): cdef: - Py_ssize_t i, j, k, n + Py_ssize_t i, j, k = mask.shape[1], n = mask.shape[0] ndarray[int64_t, ndim=2] counts - n, k = (mask).shape - counts = np.zeros((n, max_bin), dtype="i8") with nogil: for i in range(n): diff --git a/pandas/_libs/reshape.pyx b/pandas/_libs/reshape.pyx index 28ea06739e0c8..51c1c75ba631b 100644 --- a/pandas/_libs/reshape.pyx +++ b/pandas/_libs/reshape.pyx @@ -40,27 +40,7 @@ def unstack(const numeric_object_t[:, :] values, const uint8_t[:] mask, cdef: Py_ssize_t i, j, w, nulls, s, offset - if numeric_object_t is not object: - # evaluated at compile-time - with nogil: - for i in range(stride): - - nulls = 0 - for j in range(length): - - for w in range(width): - - offset = j * width + w - - if mask[offset]: - s = i * width + w - new_values[j, s] = values[offset - nulls, i] - new_mask[j, s] = 1 - else: - nulls += 1 - - else: - # object-dtype, identical to above but we cannot use nogil + with nogil(numeric_object_t is 
not object): for i in range(stride): nulls = 0 diff --git a/pandas/core/arrays/categorical.py b/pandas/core/arrays/categorical.py index ae20bfb6b284b..0ce700772fdcc 100644 --- a/pandas/core/arrays/categorical.py +++ b/pandas/core/arrays/categorical.py @@ -447,7 +447,12 @@ def __init__( if isinstance(values.dtype, ArrowDtype) and issubclass( values.dtype.type, CategoricalDtypeType ): - arr = values._pa_array.combine_chunks() + from pandas import Index + + if isinstance(values, Index): + arr = values._data._pa_array.combine_chunks() + else: + arr = values._pa_array.combine_chunks() categories = arr.dictionary.to_pandas(types_mapper=ArrowDtype) codes = arr.indices.to_numpy() dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered) diff --git a/pandas/core/arrays/string_.py b/pandas/core/arrays/string_.py index 623a6a10c75b5..7227ea77ca433 100644 --- a/pandas/core/arrays/string_.py +++ b/pandas/core/arrays/string_.py @@ -49,6 +49,7 @@ ) from pandas.core import ( + missing, nanops, ops, ) @@ -870,6 +871,88 @@ def _reduce( raise TypeError(f"Cannot perform reduction '{name}' with string dtype") + def _accumulate(self, name: str, *, skipna: bool = True, **kwargs) -> StringArray: + """ + Return an ExtensionArray performing an accumulation operation. + + The underlying data type might change. + + Parameters + ---------- + name : str + Name of the function, supported values are: + - cummin + - cummax + - cumsum + - cumprod + skipna : bool, default True + If True, skip NA values. + **kwargs + Additional keyword arguments passed to the accumulation function. + Currently, there is no supported kwarg. + + Returns + ------- + array + + Raises + ------ + NotImplementedError : subclass does not define accumulations + """ + if name == "cumprod": + msg = f"operation '{name}' not supported for dtype '{self.dtype}'" + raise TypeError(msg) + + # We may need to strip out trailing NA values + tail: np.ndarray | None = None + na_mask: np.ndarray | None = None + ndarray = self._ndarray + np_func = { + "cumsum": np.cumsum, + "cummin": np.minimum.accumulate, + "cummax": np.maximum.accumulate, + }[name] + + if self._hasna: + na_mask = cast("npt.NDArray[np.bool_]", isna(ndarray)) + if np.all(na_mask): + return type(self)(ndarray) + if skipna: + if name == "cumsum": + ndarray = np.where(na_mask, "", ndarray) + else: + # We can retain the running min/max by forward/backward filling. + ndarray = ndarray.copy() + missing.pad_or_backfill_inplace( + ndarray, + method="pad", + axis=0, + ) + missing.pad_or_backfill_inplace( + ndarray, + method="backfill", + axis=0, + ) + else: + # When not skipping NA values, the result should be null from + # the first NA value onward. 
+ idx = np.argmax(na_mask) + tail = np.empty(len(ndarray) - idx, dtype="object") + tail[:] = self.dtype.na_value + ndarray = ndarray[:idx] + + # mypy: Cannot call function of unknown type + np_result = np_func(ndarray) # type: ignore[operator] + + if tail is not None: + np_result = np.hstack((np_result, tail)) + elif na_mask is not None: + # Argument 2 to "where" has incompatible type "NAType | float" + np_result = np.where(na_mask, self.dtype.na_value, np_result) # type: ignore[arg-type] + + result = type(self)(np_result) + return result + def _wrap_reduction_result(self, axis: AxisInt | None, result) -> Any: if self.dtype.na_value is np.nan and result is libmissing.NA: # the masked_reductions use pd.NA -> convert to np.nan diff --git a/pandas/core/generic.py b/pandas/core/generic.py index 874ab1a3c944d..ccd801e252f2c 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -2801,6 +2801,12 @@ def to_sql( Databases supported by SQLAlchemy [1]_ are supported. Tables can be newly created, appended to, or overwritten. + .. warning:: + The pandas library does not attempt to sanitize inputs provided via a to_sql call. + Please refer to the documentation for the underlying database driver to see if it + will properly prevent injection, or alternatively be advised of a security risk when + executing arbitrary commands in a to_sql call. + Parameters ---------- name : str diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py index dc48cd1ed958e..79eb1b693d866 100644 --- a/pandas/core/indexes/multi.py +++ b/pandas/core/indexes/multi.py @@ -212,8 +212,6 @@ class MultiIndex(Index): level). names : optional sequence of objects Names for each of the index levels. (name is accepted for compat). - dtype : Numpy dtype or pandas type, optional - Data type for the MultiIndex. copy : bool, default False Copy the meta-data. name : Label @@ -305,7 +303,6 @@ def __new__( codes=None, sortorder=None, names=None, - dtype=None, copy: bool = False, name=None, verify_integrity: bool = True, @@ -1760,7 +1757,7 @@ def fillna(self, value): """ fillna is not implemented for MultiIndex """ - raise NotImplementedError("isna is not defined for MultiIndex") + raise NotImplementedError("fillna is not defined for MultiIndex") @doc(Index.dropna) def dropna(self, how: AnyAll = "any") -> MultiIndex: diff --git a/pandas/core/series.py b/pandas/core/series.py index 351622135b31f..da46f8ede3409 100644 --- a/pandas/core/series.py +++ b/pandas/core/series.py @@ -4651,7 +4651,7 @@ def rename( inplace: Literal[True], level: Level | None = ..., errors: IgnoreRaise = ..., - ) -> None: ... + ) -> Series | None: ... @overload def rename( @@ -4665,18 +4665,6 @@ def rename( errors: IgnoreRaise = ..., ) -> Series: ... - @overload - def rename( - self, - index: Renamer | Hashable | None = ..., - *, - axis: Axis | None = ..., - copy: bool | lib.NoDefault = ..., - inplace: bool = ..., - level: Level | None = ..., - errors: IgnoreRaise = ..., - ) -> Series | None: ... - def rename( self, index: Renamer | Hashable | None = None, @@ -4734,8 +4722,9 @@ def rename( Returns ------- - Series or None - Series with index labels or name altered or None if ``inplace=True``. + Series + A shallow copy with index labels or name altered, or the same object + if ``inplace=True`` and index is not a dict or callable else None. 
See Also -------- diff --git a/pandas/core/strings/accessor.py b/pandas/core/strings/accessor.py index b854338c2d1d7..81f7441846589 100644 --- a/pandas/core/strings/accessor.py +++ b/pandas/core/strings/accessor.py @@ -34,6 +34,7 @@ is_numeric_dtype, is_object_dtype, is_re, + is_string_dtype, ) from pandas.core.dtypes.dtypes import ( ArrowDtype, @@ -2102,7 +2103,9 @@ def slice_replace(self, start=None, stop=None, repl=None): result = self._data.array._str_slice_replace(start, stop, repl) return self._wrap_result(result) - def decode(self, encoding, errors: str = "strict"): + def decode( + self, encoding, errors: str = "strict", dtype: str | DtypeObj | None = None + ): """ Decode character string in the Series/Index using indicated encoding. @@ -2116,6 +2119,12 @@ def decode(self, encoding, errors: str = "strict"): errors : str, optional Specifies the error handling scheme. Possible values are those supported by :meth:`bytes.decode`. + dtype : str or dtype, optional + The dtype of the result. When not ``None``, must be either a string or + object dtype. When ``None``, the dtype of the result is determined by + ``pd.options.future.infer_string``. + + .. versionadded:: 2.3.0 Returns ------- @@ -2137,6 +2146,10 @@ def decode(self, encoding, errors: str = "strict"): 2 () dtype: object """ + if dtype is not None and not is_string_dtype(dtype): + raise ValueError(f"dtype must be string or object, got {dtype=}") + if dtype is None and get_option("future.infer_string"): + dtype = "str" # TODO: Add a similar _bytes interface. if encoding in _cpython_optimized_decoders: # CPython optimized implementation @@ -2146,7 +2159,6 @@ def decode(self, encoding, errors: str = "strict"): f = lambda x: decoder(x, errors)[0] arr = self._data.array result = arr._str_map(f) - dtype = "str" if get_option("future.infer_string") else None return self._wrap_result(result, dtype=dtype) @forbid_nonstring_types(["bytes"]) @@ -3537,12 +3549,29 @@ def casefold(self): also includes other characters that can represent quantities such as unicode fractions. - >>> s1 = pd.Series(['one', 'one1', '1', '']) + >>> s1 = pd.Series(['one', 'one1', '1', '', '³', '⅕']) >>> s1.str.isnumeric() 0 False 1 False 2 True 3 False + 4 True + 5 True + dtype: bool + + For a string to be considered numeric, all its characters must have a Unicode + numeric property matching :py:meth:`str.is_numeric`. 
As a consequence, + the following cases are **not** recognized as numeric: + + - **Decimal numbers** (e.g., "1.1"): due to period ``"."`` + - **Negative numbers** (e.g., "-5"): due to minus sign ``"-"`` + - **Scientific notation** (e.g., "1e3"): due to characters like ``"e"`` + + >>> s2 = pd.Series(["1.1", "-5", "1e3"]) + >>> s2.str.isnumeric() + 0 False + 1 False + 2 False dtype: bool """ _shared_docs["isalnum"] = """ diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py index c9bea58751207..0f734a81795c4 100644 --- a/pandas/io/formats/style.py +++ b/pandas/io/formats/style.py @@ -187,6 +187,8 @@ class Styler(StylerRenderer): Attributes ---------- + index : data.index Index + columns : data.columns Index env : Jinja2 jinja2.Environment template_html : Jinja2 Template template_html_table : Jinja2 Template diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py index 5cedb41fdcb22..a689cfbcb1418 100644 --- a/pandas/io/pytables.py +++ b/pandas/io/pytables.py @@ -4159,6 +4159,8 @@ def _create_axes( ordered = data_converted.ordered meta = "category" metadata = np.asarray(data_converted.categories).ravel() + elif isinstance(blk.dtype, StringDtype): + meta = str(blk.dtype) data, dtype_name = _get_data_and_dtype_name(data_converted) @@ -4419,7 +4421,8 @@ def read_column( errors=self.errors, ) cvs = col_values[1] - return Series(cvs, name=column, copy=False) + dtype = getattr(self.table.attrs, f"{column}_meta", None) + return Series(cvs, name=column, copy=False, dtype=dtype) raise KeyError(f"column [{column}] not found in the table") @@ -4769,8 +4772,18 @@ def read( df = DataFrame._from_arrays([values], columns=cols_, index=index_) if not (using_string_dtype() and values.dtype.kind == "O"): assert (df.dtypes == values.dtype).all(), (df.dtypes, values.dtype) + + # If str / string dtype is stored in meta, use that. + converted = False + for column in cols_: + dtype = getattr(self.table.attrs, f"{column}_meta", None) + if dtype in ["str", "string"]: + df[column] = df[column].astype(dtype) + converted = True + # Otherwise try inference. if ( - using_string_dtype() + not converted + and using_string_dtype() and isinstance(values, np.ndarray) and is_string_array( values, diff --git a/pandas/io/sql.py b/pandas/io/sql.py index 8e75c61e1744d..0e0f07c0f8ff3 100644 --- a/pandas/io/sql.py +++ b/pandas/io/sql.py @@ -76,6 +76,7 @@ from sqlalchemy import Table from sqlalchemy.sql.expression import ( + Delete, Select, TextClause, ) @@ -738,7 +739,7 @@ def to_sql( name: str, con, schema: str | None = None, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", index: bool = True, index_label: IndexLabel | None = None, chunksize: int | None = None, @@ -750,6 +751,12 @@ def to_sql( """ Write records stored in a DataFrame to a SQL database. + .. warning:: + The pandas library does not attempt to sanitize inputs provided via a to_sql call. + Please refer to the documentation for the underlying database driver to see if it + will properly prevent injection, or alternatively be advised of a security risk when + executing arbitrary commands in a to_sql call. + Parameters ---------- frame : DataFrame, Series @@ -764,10 +771,11 @@ def to_sql( schema : str, optional Name of SQL schema in database to write to (if database flavor supports this). If None, use default schema (default). 
- if_exists : {'fail', 'replace', 'append'}, default 'fail' + if_exists : {'fail', 'replace', 'append', 'delete_rows'}, default 'fail' - fail: If table exists, do nothing. - replace: If table exists, drop it, recreate it, and insert data. - append: If table exists, insert data. Create if does not exist. + - delete_rows: If a table exists, delete all records and insert data. index : bool, default True Write DataFrame index as a column. index_label : str or sequence, optional @@ -818,7 +826,7 @@ def to_sql( `sqlite3 `__ or `SQLAlchemy `__ """ # noqa: E501 - if if_exists not in ("fail", "replace", "append"): + if if_exists not in ("fail", "replace", "append", "delete_rows"): raise ValueError(f"'{if_exists}' is not valid for if_exists") if isinstance(frame, Series): @@ -926,7 +934,7 @@ def __init__( pandas_sql_engine, frame=None, index: bool | str | list[str] | None = True, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", prefix: str = "pandas", index_label=None, schema=None, @@ -974,11 +982,13 @@ def create(self) -> None: if self.exists(): if self.if_exists == "fail": raise ValueError(f"Table '{self.name}' already exists.") - if self.if_exists == "replace": + elif self.if_exists == "replace": self.pd_sql.drop_table(self.name, self.schema) self._execute_create() elif self.if_exists == "append": pass + elif self.if_exists == "delete_rows": + self.pd_sql.delete_rows(self.name, self.schema) else: raise ValueError(f"'{self.if_exists}' is not valid for if_exists") else: @@ -997,7 +1007,7 @@ def _execute_insert(self, conn, keys: list[str], data_iter) -> int: Each item contains a list of values to be inserted """ data = [dict(zip(keys, row)) for row in data_iter] - result = conn.execute(self.table.insert(), data) + result = self.pd_sql.execute(self.table.insert(), data) return result.rowcount def _execute_insert_multi(self, conn, keys: list[str], data_iter) -> int: @@ -1014,7 +1024,7 @@ def _execute_insert_multi(self, conn, keys: list[str], data_iter) -> int: data = [dict(zip(keys, row)) for row in data_iter] stmt = insert(self.table).values(data) - result = conn.execute(stmt) + result = self.pd_sql.execute(stmt) return result.rowcount def insert_data(self) -> tuple[list[str], list[np.ndarray]]: @@ -1480,7 +1490,7 @@ def to_sql( self, frame, name: str, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", index: bool = True, index_label=None, schema=None, @@ -1649,7 +1659,7 @@ def run_transaction(self): else: yield self.con - def execute(self, sql: str | Select | TextClause, params=None): + def execute(self, sql: str | Select | TextClause | Delete, params=None): """Simple passthrough to SQLAlchemy connectable""" from sqlalchemy.exc import SQLAlchemyError @@ -1874,7 +1884,7 @@ def prep_table( self, frame, name: str, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", index: bool | str | list[str] | None = True, index_label=None, schema=None, @@ -1951,7 +1961,7 @@ def to_sql( self, frame, name: str, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", index: bool = True, index_label=None, schema: str | None = None, @@ -1969,10 +1979,11 @@ def to_sql( frame : DataFrame name : string Name of SQL table. 
- if_exists : {'fail', 'replace', 'append'}, default 'fail' + if_exists : {'fail', 'replace', 'append', 'delete_rows'}, default 'fail' - fail: If table exists, do nothing. - replace: If table exists, drop it, recreate it, and insert data. - append: If table exists, insert data. Create if does not exist. + - delete_rows: If a table exists, delete all records and insert data. index : boolean, default True Write DataFrame index as a column. index_label : string or sequence, default None @@ -2069,6 +2080,16 @@ def drop_table(self, table_name: str, schema: str | None = None) -> None: self.get_table(table_name, schema).drop(bind=self.con) self.meta.clear() + def delete_rows(self, table_name: str, schema: str | None = None) -> None: + schema = schema or self.meta.schema + if self.has_table(table_name, schema): + self.meta.reflect( + bind=self.con, only=[table_name], schema=schema, views=True + ) + table = self.get_table(table_name, schema) + self.execute(table.delete()).close() + self.meta.clear() + def _create_sql_schema( self, frame: DataFrame, @@ -2304,7 +2325,7 @@ def to_sql( self, frame, name: str, - if_exists: Literal["fail", "replace", "append"] = "fail", + if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail", index: bool = True, index_label=None, schema: str | None = None, @@ -2326,6 +2347,7 @@ def to_sql( - fail: If table exists, do nothing. - replace: If table exists, drop it, recreate it, and insert data. - append: If table exists, insert data. Create if does not exist. + - delete_rows: If a table exists, delete all records and insert data. index : boolean, default True Write DataFrame index as a column. index_label : string or sequence, default None @@ -2379,6 +2401,9 @@ def to_sql( self.execute(sql_statement).close() elif if_exists == "append": mode = "append" + elif if_exists == "delete_rows": + mode = "append" + self.delete_rows(name, schema) try: tbl = pa.Table.from_pandas(frame, preserve_index=index) @@ -2416,6 +2441,11 @@ def has_table(self, name: str, schema: str | None = None) -> bool: return False + def delete_rows(self, name: str, schema: str | None = None) -> None: + table_name = f"{schema}.{name}" if schema else name + if self.has_table(name, schema): + self.execute(f"DELETE FROM {table_name}").close() + def _create_sql_schema( self, frame: DataFrame, @@ -2790,10 +2820,11 @@ def to_sql( frame: DataFrame name: string Name of SQL table. - if_exists: {'fail', 'replace', 'append'}, default 'fail' + if_exists: {'fail', 'replace', 'append', 'delete_rows'}, default 'fail' fail: If table exists, do nothing. replace: If table exists, drop it, recreate it, and insert data. append: If table exists, insert data. Create if it does not exist. + delete_rows: If a table exists, delete all records and insert data. 
index : bool, default True Write DataFrame index as a column index_label : string or sequence, default None @@ -2867,7 +2898,12 @@ def get_table(self, table_name: str, schema: str | None = None) -> None: def drop_table(self, name: str, schema: str | None = None) -> None: drop_sql = f"DROP TABLE {_get_valid_sqlite_name(name)}" - self.execute(drop_sql) + self.execute(drop_sql).close() + + def delete_rows(self, name: str, schema: str | None = None) -> None: + delete_sql = f"DELETE FROM {_get_valid_sqlite_name(name)}" + if self.has_table(name, schema): + self.execute(delete_sql).close() def _create_sql_schema( self, diff --git a/pandas/tests/apply/test_str.py b/pandas/tests/apply/test_str.py index ce71cfec535e4..e5a9492630b13 100644 --- a/pandas/tests/apply/test_str.py +++ b/pandas/tests/apply/test_str.py @@ -5,7 +5,6 @@ import pytest from pandas.compat import ( - HAS_PYARROW, WASM, ) @@ -162,17 +161,10 @@ def test_agg_cython_table_series(series, func, expected): ), ), ) -def test_agg_cython_table_transform_series(request, series, func, expected): +def test_agg_cython_table_transform_series(series, func, expected): # GH21224 # test transforming functions in # pandas.core.base.SelectionMixin._cython_table (cumprod, cumsum) - if series.dtype == "string" and func == "cumsum" and not HAS_PYARROW: - request.applymarker( - pytest.mark.xfail( - raises=NotImplementedError, - reason="TODO(infer_string) cumsum not yet implemented for string", - ) - ) warn = None if isinstance(func, str) else FutureWarning with tm.assert_produces_warning(warn, match="is currently using Series.*"): result = series.agg(func) diff --git a/pandas/tests/base/test_fillna.py b/pandas/tests/base/test_fillna.py index 7300d3013305a..8c56bcc169d8e 100644 --- a/pandas/tests/base/test_fillna.py +++ b/pandas/tests/base/test_fillna.py @@ -16,7 +16,7 @@ def test_fillna(index_or_series_obj): obj = index_or_series_obj if isinstance(obj, MultiIndex): - msg = "isna is not defined for MultiIndex" + msg = "fillna is not defined for MultiIndex" with pytest.raises(NotImplementedError, match=msg): obj.fillna(0) return diff --git a/pandas/tests/extension/test_arrow.py b/pandas/tests/extension/test_arrow.py index d6f428f4938a6..f4a63ff4c92ec 100644 --- a/pandas/tests/extension/test_arrow.py +++ b/pandas/tests/extension/test_arrow.py @@ -3511,3 +3511,20 @@ def test_map_numeric_na_action(): result = ser.map(lambda x: 42, na_action="ignore") expected = pd.Series([42.0, 42.0, np.nan], dtype="float64") tm.assert_series_equal(result, expected) + + +def test_categorical_from_arrow_dictionary(): + # GH 60563 + df = pd.DataFrame( + {"A": ["a1", "a2"]}, dtype=ArrowDtype(pa.dictionary(pa.int32(), pa.utf8())) + ) + result = df.value_counts(dropna=False) + expected = pd.Series( + [1, 1], + index=pd.MultiIndex.from_arrays( + [pd.Index(["a1", "a2"], dtype=ArrowDtype(pa.string()), name="A")] + ), + name="count", + dtype="int64", + ) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/extension/test_string.py b/pandas/tests/extension/test_string.py index 6ce48e434d329..25129111180d6 100644 --- a/pandas/tests/extension/test_string.py +++ b/pandas/tests/extension/test_string.py @@ -196,11 +196,7 @@ def _supports_reduction(self, ser: pd.Series, op_name: str) -> bool: def _supports_accumulation(self, ser: pd.Series, op_name: str) -> bool: assert isinstance(ser.dtype, StorageExtensionDtype) - return ser.dtype.storage == "pyarrow" and op_name in [ - "cummin", - "cummax", - "cumsum", - ] + return op_name in ["cummin", "cummax", "cumsum"] def 
_cast_pointwise_result(self, op_name: str, obj, other, pointwise_result): dtype = cast(StringDtype, tm.get_dtype(obj)) diff --git a/pandas/tests/indexes/multi/test_missing.py b/pandas/tests/indexes/multi/test_missing.py index 14ffc42fb4b59..41cfa093ae53c 100644 --- a/pandas/tests/indexes/multi/test_missing.py +++ b/pandas/tests/indexes/multi/test_missing.py @@ -8,7 +8,7 @@ def test_fillna(idx): # GH 11343 - msg = "isna is not defined for MultiIndex" + msg = "fillna is not defined for MultiIndex" with pytest.raises(NotImplementedError, match=msg): idx.fillna(idx[0]) diff --git a/pandas/tests/indexes/test_old_base.py b/pandas/tests/indexes/test_old_base.py index 49609d28ca56e..5f36b8c3f5dbf 100644 --- a/pandas/tests/indexes/test_old_base.py +++ b/pandas/tests/indexes/test_old_base.py @@ -597,7 +597,7 @@ def test_fillna(self, index): pytest.skip(f"Not relevant for Index with {index.dtype}") elif isinstance(index, MultiIndex): idx = index.copy(deep=True) - msg = "isna is not defined for MultiIndex" + msg = "fillna is not defined for MultiIndex" with pytest.raises(NotImplementedError, match=msg): idx.fillna(idx[0]) else: diff --git a/pandas/tests/indexing/multiindex/test_loc.py b/pandas/tests/indexing/multiindex/test_loc.py index ec9767aa4bab4..1d3258ab18a61 100644 --- a/pandas/tests/indexing/multiindex/test_loc.py +++ b/pandas/tests/indexing/multiindex/test_loc.py @@ -105,7 +105,7 @@ def test_loc_getitem_series(self): empty = Series(data=[], dtype=np.float64) expected = Series( [], - index=MultiIndex(levels=index.levels, codes=[[], []], dtype=np.float64), + index=MultiIndex(levels=index.levels, codes=[[], []]), dtype=np.float64, ) result = x.loc[empty] @@ -129,7 +129,7 @@ def test_loc_getitem_array(self): empty = np.array([]) expected = Series( [], - index=MultiIndex(levels=index.levels, codes=[[], []], dtype=np.float64), + index=MultiIndex(levels=index.levels, codes=[[], []]), dtype="float64", ) result = x.loc[empty] diff --git a/pandas/tests/io/pytables/test_append.py b/pandas/tests/io/pytables/test_append.py index 55fdbf1ca2ea5..479f2468a86ab 100644 --- a/pandas/tests/io/pytables/test_append.py +++ b/pandas/tests/io/pytables/test_append.py @@ -5,8 +5,6 @@ import numpy as np import pytest -from pandas._config import using_string_dtype - from pandas._libs.tslibs import Timestamp from pandas.compat import PY312 @@ -516,7 +514,6 @@ def test_append_with_empty_string(setup_path): tm.assert_frame_equal(store.select("df"), df) -@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") def test_append_with_data_columns(setup_path): with ensure_clean_store(setup_path) as store: df = DataFrame( diff --git a/pandas/tests/io/pytables/test_categorical.py b/pandas/tests/io/pytables/test_categorical.py index 2f8c37c0b3876..ed2616b24cd71 100644 --- a/pandas/tests/io/pytables/test_categorical.py +++ b/pandas/tests/io/pytables/test_categorical.py @@ -1,8 +1,6 @@ import numpy as np import pytest -from pandas._config import using_string_dtype - from pandas import ( Categorical, DataFrame, @@ -140,7 +138,6 @@ def test_categorical(setup_path): store.select("df3/meta/s/meta") -@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") def test_categorical_conversion(tmp_path, setup_path): # GH13322 # Check that read_hdf with categorical columns doesn't return rows if diff --git a/pandas/tests/io/pytables/test_read.py b/pandas/tests/io/pytables/test_read.py index ed4f523a21b1e..7a3a7339e7809 100644 --- a/pandas/tests/io/pytables/test_read.py +++ b/pandas/tests/io/pytables/test_read.py @@ -5,8 
+5,6 @@ import numpy as np import pytest -from pandas._config import using_string_dtype - from pandas.compat import is_platform_windows import pandas as pd @@ -72,7 +70,6 @@ def test_read_missing_key_opened_store(tmp_path, setup_path): read_hdf(store, "k1") -@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") def test_read_column(setup_path): df = DataFrame( np.random.default_rng(2).standard_normal((10, 4)), diff --git a/pandas/tests/io/pytables/test_select.py b/pandas/tests/io/pytables/test_select.py index 28af76f561356..5e76aae28c147 100644 --- a/pandas/tests/io/pytables/test_select.py +++ b/pandas/tests/io/pytables/test_select.py @@ -1,8 +1,6 @@ import numpy as np import pytest -from pandas._config import using_string_dtype - from pandas._libs.tslibs import Timestamp from pandas.compat import PY312 @@ -666,7 +664,6 @@ def test_frame_select(setup_path, request): # store.select('frame', [crit1, crit2]) -@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") def test_frame_select_complex(setup_path): # select via complex criteria @@ -980,7 +977,6 @@ def test_query_long_float_literal(setup_path): tm.assert_frame_equal(expected, result) -@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") def test_query_compare_column_type(setup_path): # GH 15492 df = DataFrame( diff --git a/pandas/tests/io/test_sql.py b/pandas/tests/io/test_sql.py index 7e1220ecee218..97c856d3b6c40 100644 --- a/pandas/tests/io/test_sql.py +++ b/pandas/tests/io/test_sql.py @@ -1068,7 +1068,9 @@ def test_to_sql(conn, method, test_frame1, request): @pytest.mark.parametrize("conn", all_connectable) -@pytest.mark.parametrize("mode, num_row_coef", [("replace", 1), ("append", 2)]) +@pytest.mark.parametrize( + "mode, num_row_coef", [("replace", 1), ("append", 2), ("delete_rows", 1)] +) def test_to_sql_exist(conn, mode, num_row_coef, test_frame1, request): conn = request.getfixturevalue(conn) with pandasSQL_builder(conn, need_transaction=True) as pandasSQL: @@ -2698,6 +2700,58 @@ def test_drop_table(conn, request): assert not insp.has_table("temp_frame") +@pytest.mark.parametrize("conn_name", all_connectable) +def test_delete_rows_success(conn_name, test_frame1, request): + table_name = "temp_delete_rows_frame" + conn = request.getfixturevalue(conn_name) + + with pandasSQL_builder(conn) as pandasSQL: + with pandasSQL.run_transaction(): + assert pandasSQL.to_sql(test_frame1, table_name) == test_frame1.shape[0] + + with pandasSQL.run_transaction(): + assert pandasSQL.delete_rows(table_name) is None + + assert count_rows(conn, table_name) == 0 + assert pandasSQL.has_table(table_name) + + +@pytest.mark.parametrize("conn_name", all_connectable) +def test_delete_rows_is_atomic(conn_name, request): + sqlalchemy = pytest.importorskip("sqlalchemy") + + table_name = "temp_delete_rows_atomic_frame" + table_stmt = f"CREATE TABLE {table_name} (a INTEGER, b INTEGER UNIQUE NOT NULL)" + + if conn_name != "sqlite_buildin" and "adbc" not in conn_name: + table_stmt = sqlalchemy.text(table_stmt) + + # setting dtype is mandatory for adbc related tests + original_df = DataFrame({"a": [1, 2], "b": [3, 4]}, dtype="int32") + replacing_df = DataFrame({"a": [5, 6, 7], "b": [8, 8, 8]}, dtype="int32") + + conn = request.getfixturevalue(conn_name) + pandasSQL = pandasSQL_builder(conn) + + with pandasSQL.run_transaction() as cur: + cur.execute(table_stmt) + + with pandasSQL.run_transaction(): + pandasSQL.to_sql(original_df, table_name, if_exists="append", index=False) + + # inserting duplicated values in a UNIQUE 
constraint column + with pytest.raises(pd.errors.DatabaseError): + with pandasSQL.run_transaction(): + pandasSQL.to_sql( + replacing_df, table_name, if_exists="delete_rows", index=False + ) + + # failed "delete_rows" is rolled back preserving original data + with pandasSQL.run_transaction(): + result_df = pandasSQL.read_query(f"SELECT * FROM {table_name}", dtype="int32") + tm.assert_frame_equal(result_df, original_df) + + @pytest.mark.parametrize("conn", all_connectable) def test_roundtrip(conn, request, test_frame1): if conn == "sqlite_str": @@ -3409,8 +3463,8 @@ def test_to_sql_with_negative_npinf(conn, request, input): mark = pytest.mark.xfail(reason="GH 36465") request.applymarker(mark) - msg = "inf cannot be used with MySQL" - with pytest.raises(ValueError, match=msg): + msg = "Execution failed on sql" + with pytest.raises(pd.errors.DatabaseError, match=msg): df.to_sql(name="foobar", con=conn, index=False) else: assert df.to_sql(name="foobar", con=conn, index=False) == 1 diff --git a/pandas/tests/series/test_cumulative.py b/pandas/tests/series/test_cumulative.py index 89882d9d797c5..db83cf1112e74 100644 --- a/pandas/tests/series/test_cumulative.py +++ b/pandas/tests/series/test_cumulative.py @@ -265,13 +265,14 @@ def test_cumprod_timedelta(self): ([pd.NA, pd.NA, pd.NA], "cummax", False, [pd.NA, pd.NA, pd.NA]), ], ) - def test_cum_methods_pyarrow_strings( - self, pyarrow_string_dtype, data, op, skipna, expected_data + def test_cum_methods_ea_strings( + self, string_dtype_no_object, data, op, skipna, expected_data ): - # https://github.com/pandas-dev/pandas/pull/60633 - ser = pd.Series(data, dtype=pyarrow_string_dtype) + # https://github.com/pandas-dev/pandas/pull/60633 - pyarrow + # https://github.com/pandas-dev/pandas/pull/60938 - Python + ser = pd.Series(data, dtype=string_dtype_no_object) method = getattr(ser, op) - expected = pd.Series(expected_data, dtype=pyarrow_string_dtype) + expected = pd.Series(expected_data, dtype=string_dtype_no_object) result = method(skipna=skipna) tm.assert_series_equal(result, expected) diff --git a/pandas/tests/strings/test_strings.py b/pandas/tests/strings/test_strings.py index ee531b32aa82d..025f837982595 100644 --- a/pandas/tests/strings/test_strings.py +++ b/pandas/tests/strings/test_strings.py @@ -601,6 +601,30 @@ def test_decode_errors_kwarg(): tm.assert_series_equal(result, expected) +def test_decode_string_dtype(string_dtype): + # https://github.com/pandas-dev/pandas/pull/60940 + ser = Series([b"a", b"b"]) + result = ser.str.decode("utf-8", dtype=string_dtype) + expected = Series(["a", "b"], dtype=string_dtype) + tm.assert_series_equal(result, expected) + + +def test_decode_object_dtype(object_dtype): + # https://github.com/pandas-dev/pandas/pull/60940 + ser = Series([b"a", rb"\ud800"]) + result = ser.str.decode("utf-8", dtype=object_dtype) + expected = Series(["a", r"\ud800"], dtype=object_dtype) + tm.assert_series_equal(result, expected) + + +def test_decode_bad_dtype(): + # https://github.com/pandas-dev/pandas/pull/60940 + ser = Series([b"a", b"b"]) + msg = "dtype must be string or object, got dtype='int64'" + with pytest.raises(ValueError, match=msg): + ser.str.decode("utf-8", dtype="int64") + + @pytest.mark.parametrize( "form, expected", [ diff --git a/web/pandas/about/team.md b/web/pandas/about/team.md index b66e134fa5b2f..7a19fd7af6595 100644 --- a/web/pandas/about/team.md +++ b/web/pandas/about/team.md @@ -41,8 +41,6 @@ If you want to support pandas development, you can find information in the [dona ## Governance -Wes McKinney is the 
Benevolent Dictator for Life (BDFL). - The project governance is available in the [project governance page]({{ base_url }}about/governance.html). ## Workgroups diff --git a/web/pandas/config.yml b/web/pandas/config.yml index a49aadd45204a..41ba581852dd1 100644 --- a/web/pandas/config.yml +++ b/web/pandas/config.yml @@ -132,8 +132,7 @@ workgroups: contact: infrastructure@pandas.pydata.org responsibilities: "Keep the pandas infrastructure up and working. In particular the servers for the website, benchmarks, CI and others needed." members: - - Marc Garcia - - Matthew Roeschke + - William Ayd - Thomas Li communications: name: Communications @@ -141,7 +140,6 @@ workgroups: responsibilities: "Share relevant information with the broader community, mainly via our social networks, as well as being the main point of contact between NumFOCUS and the core team." members: - Marco Gorelli - - Marc Garcia sponsors: active: - name: "NumFOCUS"
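
For the doc/source/user_guide/window.rst change above, the new wording quotes ``1/np.finfo(np.double).eps`` as roughly 4.5e15. A minimal NumPy sketch (standalone, not part of the patch) showing where that threshold comes from and why sub-unit differences are truncated at that magnitude:

import numpy as np

# The reciprocal of float64 machine epsilon is 2**52, about 4.5e15.
threshold = 1 / np.finfo(np.double).eps
print(threshold)                      # 4503599627370496.0

# At this magnitude adjacent float64 values are 1.0 apart, so any
# difference smaller than one unit disappears when values are accumulated.
print(np.spacing(threshold))          # 1.0
print(threshold + 0.4 == threshold)   # True: the 0.4 is lost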
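
The new ``StringArray._accumulate`` in pandas/core/arrays/string_.py maps cumsum/cummin/cummax onto ``np.cumsum`` / ``np.minimum.accumulate`` / ``np.maximum.accumulate``. A short usage sketch of the intended user-facing behaviour, assuming a pandas build that includes this change; the printed values follow from the implementation and the tests in the diff:

import pandas as pd

s = pd.Series(["a", "c", "b"], dtype="string")

print(s.cumsum().tolist())   # ['a', 'ac', 'acb']  (concatenation)
print(s.cummax().tolist())   # ['a', 'c', 'c']     (running lexicographic max)
print(s.cummin().tolist())   # ['a', 'a', 'a']     (running lexicographic min)

# cumprod has no meaning for strings and raises TypeError, per the diff.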
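
The ``Series.str.decode`` change adds a ``dtype`` keyword and validates it with ``is_string_dtype``. A sketch mirroring the new tests (test_decode_string_dtype / test_decode_bad_dtype), again assuming a build containing the change:

import pandas as pd

ser = pd.Series([b"a", b"b"])

# Request a StringDtype result explicitly instead of relying on
# pd.options.future.infer_string.
print(ser.str.decode("utf-8", dtype="string").dtype)   # string

# Non-string dtypes are rejected up front.
try:
    ser.str.decode("utf-8", dtype="int64")
except ValueError as err:
    print(err)   # dtype must be string or object, got dtype='int64'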
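
For the new ``if_exists="delete_rows"`` option in ``DataFrame.to_sql``, a minimal sketch using the standard-library SQLite driver; the table name and data are illustrative only:

import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

df.to_sql("demo", con, index=False)                           # create and insert
df.to_sql("demo", con, if_exists="delete_rows", index=False)  # keep schema, wipe rows, re-insert

print(pd.read_sql("SELECT COUNT(*) AS n FROM demo", con)["n"].iloc[0])   # 2

Unlike "replace", the table is not dropped and recreated, so constraints and indexes on the existing table are preserved while its rows are swapped out.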
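
The pandas/core/arrays/categorical.py change distinguishes ``Index`` from ``Series``/array inputs when the values carry an Arrow dictionary type. A sketch adapted from the new test ``test_categorical_from_arrow_dictionary``, assuming pyarrow is installed and a build containing the fix:

import pandas as pd
import pyarrow as pa

dict_type = pd.ArrowDtype(pa.dictionary(pa.int32(), pa.utf8()))
df = pd.DataFrame({"A": ["a1", "a2"]}, dtype=dict_type)

# Before the fix this path raised when building the Categorical grouper;
# with it, each label is counted once.
print(df.value_counts(dropna=False))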
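
The pandas/io/pytables.py changes record the string dtype in the column metadata so table-format HDF files can round-trip ``StringDtype`` (see also the v2.3.0 whatsnew entry above). A sketch assuming PyTables is installed; the file name is illustrative:

import pandas as pd

df = pd.DataFrame({"a": pd.array(["x", "y"], dtype="string"), "b": [1.0, 2.0]})
df.to_hdf("demo.h5", key="df", format="table", data_columns=True)

back = pd.read_hdf("demo.h5", "df")
print(back.dtypes["a"])   # expected to come back as the string dtype, not object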