ENH(string dtype): Implement cumsum for Python-backed strings #60938
Looks good, thanks!
pandas/core/arrays/string_.py
Outdated
# We can retain the running min/max by forward/backward filling.
ndarray = ndarray.copy()
missing.pad_or_backfill_inplace(
    ndarray.T,
Is the `.T` needed? (I would think that `ndarray` is 1D.)
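For readers unfamiliar with the trick in the code comment above, here is a minimal pure-Python sketch of why forward/backward filling preserves a running minimum when NA values are skipped. It is not the pandas implementation: `NA`, `ffill_then_bfill`, and `cummin_skipna` are hypothetical names used only for illustration, and `None` stands in for the dtype's NA value.

```python
from itertools import accumulate

NA = None  # hypothetical stand-in for the string dtype's NA value


def ffill_then_bfill(values):
    """Fill each NA with the previous valid value, then fill any
    leading NAs with the next valid value."""
    filled = list(values)
    last = NA
    for i, v in enumerate(filled):  # forward fill
        if v is NA:
            filled[i] = last
        else:
            last = v
    nxt = NA
    for i in range(len(filled) - 1, -1, -1):  # backward fill (leading NAs only remain)
        if filled[i] is NA:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def cummin_skipna(values):
    """Running minimum that ignores NA entries but keeps them NA in the output."""
    if all(v is NA for v in values):
        return list(values)
    # A filled entry duplicates a neighbouring value, so it never changes the minimum.
    filled = ffill_then_bfill(values)
    running = accumulate(filled, min)
    return [NA if v is NA else r for v, r in zip(values, running)]


result = cummin_skipna(["b", NA, "a", NA, "c"])
print(result)
```

The same idea applies to a running maximum by passing `max` instead of `min`.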
pandas/core/arrays/string_.py
Outdated
# the first NA value onward.
idx = np.argmax(na_mask)
tail = np.empty(len(ndarray) - idx, dtype="object")
tail[:] = np.nan
Suggested change:
- tail[:] = np.nan
+ tail[:] = self.dtype.na_value
So we directly fill it with the appropriate NA value (although I assume the constructor would fix it up anyway)
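The behaviour discussed in this thread (when NA values are not skipped, everything from the first NA onward becomes NA) can be sketched in plain Python. This is an illustration under assumptions, not the code from the PR: `NA` and `cumsum_keep_na` are hypothetical names, and `None` stands in for `self.dtype.na_value`.

```python
from itertools import accumulate

NA = None  # hypothetical stand-in for self.dtype.na_value


def cumsum_keep_na(values):
    """Cumulative string concatenation; NA from the first NA value onward."""
    # Index of the first NA, or len(values) if there is none.
    idx = next((i for i, v in enumerate(values) if v is NA), len(values))
    head = list(accumulate(values[:idx]))  # default operation is +, i.e. concatenation
    tail = [NA] * (len(values) - idx)      # fill the tail with the NA value directly
    return head + tail


result = cumsum_keep_na(["a", "b", NA, "c"])
print(result)
```

Filling the tail with the dtype's own NA value up front, as the suggestion proposes, avoids relying on a later conversion step to normalise a different missing-value marker.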
pandas/core/arrays/string_.py
Outdated
if tail is not None:
    np_result = np.hstack((np_result, tail))
elif na_mask is not None:
    np_result = np.where(na_mask, np.nan, np_result)
Suggested change:
- np_result = np.where(na_mask, np.nan, np_result)
+ np_result = np.where(na_mask, self.dtype.na_value, np_result)
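The masking step in this snippet corresponds to the NA-skipping case: the accumulation runs over the valid values, and the NA positions are restored afterwards. A minimal pure-Python sketch follows, assuming NA entries are treated as empty strings during accumulation. `NA` and `cumsum_skip_na` are hypothetical names and this is not the PR's code.

```python
from itertools import accumulate

NA = None  # hypothetical stand-in for self.dtype.na_value


def cumsum_skip_na(values):
    """Cumulative string concatenation that skips NA entries
    but leaves NA at the positions where the input was NA."""
    na_mask = [v is NA for v in values]
    # Assumption: an NA contributes nothing to the running concatenation.
    filled = ["" if is_na else v for v, is_na in zip(values, na_mask)]
    running = accumulate(filled)
    # Analogue of np.where(na_mask, na_value, np_result).
    return [NA if is_na else r for is_na, r in zip(na_mask, running)]


result = cumsum_skip_na(["a", NA, "b", NA, "c"])
print(result)
```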
Thanks @rhshadrach
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the …
If these instructions are inaccurate, feel free to suggest an improvement.
… Python-backed strings
Backport PR: #60984
…cked strings (#60984)
* ENH: Improved error message and raise new error for small-string NaN edge case in HDFStore.append (#60829)
* Add clearer error messages for datatype mismatch in HDFStore.append. Raise ValueError when nan_rep too large for pytable column. Add and modify applicable test code.
* Fix missed tests and correct mistake in error message.
* Remove excess comments. Reverse error type change to avoid api changes. Move nan_rep tests into separate function.
(cherry picked from commit 57340ec)
* TST(string dtype): Resolve xfails in pytables (#60795)
(cherry picked from commit 4511251)
* BUG(string dtype): Resolve pytables xfail when reading with condition (#60943)
(cherry picked from commit 0ec5f26)
* Backport PR #60940: ENH: Add dtype argument to str.decode
* Backport PR #60938: ENH(string dtype): Implement cumsum for Python-backed strings
---------
Co-authored-by: Jake Thomas Trevallion <136272202+JakeTT404@users.noreply.github.com>
Follow-up on #60633