-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] [NFC] Fix docstrings for string.strip()
function family
#3843
Conversation
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
+1, knowing this doesn't handle all the various whitespace-like characters in unicode is important information. Does it make sense to have |
@owenhilyard We could add such a helper function inside Both the vectorized and scalar versions are quite trivial to implement so I think people would reimplement them anyway. I was also thinking that one day we might implement (or an external library for that matter) an |
@martinvuyk Given that we're likely going to eventually have hand-written versions of functions like this per arch, I wonder if it makes sense to have small and large string versions of the code, instead of trying to make one algorithm work for someone who just mmaped a 20 GB csv and someone else who wants to split a single line of 4 comma delimited floats. We can try to make a set of mostly branchless loops for various string sizes, and potentially different versions for SIMD implementations with masking like AVX-512 and SVE. For instance, on ASCII text you could vectorize each individual compare and OR them together (ex: '\t' <= chunk <= '\r' or chunk == ' ') to make a mask, invert the mask, then use |
@owenhilyard For very large mmaped files, I was thinking of eventually implementing a lazy version of the algorithm that works on The main problem of making this vectorized is the unicode separators (and the damned
Wow, yeah that might be useful for csv parsers that know that the separators can be any ASCII whitespace char. The |
!sync |
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions, click here to learn more. |
Landed in 9930b67! Thank you for your contribution 🎉 |
…n family (#52540) [External] [stdlib] [NFC] Fix docstrings for `string.strip()` function family Fix docstrings for `string.strip()` function family. Was deleted in b0094d3. @lsh I really don't want that documentation deleted, this is very often not included and caused me quite a couple of headaches in finding the correct information and backward compatibility reasons for using those characters. And we should be explicit that this method does not yet support full Python spaces until we can run the methods with `.isspace()` at compile time. Co-authored-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com> Closes #3843 MODULAR_ORIG_COMMIT_REV_ID: 95456c663bc28381d8c9fc1dd98c4d03d38d0ca4
…n family (#52540) [External] [stdlib] [NFC] Fix docstrings for `string.strip()` function family Fix docstrings for `string.strip()` function family. Was deleted in b0094d3. @lsh I really don't want that documentation deleted, this is very often not included and caused me quite a couple of headaches in finding the correct information and backward compatibility reasons for using those characters. And we should be explicit that this method does not yet support full Python spaces until we can run the methods with `.isspace()` at compile time. Co-authored-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com> Closes #3843 MODULAR_ORIG_COMMIT_REV_ID: 95456c663bc28381d8c9fc1dd98c4d03d38d0ca4
Fix docstrings for
string.strip()
function family.Was deleted in b0094d3. @lsh I really don't want that documentation deleted, this is very often not included and caused me quite a couple of headaches in finding the correct information and backward compatibility reasons for using those characters. And we should be explicit that this method does not yet support full Python spaces until we can run the methods with
.isspace()
at compile time.