Add support for masked loads & stores #399
How about adding a `from_usize` and doing something like this to avoid casts?
part of the `case!` logic is picking the minimum viable element size, since that's usually more efficient, so it would pick 16-bit elements for `Simd<u8, 256>` and 32-bit elements for `Simd<u8, 65536>` (yes, afaict that is theoretically possible in a single vector on RISC-V V). your suggestion would just pick the mask's element size and, if that isn't big enough, give up and use full `usize` elements even if 16-bit elements would suffice for all realistic vector types. Unfortunately LLVM doesn't seem to be able to optimize vectors to use a smaller element size unless you explicitly cast to that size before using `simd_lt`.
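A rough scalar sketch of that selection rule, as I read it from this thread: take the narrowest index element that is at least as wide as the mask element and can still hold the lane count. The name `pick_index_bits` and its parameters are hypothetical, and this is plain Rust rather than the PR's actual `case!` macro.

```rust
/// Hypothetical helper: choose the index element width (in bits) used to
/// build a "first `len` lanes" mask.
/// Assumption: the width must be at least the mask's element width and
/// wide enough to represent every value up to `lanes`.
fn pick_index_bits(mask_bits: u32, lanes: usize) -> u32 {
    for bits in [8u32, 16, 32, 64] {
        // Never go narrower than the mask element itself.
        if bits < mask_bits {
            continue;
        }
        // Largest unsigned value representable in `bits` bits.
        let max = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
        if lanes as u64 <= max {
            return bits;
        }
    }
    // Fall back to full `usize` elements if nothing smaller works.
    usize::BITS
}
```

Under these assumptions `pick_index_bits(8, 256)` gives 16 and `pick_index_bits(8, 65536)` gives 32, matching the two examples above, while `pick_index_bits(64, 16)` stays at 64.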
also `lane_indices::<M, N>().simd_lt(Simd::splat(M::from_usize(len)))` is just plain incorrect: if `M` is `i8` and `len` is `256`, then it will wrap around (or panic if `from_usize` checks) and act as if `len == 0`.
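The wrap-around is easy to reproduce in scalar Rust. In this sketch, `wrapping_from_usize`, `checked_from_usize`, and `mask_below` are hypothetical stand-ins (the real `from_usize` was only proposed in this thread), and a 4-element array models the vector comparison.

```rust
/// Hypothetical wrapping `from_usize` for `i8`: keeps only the low 8 bits.
fn wrapping_from_usize(len: usize) -> i8 {
    len as i8
}

/// Hypothetical checked `from_usize` for `i8`: `None` when `len` does not fit.
fn checked_from_usize(len: usize) -> Option<i8> {
    i8::try_from(len).ok()
}

/// Scalar model of `lane_indices().simd_lt(Simd::splat(bound))` for 4 lanes.
fn mask_below(bound: i8) -> [bool; 4] {
    let indices: [i8; 4] = [0, 1, 2, 3];
    indices.map(|i| i < bound)
}
```

With `len == 256` the wrapping conversion yields `0`, so `mask_below` enables no lanes at all even though every lane should be enabled; the checked variant returns `None` instead, which is where a checking `from_usize` would panic.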
I think for the usual case (vectors with well under 256 elements) codegen is likely to be better without a cast? How about keeping it as is, but trying to use `M` first if it fits?
maybe? the casts optimized away completely when I tried...
Interesting, I've had bad luck with casts, but that's usually with more complex operations. If we think the casts won't matter, it's fine with me.
Would it be sufficient to consider only the value of `N` when determining the element size, e.g. use 8-bit elements for `Simd<u64, 16>`?
no, because iirc LLVM doesn't optimize out the element-size conversions then.