Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86_win64 ABI: do not use xmm0 with softfloat ABI #137094

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Feb 15, 2025

This adjusts #134290 to not apply the new logic to targets marked as "softfloat". That fixes most instances of the issue brought up here.

r? @tgross35

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 15, 2025
@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch from f91c8f4 to aa699ee Compare February 16, 2025 19:06
@RalfJung
Copy link
Member Author

I have extended the tests/codegen/i128-x86-callconv.rs test to cover the uefi target as well, as a representative softfloat target using the Windows ABI.

@tgross35
Copy link
Contributor

Thanks, I didn't realize this code path is also used for softfloat targets.

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Feb 17, 2025

📌 Commit aa699ee has been approved by tgross35

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 17, 2025
@tgross35
Copy link
Contributor

@bors r-

Per rust-lang/compiler-builtins#758, it seems like it would be better to return i128 in registers since that is what LLVM expects for builtins. Looks like this can be done by skipping make_indirect and just returning i128 on that target.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 17, 2025
@tgross35
Copy link
Contributor

That solution still isn't very clean because LLVM's use of a register pair doesn't seem to match the Windows CC:

A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. [...] To return a user-defined type by value in RAX, it must have a length of 1, 2, 4, 8, 16, 32, or 64 bits. [...] Otherwise, the caller must allocate memory for the return value and pass a pointer to it as the first argument

However, doing a hack-ish fixup on our side seems preferable in order to unblock work on #116558, with the expectation that #132570 will remove that requirement. LLVM probably should also fix the i128 return convention for Windows without SSE (looks like windows without sse2 was supported until mid windows 7?) but I don't think we need to wait for that.

@RalfJung
Copy link
Member Author

Per rust-lang/compiler-builtins#758, it seems like it would be better to return i128 in registers since that is what LLVM expects for builtins. Looks like this can be done by skipping make_indirect and just returning i128 on that target.

So are you saying I should do this for all ABIs, or just the softfloat ABI?

That solution still isn't very clean because LLVM's use of a register pair doesn't seem to match the Windows CC:

"Not very clean" or "quite unsound"? If the ABIs are different it'd be more like the latter, no?

@RalfJung
Copy link
Member Author

Hm, is it expected that f128 and i128 are handled very differently as arguments? We generate a native LLVM IR argument for f128 but explicitly pass-by-pointer for i128/u128.

@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch from aa699ee to 778ec18 Compare February 18, 2025 09:59
@RalfJung
Copy link
Member Author

RalfJung commented Feb 18, 2025

So, like this I guess? Not sure how to test this, I guess one has to look at the assembly to see what LLVM does with this but assembly is all gibberish to me.^^
@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 18, 2025
@tgross35
Copy link
Contributor

So are you saying I should do this for all ABIs, or just the softfloat ABI?

Just for softfloat, we still need to make sure it uses the xmm0 return for hardfloat.

That solution still isn't very clean because LLVM's use of a register pair doesn't seem to match the Windows CC:

"Not very clean" or "quite unsound"? If the ABIs are different it'd be more like the latter, no?

It has the same soundness issue as other x86 targets if code with and without sse is mixed. I don’t think there is any compatibility concern within the targets, Clang seems to reject __int128 for i586 Windows and it returns in two registers for UEFI (not at a computer so I can’t post the godbolt links). There isn’t anything to be compatible with from the Microsoft side, but indirect returning seems like a more faithful read of the calling convention.

I haven’t confirmed this but I think our i128 is incompatible with __int128 on UEFI and i586 Windows, since Clang seems to return it as a register pair (which LLVM also expects for calls to builtins) but we return indirectly. So matching that seems reasonable until/unless LLVM changes to return indirectly.

I am unsure if there is a good reference for what the UEFI calling convention for i128 would be in an ideal world or where an ABI spec comes from. @nicholasbishop do you have any idea?

@RalfJung
Copy link
Member Author

Just for softfloat, we still need to make sure it uses the xmm0 return for hardfloat.

Didn't you say in rust-lang/compiler-builtins#758 that LLVM is now doing this right by itself?

@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch from 778ec18 to ae0d314 Compare February 18, 2025 10:17
@tgross35
Copy link
Contributor

Iirc something about using LLVM’s i128 type didn’t produce the correct results. I’ll double check this and get some better reproductions.

@RalfJung
Copy link
Member Author

I built this code for various different targets with this PR:

#[no_mangle]
pub unsafe extern "C" fn ret_u128(x: &u128) -> u128 {
    *x
}

#[repr(simd)]
pub struct U64x2([u64; 2]);

impl Copy for U64x2 {}

#[no_mangle]
pub unsafe extern "C" fn ret_simd(x: &U64x2) -> U64x2 {
    *x
}

#[no_mangle]
pub unsafe extern "C" fn do_div(x: &u128, y: &u128, z: &mut u128) {
    *z = *x / *y;
}

Here's what I got:

  • UEFI:
ret_u128:
	movq	(%rcx), %rax
	movq	8(%rcx), %rdx
	retq

	.def	ret_simd;
	.scl	2;
	.type	32;
	.endef
	.section	.text,"xr",one_only,ret_simd
	.globl	ret_simd
	.p2align	4, 0x90
ret_simd:
	movq	(%rcx), %rax
	movq	8(%rcx), %rdx
	retq

	.def	do_div;
	.scl	2;
	.type	32;
	.endef
	.section	.text,"xr",one_only,do_div
	.globl	do_div
	.p2align	4, 0x90
do_div:
	pushq	%rsi
	subq	$64, %rsp
	movq	%r8, %rsi
	movq	(%rdx), %r8
	movq	8(%rdx), %rax
	movq	%r8, %rdx
	orq	%rax, %rdx
	je	.LBB2_2
	movq	(%rcx), %rdx
	movq	8(%rcx), %rcx
	movq	%r8, 32(%rsp)
	movq	%rcx, 56(%rsp)
	movq	%rdx, 48(%rsp)
	movq	%rax, 40(%rsp)
	leaq	48(%rsp), %rcx
	leaq	32(%rsp), %rdx
	callq	__udivti3
	movq	%rax, (%rsi)
	movq	%rdx, 8(%rsi)
	addq	$64, %rsp
	popq	%rsi
	retq
	.p2align	4, 0x90
.LBB2_2:
	jmp	.LBB2_2
  • MSVC:
ret_u128:
	movq	(%rcx), %rax
	movq	8(%rcx), %rdx
	retq

	.def	ret_simd;
	.scl	2;
	.type	32;
	.endef
	.section	.text,"xr",one_only,ret_simd
	.globl	ret_simd
	.p2align	4, 0x90
ret_simd:
	movaps	(%rcx), %xmm0
	retq

	.def	do_div;
	.scl	2;
	.type	32;
	.endef
	.section	.text,"xr",one_only,do_div
	.globl	do_div
	.p2align	4, 0x90
do_div:
.seh_proc do_div
	pushq	%rsi
	.seh_pushreg %rsi
	subq	$64, %rsp
	.seh_stackalloc 64
	.seh_endprologue
	movq	%r8, %rsi
	movq	(%rdx), %r8
	movq	8(%rdx), %rax
	movq	%r8, %rdx
	orq	%rax, %rdx
	je	.LBB2_2
	movaps	(%rcx), %xmm0
	movq	%r8, 32(%rsp)
	movaps	%xmm0, 48(%rsp)
	movq	%rax, 40(%rsp)
	leaq	48(%rsp), %rcx
	leaq	32(%rsp), %rdx
	callq	__udivti3
	movaps	%xmm0, (%rsi)
	addq	$64, %rsp
	popq	%rsi
	retq
	.p2align	4, 0x90
.LBB2_2:
	jmp	.LBB2_2
	.seh_endproc
  • GNU:
ret_u128:
	movq	(%rcx), %rax
	movq	8(%rcx), %rdx
	retq

	.def	ret_simd;
	.scl	2;
	.type	32;
	.endef
	.globl	ret_simd
	.p2align	4, 0x90
ret_simd:
	movaps	(%rcx), %xmm0
	retq

	.def	do_div;
	.scl	2;
	.type	32;
	.endef
	.globl	do_div
	.p2align	4, 0x90
do_div:
.seh_proc do_div
	pushq	%rsi
	.seh_pushreg %rsi
	subq	$64, %rsp
	.seh_stackalloc 64
	.seh_endprologue
	movq	%r8, %rsi
	movq	(%rdx), %r8
	movq	8(%rdx), %rax
	movq	%r8, %rdx
	orq	%rax, %rdx
	je	.LBB2_2
	movaps	(%rcx), %xmm0
	movq	%r8, 32(%rsp)
	movaps	%xmm0, 48(%rsp)
	movq	%rax, 40(%rsp)
	leaq	48(%rsp), %rcx
	leaq	32(%rsp), %rdx
	callq	__udivti3
	movaps	%xmm0, (%rsi)
	addq	$64, %rsp
	popq	%rsi
	retq
	.p2align	4, 0x90
.LBB2_2:
	jmp	.LBB2_2
	.seh_endproc

Does that look like it's using the right registers or not?

I don't see xmm0 in ret_u128 so I guess the answer is "no". We should probably have an asm test for that...

@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch 2 times, most recently from 9697b34 to 0b78678 Compare February 18, 2025 13:25
@RalfJung
Copy link
Member Author

Just for softfloat, we still need to make sure it uses the xmm0 return for hardfloat.

I did that now, and also added an assembly test to ensure this does not regress. I guess this snippet means that for softfloat, when LLVM invokes the intrinsic, rax and rdx are the right registers:

	callq	__udivti3
	movq	%rax, (%rsi)
	movq	%rdx, 8(%rsi)

@RalfJung
Copy link
Member Author

Isn't it an LLVM bug that an intrinsic that returns i128 is invoked using a different ABI than what LLVM itself generated when a function returns i128? Like, LLVM doesn't agree with itself on the ABI here...

@rust-log-analyzer

This comment has been minimized.

@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch from 0b78678 to cbaf42d Compare February 18, 2025 13:38
@tgross35
Copy link
Contributor

Two nits if this needs future changes but no reason to wait on those now. The behavior looks correct to me.

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Feb 18, 2025

📌 Commit cbaf42d has been approved by tgross35

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 18, 2025
@tgross35
Copy link
Contributor

tgross35 commented Feb 18, 2025

Isn't it an LLVM bug that an intrinsic that returns i128 is invoked using a different ABI than what LLVM itself generated when a function returns i128? Like, LLVM doesn't agree with itself on the ABI here...

Yeah, this isn't very nice. So this PR makes our ABI line up with Clang so extern "C" becomes sound, but compiler-builtins will need to add a workaround to return indirectly until LLVM fixes this.

Nevermind, LLVM is still using rax and rdx for __udivti3 which is consistent with how it returns i128. do_div explicitly saves the return value to a pointer and returns void, LLVM isn't adding implicitly returning indirectly from do_div indirectly.

@tgross35
Copy link
Contributor

tgross35 commented Feb 18, 2025

Just for softfloat, we still need to make sure it uses the xmm0 return for hardfloat.

Didn't you say in rust-lang/compiler-builtins#758 that LLVM is now doing this right by itself?

Also to confirm this, LLVM seems to return i128 in rax and rdx unless we use a vector type. So I guess this is what no-sse is falling back to.

More at rust-lang/compiler-builtins#758.

@RalfJung
Copy link
Member Author

@bors r-
Let me fix those nits 'cause why not

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 19, 2025
@RalfJung RalfJung force-pushed the softfloat-means-no-simd branch from cbaf42d to 73b6482 Compare February 19, 2025 07:41
@RalfJung
Copy link
Member Author

@bors r=tgross35

@bors
Copy link
Contributor

bors commented Feb 19, 2025

📌 Commit 73b6482 has been approved by tgross35

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 19, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2025
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#127793 (Added project-specific Zed IDE settings)
 - rust-lang#134995 (Stabilize const_slice_flatten)
 - rust-lang#136301 (Improve instant docs)
 - rust-lang#136347 (Add a bullet point to `std::fs::copy`)
 - rust-lang#136794 (Stabilize file_lock)
 - rust-lang#137094 (x86_win64 ABI: do not use xmm0 with softfloat ABI)
 - rust-lang#137227 (docs(dev): Update the feature-gate instructions)
 - rust-lang#137232 (Don't mention `FromResidual` on bad `?`)
 - rust-lang#137251 (coverage: Get hole spans from nested items without fully visiting them)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit d8debbd into rust-lang:master Feb 19, 2025
6 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Feb 19, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2025
Rollup merge of rust-lang#137094 - RalfJung:softfloat-means-no-simd, r=tgross35

x86_win64 ABI: do not use xmm0 with softfloat ABI

This adjusts rust-lang#134290 to not apply the new logic to targets marked as "softfloat". That fixes most instances of the issue brought up [here](rust-lang#116558 (comment)).

r? `@tgross35`
@RalfJung RalfJung deleted the softfloat-means-no-simd branch February 20, 2025 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants