Implementation of the llvm.maximum intrinsic #85

Open
12101111 opened this issue Feb 23, 2025 · 1 comment

IEEE 754-2019 defines a function called maximum. LLVM has no native LoongArch implementation of it, but there is a generic implementation and a RISC-V one. Which of the two is better?

https://llvm.org/docs/LangRef.html#llvm-min-intrinsics-comparation

llvm/llvm-project#64208

maximum: If one of the arguments is NaN, then NaN is returned. Otherwise this returns the greater of the two numbers. -0.0 is considered to be less than +0.0.
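To make the quoted semantics concrete, here is a minimal reference sketch in Rust. The name `maximum_ref` is made up for illustration, and returning the canonical quiet NaN is just one allowed choice; this only models the wording quoted above, not any particular LLVM lowering.

```rust
/// Reference model of `maximum` for f32, following the LangRef wording.
/// `maximum_ref` is a hypothetical helper name, not an LLVM or std API.
pub fn maximum_ref(x: f32, y: f32) -> f32 {
    // Any NaN input produces a NaN result (here: the canonical quiet NaN).
    if x.is_nan() || y.is_nan() {
        return f32::NAN;
    }
    // +0.0 == -0.0 under `==`, so the sign bit has to break the tie:
    // -0.0 is treated as less than +0.0.
    if x == y {
        return if x.is_sign_negative() { y } else { x };
    }
    if x > y { x } else { y }
}
```

The signed-zero branch is the part a plain `>` comparison would get wrong, since `0.0 > -0.0` is false.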

LLVM generic code

Source: llvm/llvm-project@4a8f2f2

Currently the LLVM intrinsic generates the following instructions on LoongArch:

.LCPI0_0:
        .word   0x7fc00000                      # float NaN
func:                                   # @func
        pcalau12i       $a0, %pc_hi20(.LCPI0_0)
        fld.s   $fa2, $a0, %pc_lo12(.LCPI0_0)
        fcmp.cun.s      $fcc0, $fa0, $fa1
        fmax.s  $fa0, $fa0, $fa1
        fsel    $fa0, $fa0, $fa2, $fcc0
        ret

Pseudocode

const F32_NAN = 0x7fc00000;
let has_nan = fa0.is_nan() || fa1.is_nan();
let fmax = max(fa0, fa1);
return has_nan ? F32_NAN : fmax;

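This approach has two drawbacks:

  • It needs rodata to store the NaN constant, which costs a dcache/memory load
  • The returned NaN is not the input NaN, so the payload is lost (although losing the payload is expected behavior for this intrinsic)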
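The payload loss can be seen in a small Rust model of the generic lowering. This is only a sketch: `maximum_generic` and `CANONICAL_QNAN_BITS` are illustrative names, and Rust's `f32::max` stands in for `fmax.s` as far as NaN handling goes (it does not promise a signed-zero ordering, so that aspect is not modeled).

```rust
/// Bit pattern of the constant the generic lowering loads from rodata.
const CANONICAL_QNAN_BITS: u32 = 0x7fc0_0000;

/// Sketch of the generic expansion: compute max, then select a constant NaN
/// if either input was NaN. Hypothetical helper, not LLVM code.
pub fn maximum_generic(x: f32, y: f32) -> f32 {
    // Corresponds to fcmp.cun.s: true if the operands are unordered.
    let has_nan = x.is_nan() || y.is_nan();
    // Stand-in for fmax.s; only the NaN handling matters for this sketch.
    let fmax = x.max(y);
    // Corresponds to fsel: pick the constant NaN, not the input NaN.
    if has_nan { f32::from_bits(CANONICAL_QNAN_BITS) } else { fmax }
}
```

Passing a NaN such as `f32::from_bits(0x7fc0_1234)` yields the constant pattern rather than the input's payload, which is the second drawback listed above.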

However, before this fallback was committed, RISC-V already had an implementation that needs no rodata.

RISC-V code

Source: llvm/llvm-project@4942978

RISC-V instructions
func:                                   # @func
        feq.s   a0, fa0, fa0
        fmv.s   fa5, fa1
        beqz    a0, .LBB0_3
        feq.s   a0, fa1, fa1
        beqz    a0, .LBB0_4
.LBB0_2:                                # %start
        fmax.s  fa0, fa0, fa5
        ret
.LBB0_3:                                # %start
        fmv.s   fa5, fa0
        feq.s   a0, fa1, fa1
        bnez    a0, .LBB0_2
.LBB0_4:                                # %start
        fmax.s  fa0, fa1, fa5
        ret

LLVM IR representation

define float @func(float %x, float %y) {
start:
  %x_is_nan = fcmp uno float %x, %x
  %new_y = select i1 %x_is_nan, float %x, float %y
  %y_is_nan = fcmp uno float %y, %y
  %new_x = select i1 %y_is_nan, float %y, float %x
  %result = call float @llvm.maxnum.f32(float %new_x, float %new_y)
  ret float %result
}

declare float @llvm.maxnum.f32(float, float)
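For readers less used to LLVM IR, the same expansion can be sketched in Rust. The function name `maximum_via_maxnum` is invented here, and `f32::max` is used as a stand-in for `llvm.maxnum` (it returns the non-NaN operand when exactly one operand is NaN); signaling NaN behavior is not modeled.

```rust
/// Sketch of the RISC-V style expansion: copy a NaN input over the other
/// operand so that both operands are NaN, then call maxnum.
/// Hypothetical helper mirroring the IR above, not LLVM code.
pub fn maximum_via_maxnum(x: f32, y: f32) -> f32 {
    // %new_y = select %x_is_nan, %x, %y
    let new_y = if x.is_nan() { x } else { y };
    // %new_x = select %y_is_nan, %y, %x
    let new_x = if y.is_nan() { y } else { x };
    // With a NaN input, both operands are now NaN, so maxnum has no
    // non-NaN operand left to return and the result is NaN.
    new_x.max(new_y)
}
```

The trick is that maxnum prefers the non-NaN operand, so the selects make sure no such operand survives whenever either input is NaN.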

LoongArch instructions

func:                                   # @func
	fcmp.cun.s	$fcc0, $fa0, $fa0
	fsel	$fa2, $fa1, $fa0, $fcc0
	fcmp.cun.s	$fcc0, $fa1, $fa1
	fsel	$fa0, $fa0, $fa1, $fcc0
	fmax.s	$fa1, $fa2, $fa2
	fmax.s	$fa0, $fa0, $fa0
	fmax.s	$fa0, $fa0, $fa1
	ret

Why is fmax called three times here? If a single call were enough, this implementation would be shorter. Even with three calls, it still needs no rodata.

xry111 (Member) commented Feb 23, 2025

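Because with only a single call, if one of the inputs is a signaling NaN the result would be NaN, and the semantics of the LLVM maxnum intrinsic do not allow that. (The semantics of C's fmax do allow it, but an LLVM intrinsic is not C.)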
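As background for the signaling NaN point, here is a bit-level Rust sketch of how a binary32 signaling NaN differs from a quiet one. It assumes the common convention in which the most significant mantissa bit marks a quiet NaN; the helper names are made up. The two extra `fmax.s x, x` instructions in the listing presumably act as a quieting step of this kind on each operand, though that reading is an inference, not something stated in this thread.

```rust
/// Most significant mantissa bit of binary32: set for quiet NaNs
/// (assumed convention; some legacy encodings differ).
const QUIET_BIT: u32 = 0x0040_0000;

/// True if `bits` encodes a signaling NaN: exponent all ones,
/// nonzero mantissa, quiet bit clear.
pub fn is_signaling_nan(bits: u32) -> bool {
    let exp_all_ones = bits & 0x7f80_0000 == 0x7f80_0000;
    let mantissa = bits & 0x007f_ffff;
    exp_all_ones && mantissa != 0 && bits & QUIET_BIT == 0
}

/// Turn a signaling NaN into the corresponding quiet NaN by setting the
/// quiet bit; every other bit pattern is returned unchanged.
pub fn quiet(bits: u32) -> u32 {
    if is_signaling_nan(bits) { bits | QUIET_BIT } else { bits }
}
```

Working on raw `u32` bit patterns keeps the sketch independent of how the host platform moves NaN values through floating-point registers.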

Of course, for the maximum intrinsic it is perfectly fine to return NaN when one input is a signaling NaN, which shows that expanding it to maxnum is not the best approach.

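Even with three calls, this does not hold up under a strict reading of the instruction manual (going by the manual alone, inputs of -0.0 and 0.0 might yield -0.0, which neither LLVM nor C allows). The current implementation already relies on behavior the manual does not specify, so the only option is to revise the manual: #86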
