Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Allocation for small union: Union{Nothing,Tuple{String}} #57157

Open
NHDaly opened this issue Jan 24, 2025 · 4 comments
Open

Unexpected Allocation for small union: Union{Nothing,Tuple{String}} #57157

NHDaly opened this issue Jan 24, 2025 · 4 comments
Labels
performance Must go faster

Comments

@NHDaly
Copy link
Member

NHDaly commented Jan 24, 2025

The following code demonstrates a performance issue where a function that returns a Union{Nothing,Tuple{String}} heap-allocates the returned tuple:

julia> versioninfo()
Julia Version 1.12.0-DEV.1571
Commit 671cd5e1db7 (2024-11-07 02:50 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH =

julia> using BenchmarkTools

julia> zot(s) = s=="" ? nothing : (s,)
zot (generic function with 2 methods)

julia> foo(s) = s=="" ? nothing : s
foo (generic function with 2 methods)

julia> @btime zot($("a"))
  6.750 ns (1 allocation: 16 bytes)
("a",)

julia> @btime foo($("a"))
  4.625 ns (0 allocations: 0 bytes)
"a"

julia> @btime zot($(""))
  2.250 ns (0 allocations: 0 bytes)

And this is unique to Tuple{String}. With Tuple{Int}, we don't see it:

julia> bar(x) = x==0 ? nothing : (x,)
bar (generic function with 2 methods)

julia> @btime bar(10)
  1.125 ns (0 allocations: 0 bytes)
(10,)

Also, I see this on 1.12, 1.10, as well as all the way back to 1.0 on the various versions I tested. So not a recent regression.

@NHDaly NHDaly added the performance Must go faster label Jan 24, 2025
@NHDaly
Copy link
Member Author

NHDaly commented Jan 24, 2025

CC: @rgankema, @kuszmaul

@NHDaly
Copy link
Member Author

NHDaly commented Jan 24, 2025

We can see the allocation here:

julia> @code_llvm zot("a")
; Function Signature: zot(String)
;  @ REPL[3]:1 within `zot`
define nonnull ptr @julia_zot_3465(ptr noundef nonnull %"s::String") #0 {
top:
  %pgcstack = call ptr inttoptr (i64 4371775260 to ptr)(i64 4371775296) #9
; ┌ @ strings/string.jl:163 within `==`
   %.not = icmp eq ptr %"s::String", @"jl_global#3467.jit"
   br i1 %.not, label %L3, label %guard_exit

common.ret:                                       ; preds = %L4, %L3
   %common.ret.op = phi ptr [ %jl_nothing, %L3 ], [ %"box::Tuple", %L4 ]
; └
  ret ptr %common.ret.op

L3:                                               ; preds = %guard_exit, %top
  %jl_nothing = load ptr, ptr @jl_nothing, align 8
  br label %common.ret

L4:                                               ; preds = %guard_exit
  %ptls_field = getelementptr inbounds i8, ptr %pgcstack, i64 16
  %ptls_load = load ptr, ptr %ptls_field, align 8
  %"box::Tuple" = call noalias nonnull align 8 dereferenceable(16) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 424, i32 16, i64 5457758224) #8
  %"box::Tuple.tag_addr" = getelementptr inbounds i64, ptr %"box::Tuple", i64 -1
  store atomic i64 5457758224, ptr %"box::Tuple.tag_addr" unordered, align 8
  store atomic ptr %"s::String", ptr %"box::Tuple" unordered, align 8
  br label %common.ret

guard_exit:                                       ; preds = %top
; ┌ @ strings/string.jl:163 within `==`
   %0 = call i32 @jl_egal__unboxed(ptr nonnull %"s::String", ptr nonnull @"jl_global#3467.jit", i64 160)
   %1 = and i32 %0, 1
   %2 = icmp eq i32 %1, 0
; └
  br i1 %2, label %L4, label %L3
}

Though I'm not sure why this isn't eliminated by the small-union union-splitting optimizations.

@gbaraldi
Copy link
Member

I believe this is the same as #53584. Where we don't have an abi for this.

@NHDaly
Copy link
Member Author

NHDaly commented Jan 24, 2025

Ah cool, thanks. Indeed, as vtjnash noted in the other thread:

and probably should (it is the iteration protocol)

This issue was an MRE created from us finding allocations in our iteration protocol 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

2 participants