Document the semantics of copyto! and add pagelock! #555

Merged (5 commits) on Jan 21, 2025

Conversation

vchuravy
Member

No description provided.

src/KernelAbstractions.jl: 3 outdated review threads, resolved
Contributor

luraess commented Jan 14, 2025

All tests currently fail because line 139 uses ::Backend, which is only defined after that point:

abstract type Backend end

There are also some change and typesetting suggestions above ☝️
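
The ordering problem described in this comment can be reproduced in isolation. The following is a minimal sketch, not the actual code from src/KernelAbstractions.jl: DemoBackend, DemoCPU and describe are hypothetical names chosen for illustration. It relies on the fact that Julia evaluates the types in a method signature when the method is defined, so a signature that names a type declared later in the file raises an UndefVarError.

```julia
# DemoBackend, DemoCPU and describe are hypothetical names used only for this sketch.

# 1. A method signature that names a not-yet-defined type fails at definition time.
early_error = try
    @eval describe(::DemoBackend) = "some backend"
    nothing
catch err
    err
end
println(typeof(early_error))

# 2. Declaring the abstract type first makes the same definition valid.
abstract type DemoBackend end
struct DemoCPU <: DemoBackend end
describe(::DemoBackend) = "some backend"

println(describe(DemoCPU()))
```

Under that assumption, the usual fix is to move the abstract type declaration above the first method or docstring signature that refers to it.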

vchuravy and others added 3 commits January 21, 2025 12:08
Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Ludovic Räss <61313342+luraess@users.noreply.github.com>
Contributor

Benchmark Results

| Benchmark | main | aa5ca4f... | main/aa5ca4f1794a02... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.721 ± 0.0088 μs | 0.73 ± 0.0088 μs | 0.988 |
| saxpy/default/Float16/1048576 | 0.173 ± 0.0086 ms | 0.173 ± 0.011 ms | 0.998 |
| saxpy/default/Float16/16384 | 3.33 ± 0.049 μs | 3.32 ± 0.019 μs | 1 |
| saxpy/default/Float16/2048 | 0.899 ± 0.013 μs | 0.911 ± 0.012 μs | 0.987 |
| saxpy/default/Float16/256 | 0.574 ± 0.007 μs | 0.582 ± 0.0078 μs | 0.986 |
| saxpy/default/Float16/262144 | 0.0441 ± 0.00052 ms | 0.0441 ± 0.00044 ms | 1 |
| saxpy/default/Float16/32768 | 6 ± 0.072 μs | 6 ± 0.042 μs | 1 |
| saxpy/default/Float16/4096 | 1.3 ± 0.025 μs | 1.31 ± 0.023 μs | 0.989 |
| saxpy/default/Float16/512 | 0.635 ± 0.0094 μs | 0.644 ± 0.0083 μs | 0.987 |
| saxpy/default/Float16/64 | 0.542 ± 0.0057 μs | 0.552 ± 0.0059 μs | 0.983 |
| saxpy/default/Float16/65536 | 11.7 ± 0.13 μs | 11.7 ± 0.11 μs | 0.997 |
| saxpy/default/Float32/1024 | 0.637 ± 0.0094 μs | 0.63 ± 0.012 μs | 1.01 |
| saxpy/default/Float32/1048576 | 0.239 ± 0.02 ms | 0.24 ± 0.02 ms | 0.995 |
| saxpy/default/Float32/16384 | 2.96 ± 0.5 μs | 2.75 ± 0.13 μs | 1.08 |
| saxpy/default/Float32/2048 | 0.755 ± 0.063 μs | 0.767 ± 0.066 μs | 0.985 |
| saxpy/default/Float32/256 | 0.568 ± 0.0077 μs | 0.556 ± 0.0091 μs | 1.02 |
| saxpy/default/Float32/262144 | 0.0577 ± 0.003 ms | 0.058 ± 0.0022 ms | 0.994 |
| saxpy/default/Float32/32768 | 5.53 ± 0.79 μs | 5.3 ± 0.25 μs | 1.04 |
| saxpy/default/Float32/4096 | 1.17 ± 0.13 μs | 1.13 ± 0.07 μs | 1.03 |
| saxpy/default/Float32/512 | 0.606 ± 0.0089 μs | 0.594 ± 0.0085 μs | 1.02 |
| saxpy/default/Float32/64 | 0.556 ± 0.0074 μs | 0.545 ± 0.0072 μs | 1.02 |
| saxpy/default/Float32/65536 | 12.6 ± 0.73 μs | 12.5 ± 0.72 μs | 1 |
| saxpy/default/Float64/1024 | 0.752 ± 0.063 μs | 0.764 ± 0.065 μs | 0.984 |
| saxpy/default/Float64/1048576 | 0.518 ± 0.056 ms | 0.513 ± 0.03 ms | 1.01 |
| saxpy/default/Float64/16384 | 5.38 ± 0.61 μs | 5.27 ± 0.25 μs | 1.02 |
| saxpy/default/Float64/2048 | 1.16 ± 0.12 μs | 1.14 ± 0.074 μs | 1.02 |
| saxpy/default/Float64/256 | 0.578 ± 0.011 μs | 0.576 ± 0.0085 μs | 1 |
| saxpy/default/Float64/262144 | 0.114 ± 0.0075 ms | 0.115 ± 0.006 ms | 0.997 |
| saxpy/default/Float64/32768 | 12.6 ± 0.76 μs | 12.6 ± 0.6 μs | 1 |
| saxpy/default/Float64/4096 | 1.75 ± 0.21 μs | 1.67 ± 0.084 μs | 1.05 |
| saxpy/default/Float64/512 | 0.63 ± 0.013 μs | 0.632 ± 0.013 μs | 0.996 |
| saxpy/default/Float64/64 | 0.557 ± 0.009 μs | 0.549 ± 0.0078 μs | 1.01 |
| saxpy/default/Float64/65536 | 28.6 ± 1.6 μs | 28.8 ± 1.1 μs | 0.99 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.17 ± 0.028 μs | 2.16 ± 0.032 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.159 ± 0.009 ms | 0.162 ± 0.0098 ms | 0.983 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.41 ± 0.092 μs | 4.39 ± 0.069 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.35 ± 0.059 μs | 2.32 ± 0.032 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.8 ± 0.035 μs | 2.8 ± 0.039 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0424 ± 0.0012 ms | 0.0422 ± 0.0011 ms | 1 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.86 ± 0.19 μs | 6.84 ± 0.18 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.66 ± 0.038 μs | 2.66 ± 0.043 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.24 ± 0.037 μs | 3.26 ± 0.048 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.49 ± 0.22 μs | 2.51 ± 0.21 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.7 ± 0.44 μs | 12.6 ± 0.26 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.22 ± 0.03 μs | 2.22 ± 0.039 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.242 ± 0.017 ms | 0.25 ± 0.019 ms | 0.968 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.39 ± 0.41 μs | 4.4 ± 0.3 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.38 ± 0.055 μs | 2.4 ± 0.057 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.69 ± 0.059 μs | 2.66 ± 0.046 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0604 ± 0.0029 ms | 0.0609 ± 0.0026 ms | 0.992 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.41 ± 0.61 μs | 7.67 ± 0.41 μs | 0.967 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.67 ± 0.089 μs | 2.68 ± 0.063 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.72 ± 0.11 μs | 2.7 ± 0.092 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.7 ± 4.3 μs | 2.7 ± 5.5 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.6 ± 1.1 μs | 15.6 ± 0.94 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.29 ± 0.063 μs | 2.31 ± 0.056 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.548 ± 0.04 ms | 0.552 ± 0.043 ms | 0.994 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.24 ± 0.52 μs | 7.54 ± 0.37 μs | 0.96 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.58 ± 0.087 μs | 2.61 ± 0.075 μs | 0.989 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.66 ± 0.089 μs | 2.69 ± 0.092 μs | 0.989 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.117 ± 0.01 ms | 0.119 ± 0.0072 ms | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.5 ± 1 μs | 15.6 ± 0.93 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.13 ± 0.22 μs | 3.13 ± 0.12 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.63 ± 0.058 μs | 2.66 ± 0.081 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.59 ± 0.074 μs | 2.61 ± 0.081 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.1 ± 1.9 μs | 31.3 ± 1.4 μs | 0.995 |
| time_to_load | 0.327 ± 0.0038 s | 0.329 ± 0.0043 s | 0.994 |
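
Going by its header, the last column appears to be the time on main divided by the time on the PR commit, so values below 1 mean main was faster and values above 1 mean the PR commit was faster. As a rough sketch (the helper name ratio is an assumption, not part of the benchmark tooling), two rows can be recomputed from the displayed timings:

```julia
# Hypothetical helper: ratio of the main timing to the PR-commit timing.
ratio(main_time, pr_time) = main_time / pr_time

# saxpy/default/Float16/1024: 0.721 μs on main, 0.73 μs on the PR commit.
r_float16 = ratio(0.721, 0.73)

# saxpy/default/Float32/16384: 2.96 μs on main, 2.75 μs on the PR commit.
r_float32 = ratio(2.96, 2.75)

# Round to three significant digits, matching the precision shown in the table.
println(round(r_float16; sigdigits = 3))
println(round(r_float32; sigdigits = 3))
```

Because the displayed timings are already rounded, recomputing a ratio this way can differ from the listed value in the last digit. For example, saxpy/default/Float16/512 gives 0.635 / 0.644 ≈ 0.986, while the listed ratio is 0.987.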

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

vchuravy merged commit d300747 into main on Jan 21, 2025
33 of 36 checks passed
vchuravy deleted the vc/async_copy branch on January 21, 2025, 12:35
vchuravy added a commit that referenced this pull request Jan 28, 2025
Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Ludovic Räss <61313342+luraess@users.noreply.github.com>
3 participants