Implement a CPU backend using POCL #556

Open: vchuravy wants to merge 1 commit into main from vc/pocl

vchuravy (Member) commented Jan 15, 2025

TODO:

github-actions bot (Contributor) commented Jan 15, 2025

Benchmark Results

| Benchmark | main | 77f1ee0... | main/77f1ee03942cfc... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.735 ± 0.0066 μs | 0.0565 ± 0.026 ms | 0.013 |
| saxpy/default/Float16/1048576 | 0.177 ± 0.0072 ms | 0.888 ± 0.023 ms | 0.199 |
| saxpy/default/Float16/16384 | 3.33 ± 0.031 μs | 0.0636 ± 0.028 ms | 0.0524 |
| saxpy/default/Float16/2048 | 0.909 ± 0.01 μs | 0.0497 ± 0.023 ms | 0.0183 |
| saxpy/default/Float16/256 | 0.592 ± 0.005 μs | 0.0594 ± 0.026 ms | 0.00997 |
| saxpy/default/Float16/262144 | 0.0443 ± 0.00048 ms | 0.271 ± 0.025 ms | 0.164 |
| saxpy/default/Float16/32768 | 6.02 ± 0.084 μs | 0.0755 ± 0.028 ms | 0.0797 |
| saxpy/default/Float16/4096 | 1.31 ± 0.026 μs | 0.0664 ± 0.026 ms | 0.0197 |
| saxpy/default/Float16/512 | 0.651 ± 0.0061 μs | 0.0539 ± 0.026 ms | 0.0121 |
| saxpy/default/Float16/64 | 0.563 ± 0.0049 μs | 0.0544 ± 0.028 ms | 0.0104 |
| saxpy/default/Float16/65536 | 11.6 ± 0.14 μs | 0.109 ± 0.027 ms | 0.107 |
| saxpy/default/Float32/1024 | 0.655 ± 0.01 μs | 0.0595 ± 0.026 ms | 0.011 |
| saxpy/default/Float32/1048576 | 0.231 ± 0.022 ms | 0.473 ± 0.025 ms | 0.489 |
| saxpy/default/Float32/16384 | 2.79 ± 0.14 μs | 0.0549 ± 0.025 ms | 0.0508 |
| saxpy/default/Float32/2048 | 0.776 ± 0.062 μs | 0.0496 ± 0.023 ms | 0.0156 |
| saxpy/default/Float32/256 | 0.581 ± 0.0075 μs | 0.0583 ± 0.027 ms | 0.00997 |
| saxpy/default/Float32/262144 | 0.0553 ± 0.0034 ms | 0.163 ± 0.035 ms | 0.34 |
| saxpy/default/Float32/32768 | 5.36 ± 0.22 μs | 0.0602 ± 0.027 ms | 0.089 |
| saxpy/default/Float32/4096 | 1.16 ± 0.1 μs | 0.0605 ± 0.024 ms | 0.0191 |
| saxpy/default/Float32/512 | 0.618 ± 0.0093 μs | 0.0583 ± 0.026 ms | 0.0106 |
| saxpy/default/Float32/64 | 0.571 ± 0.0069 μs | 0.0583 ± 0.027 ms | 0.00978 |
| saxpy/default/Float32/65536 | 12.4 ± 0.67 μs | 0.075 ± 0.029 ms | 0.165 |
| saxpy/default/Float64/1024 | 0.758 ± 0.035 μs | 0.0592 ± 0.026 ms | 0.0128 |
| saxpy/default/Float64/1048576 | 0.505 ± 0.046 ms | 0.511 ± 0.061 ms | 0.989 |
| saxpy/default/Float64/16384 | 5.34 ± 0.52 μs | 0.0564 ± 0.026 ms | 0.0945 |
| saxpy/default/Float64/2048 | 1.14 ± 0.077 μs | 0.0514 ± 0.024 ms | 0.0222 |
| saxpy/default/Float64/256 | 0.588 ± 0.0062 μs | 0.055 ± 0.027 ms | 0.0107 |
| saxpy/default/Float64/262144 | 0.114 ± 0.012 ms | 0.172 ± 0.03 ms | 0.663 |
| saxpy/default/Float64/32768 | 12.7 ± 1.1 μs | 0.0637 ± 0.026 ms | 0.199 |
| saxpy/default/Float64/4096 | 1.7 ± 0.16 μs | 0.0602 ± 0.024 ms | 0.0282 |
| saxpy/default/Float64/512 | 0.639 ± 0.01 μs | 0.0534 ± 0.027 ms | 0.012 |
| saxpy/default/Float64/64 | 0.567 ± 0.0055 μs | 0.0612 ± 0.027 ms | 0.00925 |
| saxpy/default/Float64/65536 | 28.1 ± 2.5 μs | 0.0842 ± 0.026 ms | 0.334 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.22 ± 0.025 μs | 0.0482 ± 0.026 ms | 0.0461 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.157 ± 0.0036 ms | 0.898 ± 0.026 ms | 0.175 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.48 ± 0.089 μs | 0.0609 ± 0.026 ms | 0.0735 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.4 ± 0.026 μs | 0.0571 ± 0.023 ms | 0.042 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.81 ± 0.031 μs | 0.0554 ± 0.026 ms | 0.0507 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0419 ± 0.0011 ms | 0.271 ± 0.027 ms | 0.155 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.91 ± 0.17 μs | 0.0732 ± 0.025 ms | 0.0944 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.72 ± 0.035 μs | 0.0613 ± 0.026 ms | 0.0444 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.26 ± 0.033 μs | 0.0511 ± 0.026 ms | 0.0638 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.52 ± 0.22 μs | 0.0524 ± 0.026 ms | 0.0481 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.6 ± 0.32 μs | 0.104 ± 0.026 ms | 0.121 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.24 ± 0.036 μs | 0.0424 ± 0.026 ms | 0.0529 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.233 ± 0.022 ms | 0.459 ± 0.027 ms | 0.507 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.48 ± 0.46 μs | 0.0523 ± 0.024 ms | 0.0857 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.4 ± 0.06 μs | 0.0448 ± 0.023 ms | 0.0534 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.72 ± 0.051 μs | 0.0468 ± 0.026 ms | 0.058 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0483 ± 0.0047 ms | 0.158 ± 0.034 ms | 0.306 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.64 ± 0.52 μs | 0.0581 ± 0.026 ms | 0.131 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.68 ± 0.076 μs | 0.0521 ± 0.025 ms | 0.0513 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.74 ± 0.033 μs | 0.0542 ± 0.026 ms | 0.0504 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.75 ± 5.5 μs | 0.0542 ± 0.026 ms | 0.0507 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 14.5 ± 1.3 μs | 0.0729 ± 0.028 ms | 0.199 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.33 ± 0.068 μs | 0.0557 ± 0.026 ms | 0.0419 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.499 ± 0.049 ms | 0.501 ± 0.051 ms | 0.997 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.42 ± 0.4 μs | 0.054 ± 0.025 ms | 0.137 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.6 ± 0.065 μs | 0.0484 ± 0.024 ms | 0.0536 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.64 ± 0.051 μs | 0.0529 ± 0.025 ms | 0.0499 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.0924 ± 0.008 ms | 0.17 ± 0.031 ms | 0.544 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 14.5 ± 1.5 μs | 0.0616 ± 0.025 ms | 0.235 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.16 ± 0.24 μs | 0.0549 ± 0.025 ms | 0.0575 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.66 ± 0.074 μs | 0.0567 ± 0.026 ms | 0.047 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.6 ± 0.06 μs | 0.0559 ± 0.026 ms | 0.0465 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 25.6 ± 2 μs | 0.0808 ± 0.027 ms | 0.317 |
| time_to_load | 0.316 ± 0.0096 s | 1.09 ± 0.0058 s | 0.289 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 0% with 871 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (b435bb2) to head (77f1ee0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pocl/nanoOpenCL.jl | 0.00% | 520 Missing ⚠️ |
| src/pocl/device/array.jl | 0.00% | 101 Missing ⚠️ |
| src/pocl/backend.jl | 0.00% | 93 Missing ⚠️ |
| src/pocl/compiler/execution.jl | 0.00% | 43 Missing ⚠️ |
| src/pocl/compiler/compilation.jl | 0.00% | 32 Missing ⚠️ |
| src/pocl/device/quirks.jl | 0.00% | 24 Missing ⚠️ |
| src/pocl/compiler/reflection.jl | 0.00% | 23 Missing ⚠️ |
| src/pocl/pocl.jl | 0.00% | 20 Missing ⚠️ |
| src/pocl/device/runtime.jl | 0.00% | 6 Missing ⚠️ |
| src/nditeration.jl | 0.00% | 5 Missing ⚠️ |
| ... and 2 more | | |

Additional details and impacted files
@@          Coverage Diff           @@
##            main    #556    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         12      21     +9     
  Lines        777    1503   +726     
======================================
- Misses       777    1503   +726     


vchuravy changed the base branch from main to vc/barriers on February 4, 2025 15:18
vchuravy changed the base branch from vc/barriers to main on February 5, 2025 12:29

vchuravy (Member, Author) commented Feb 5, 2025

This stack of pull requests is managed by Graphite.

vchuravy mentioned this pull request on Feb 5, 2025
vchuravy marked this pull request as ready for review on February 5, 2025 12:43
Review threads (outdated, resolved):
src/pocl/backend.jl: 4 threads
src/pocl/compiler/compilation.jl: 1 thread
src/pocl/pocl.jl: 5 threads
vchuravy force-pushed the vc/pocl branch 8 times, most recently from 6fb1cea to 8098378, on February 6, 2025 14:48
Review threads (outdated, resolved):
src/pocl/nanoOpenCL.jl: 2 threads