Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instruction-parallel multithreading #3

Open
siboehm opened this issue Jul 28, 2021 · 0 comments
Open

Instruction-parallel multithreading #3

siboehm opened this issue Jul 28, 2021 · 0 comments
Labels
performance potential speedups. Not for bugs.

Comments

@siboehm
Copy link
Owner

siboehm commented Jul 28, 2021

Right now parallelization is implemented in a straight-forward way in lleaves by partitioning the input-data across threads.
Instead, each thread should predict across the whole input-data, but only across a subset of the trees.

Example (100 Trees in forest, 2 threads):
Thread 1:

for row_id in range(len(input_data)):
  for tree in trees[0:50]:
    result[row_id] += tree(data[row_id]))
global_result += result

Thread 2:

for row_id in range(len(input_data)):
  for tree in trees[50:100]:
    result[row_id] += tree(data[row_id]))
global_result += result

Ideally this would keep each thread's trees fully in L1i-cache, resulting in super-linear speedups with enough cores, instead of the current linear speedups.
Benchmarking is necessary to test how large the overhead from having to use atomic-adds would be.

Caveats:

  • n_threads would need to be specified during compile()?
  • The forest_root function would get a more complicated API, making it harder for users to implement their own runtime.
@siboehm siboehm added the performance potential speedups. Not for bugs. label Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance potential speedups. Not for bugs.
Projects
None yet
Development

No branches or pull requests

1 participant