Right now, parallelization in lleaves is implemented in a straightforward way: the input data is partitioned across threads. Instead, each thread should predict across the whole input data, but only across a subset of the trees.
Example (100 trees in the forest, 2 threads):

```python
# Thread 1
for row_id in range(len(input_data)):
    for tree in trees[0:50]:
        result[row_id] += tree(data[row_id])
global_result += result

# Thread 2
for row_id in range(len(input_data)):
    for tree in trees[50:100]:
        result[row_id] += tree(data[row_id])
global_result += result
```
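The loops above can be fleshed out into a runnable sketch. This is only an illustration of the scheme, not lleaves code: the trees are plain Python callables standing in for compiled tree functions, `predict_tree_partitioned` is a hypothetical name, and a `threading.Lock` guards the single `global_result += result` merge per thread.

```python
import threading

import numpy as np

def predict_tree_partitioned(trees, data, n_threads=2):
    """Each thread walks all rows but only its slice of the trees,
    accumulating into a thread-local buffer; the one merge into the
    shared result is lock-guarded, mirroring `global_result += result`."""
    global_result = np.zeros(len(data))
    lock = threading.Lock()
    # Split tree indices into one contiguous slice per thread.
    tree_slices = np.array_split(np.arange(len(trees)), n_threads)

    def worker(tid):
        nonlocal global_result
        result = np.zeros(len(data))  # thread-local accumulator
        for row_id in range(len(data)):
            for tree_id in tree_slices[tid]:
                result[row_id] += trees[tree_id](data[row_id])
        with lock:  # one merge per thread, not per row
            global_result += result

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return global_result
```

Because each thread owns its buffer and merges once at the end, this variant needs no per-element atomics at all, at the cost of one extra result-sized buffer per thread.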
Ideally this would keep each thread's trees fully in the L1i cache, yielding super-linear speedups with enough cores instead of the current linear speedups.
Benchmarking is necessary to measure how large the overhead of the required atomic adds would be.
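The shape of that benchmark can be probed even in a toy setting. In the sketch below (hypothetical helper names; a `threading.Lock` stands in for the hardware atomic adds the compiled code would use, so absolute numbers will not transfer), per-row locked adds into a shared array are timed against a single locked merge of a thread-local buffer:

```python
import threading
import time

import numpy as np

def accumulate(shared, values, lock, per_row):
    """Add `values` into `shared`, either locking per row (mimicking
    per-element atomic adds) or once for the whole buffer."""
    if per_row:
        for i, v in enumerate(values):
            with lock:
                shared[i] += v
    else:
        with lock:
            shared += values  # in-place, single critical section

def time_variant(per_row, n_threads=4, n_rows=20_000):
    """Run `n_threads` accumulators against one shared array and
    return (elapsed seconds, final shared array)."""
    shared = np.zeros(n_rows)
    lock = threading.Lock()
    values = np.ones(n_rows)
    threads = [threading.Thread(target=accumulate,
                                args=(shared, values, lock, per_row))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, shared
```

Both variants must produce the same result; only the cost of the synchronization differs, which is exactly what the real benchmark on compiled code would need to quantify.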
Caveats:

- `n_threads` would need to be specified during `compile()`?
- The `forest_root` function would get a more complicated API, making it harder for users to implement their own runtime.