Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad scaling when 2 sets of particles are used. #108

Open
lmiq opened this issue Dec 6, 2024 · 1 comment · May be fixed by #110
Open

Bad scaling when 2 sets of particles are used. #108

lmiq opened this issue Dec 6, 2024 · 1 comment · May be fixed by #110

Comments

@lmiq
Copy link
Member

lmiq commented Dec 6, 2024

See: trixi-framework/PointNeighbors.jl#8 (comment)

@lmiq
Copy link
Member Author

lmiq commented Dec 9, 2024

The issue, as explained in trixi-framework/PointNeighbors.jl#8 (comment) is that when using 2 sets of particles we are not using the projection-based approach to avoid computing distances for particles that are not within the cutoff.

As a very early test, here is an implementation in which we just duplicate the particles and construct the cell lists for all duplicated set, and use the single-set approach. This is far from ideal, since we are still looping over much more particles than needed, but nevertheless it scales much better already:

julia> run(;n=10^5)
CellListMap (t1): 0.097850265 s
CellListMap different (t2/t1): 1.755102543667102
CellListMap single_list (t5/t1): 2.697372194137645
Test Passed

julia> run(;n=5*10^5)
CellListMap (t1): 0.45006516 s
CellListMap different (t2/t1): 2.006997344562285
CellListMap single_list (t5/t1): 2.740345904579683
Test Passed

julia> run(;n=10^6)
CellListMap (t1): 0.90805197 s
CellListMap different (t2/t1): 2.940807859268231
CellListMap single_list (t5/t1): 2.605681498604094
Test Passed

julia> run(;n=5*10^6)
CellListMap (t1): 4.56837243 s
CellListMap different (t2/t1): 5.094499038906073
CellListMap single_list (t5/t1): 2.7785865216772616
Test Passed

Note that here we are using a triclinic cell, thus ideally we should expect that all runs were comparable to t1 (no 2x overhead implied - we are not using symmetry). These benchmarks to do not include initialization of the ParticleSystems, where the overhead of creating cell lists for a greater number of particles would appear. These are the initialization times for different sizes of systems:

julia> x, box = CellListMap.xatomic(10^5);

julia> @b ParticleSystem(positions=x, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
16.301 ms (47290 allocs: 92.691 MiB)

julia> x, box = CellListMap.xatomic(5*10^5);

julia> @b ParticleSystem(positions=x, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
107.122 ms (139089 allocs: 403.936 MiB, 28.82% gc time)

julia> x, box = CellListMap.xatomic(10^6);

julia> @b ParticleSystem(positions=x, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
209.146 ms (237622 allocs: 789.580 MiB, 18.13% gc time, without a warmup)

julia> x, box = CellListMap.xatomic(5*10^6);

julia> @b ParticleSystem(positions=x, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
1.685 s (1036615 allocs: 3.685 GiB, 31.67% gc time, without a warmup)

Which, we see, are not negligible. The fraction of time spent on initialization relative to the time spent on mapping, in these examples, are, thus:

10^5       :    0.16
5*10^5   :    0.23
10^6       :    0.23
5*10^6  :     0.36

Remembering that this initialization time is largely dependent on the size of the smallest set, reason why it is avoided now:

julia> x, box = CellListMap.xatomic(5*10^6);

julia> xypositions = vcat(x,y);

julia> @b ParticleSystem(positions=xypositions, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
2.381 s (1322650 allocs: 4.989 GiB, 35.57% gc time, without a warmup)

julia> y = x[1:10^4];

julia> @b ParticleSystem(xpositions=x, ypositions=y, cutoff=box.cutoff, output=similar(x), unitcell=box.input_unit_cell.matrix)
981.499 ms (46537 allocs: 2.475 GiB, 29.04% gc time, without a warmup)

Code:

```julia using CellListMap, PointNeighbors, StaticArrays using Chairmarks using Test

function test_celllistmap(_, sys)
sys.output .= 0
map_pairwise!(sys) do x, y, i, j, d2, f
d = sqrt(d2)
fij = (y - x) / d^3
for dim in axes(f, 1)
@inbounds f[dim, i] += fij[dim]
@inbounds f[dim, j] -= fij[dim]
end

    return f
end
return sys.output

end

function test_celllistmap_single_list(_, sys)
sys.output .= 0
map_pairwise!(sys) do x, y, i, j, d2, f
d2 < eps() && return f
n = size(f,2)
if (i <= n && j > n) ||
(j <= n && i > n)
d = sqrt(d2)
if i < j
fij = (y - x) / d^3
else
fij = (x - y) / d^3
end
for dim in axes(f, 1)
f[dim, min(i,j)] += fij[dim]
end
end
return f
end
return sys.output
end

function test_celllistmap_different(_, sys)
sys.output .= 0
map_pairwise!(sys) do x, y, i, j, d2, f
d2 < eps() && return f
d = sqrt(d2)
fij = (y - x) / d^3
for dim in axes(f, 1)
@inbounds f[dim, i] += fij[dim]
end
return f
end
return sys.output
end

function test_pointneighbors(x, f, nhs)
f .= 0
foreach_point_neighbor(x, x, nhs, parallel=true) do i, j, pos_diff, distance
distance < sqrt(eps()) && return
dv = -pos_diff / distance^3
for dim in axes(f, 1)
@inbounds f[dim, i] += dv[dim]
end
end
return f
end

function run(;n=10^5)
x, box = CellListMap.xatomic(n);
uc = box.input_unit_cell.matrix

xmat = Matrix(reinterpret(reshape, Float64, x));
f = similar(xmat)
f2 = similar(xmat)
f3 = similar(xmat)
f4 = similar(xmat)
f5 = similar(xmat)

xymat = hcat(xmat, xmat)

nhs1 = GridNeighborhoodSearch{3}(; search_radius=box.cutoff, n_points=size(xmat,2))
initialize!(nhs1, xmat, xmat)

min_corner = minimum(xmat, dims=2) .- 5
max_corner = maximum(xmat, dims=2) .+ 5
nhs2 = GridNeighborhoodSearch{3}(; search_radius=box.cutoff, n_points=size(xmat,2),
                                cell_list=FullGridCellList(search_radius=box.cutoff;
                                                            min_corner, max_corner,
                                                            max_points_per_cell=2000))
initialize!(nhs2, xmat, xmat)
sys = ParticleSystem(positions=x, cutoff=box.cutoff, output=f, unitcell=uc)
sys2 = ParticleSystem(xpositions=x, ypositions=x, cutoff=box.cutoff, output=f2, unitcell=uc)
sys3 = ParticleSystem(xpositions=xymat, cutoff=box.cutoff, output=f5, unitcell=uc)

b1 = @b test_celllistmap($(nothing), $sys)
println("CellListMap (t1): ", b1.time, " s")
b2 = @b test_celllistmap_different($(nothing), $sys2)
println("CellListMap different (t2/t1): ", b2.time / b1.time) 
b5 = @b test_celllistmap_single_list($(nothing), $sys3)
println("CellListMap single_list (t5/t1): ", b5.time / b1.time) 

b3 = @b test_pointneighbors($xmat, $f3, $nhs1)

println("PointNeighbors nhs1 (t3/t1): ", b3.time / b1.time)

b4 = @b test_pointneighbors($xmat, $f4, $nhs2)

println("PointNeighbors nhs2 (t4/t1): ", b4.time / b1.time)

@test f ≈ f2 ≈ f3 ≈ f4

@test f ≈ f2 ≈ f5

end


@lmiq lmiq linked a pull request Dec 9, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant