Dear all,

I am timing the two-particle accumulation in cthyb. The computational bottleneck there is accumulating the product of two M-matrices in frequency space into the two-particle Green's function.
In clef syntax I want to do:

```cpp
G2(w,n1,n2)(i,j,k,l) << G2(w,n1,n2)(i,j,k,l) - s * M_il(n1,n2)(i,l) * M_kj(n2+w,n1+w)(k,j);
```
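Written out as a formula, this update reads as follows. Here ω is the bosonic frequency `w`, ν₁ and ν₂ are the fermionic frequencies `n1` and `n2`, and s is the scalar prefactor `s`:

$$
G^{(2)}_{ijkl}(\omega, \nu_1, \nu_2) \;\leftarrow\; G^{(2)}_{ijkl}(\omega, \nu_1, \nu_2) \;-\; s\, M_{il}(\nu_1, \nu_2)\, M_{kj}(\nu_2 + \omega, \nu_1 + \omega)
$$

Each accumulation therefore costs $N_\omega N_\nu^2\, n_i n_j n_k n_l$ complex multiply-adds. Here $N_\omega$ and $N_\nu$ are the sizes of the bosonic and fermionic meshes (`b_mesh` and `f_mesh` in the loop version), and $n_i, \dots, n_l$ are the entries of `G2.target_shape()`.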
However, this is 10x slower than the direct, naive seven-level for loop:
```cpp
// Naive version: bosonic w, fermionic n1, n2, then the four orbital indices
for (const auto &w : b_mesh)
  for (const auto &n1 : f_mesh)
    for (const auto &n2 : f_mesh)
      for (const auto i : range(G2.target_shape()[0]))
        for (const auto j : range(G2.target_shape()[1]))
          for (const auto k : range(G2.target_shape()[2]))
            for (const auto l : range(G2.target_shape()[3]))
              G2[w, n1, n2](i, j, k, l) -= s * M_il[n1, n2](i, l) * M_kj[n2 + w, n1 + w](k, j);
```
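For reference, here is the same update written on plain contiguous arrays. Once the mesh lookups are reduced to index arithmetic, this is what the seven loops amount to. It is a sketch under stated assumptions, not TRIQS code:

- `Dims`, `accumulate_naive`, and the row-major layout are hypothetical.
- The four target dimensions are taken to be equal.
- The mesh index of `n + w` is assumed to be the plain integer sum of the two indices.
- `M_il` and `M_kj` are assumed to live on a fermionic mesh large enough to hold that sum.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using dcomplex = std::complex<double>;

// Hypothetical sizes for a flat, row-major layout (NOT the TRIQS gf storage).
struct Dims {
  std::size_t nw;  // bosonic frequencies w in G2
  std::size_t nn;  // fermionic frequencies n1, n2 in G2
  std::size_t nm;  // fermionic frequencies in M_il / M_kj (must cover n + w)
  std::size_t nb;  // orbital dimension; all four target indices assumed equal
};

// G2[w][n1][n2][i][j][k][l] -= s * M_il[n1][n2][i][l] * M_kj[n2 + w][n1 + w][k][j]
// Assumption: the mesh index of "n + w" is the integer sum of the two indices.
void accumulate_naive(std::vector<dcomplex>& G2, const std::vector<dcomplex>& M_il,
                      const std::vector<dcomplex>& M_kj, dcomplex s, const Dims& d) {
  const std::size_t nb = d.nb, nb2 = nb * nb, nb4 = nb2 * nb2;
  assert(d.nw > 0 && d.nb > 0 && d.nn + d.nw - 1 <= d.nm);
  assert(G2.size() == d.nw * d.nn * d.nn * nb4);
  assert(M_il.size() == d.nm * d.nm * nb2 && M_kj.size() == d.nm * d.nm * nb2);
  for (std::size_t w = 0; w < d.nw; ++w)
    for (std::size_t n1 = 0; n1 < d.nn; ++n1)
      for (std::size_t n2 = 0; n2 < d.nn; ++n2) {
        // Base pointers for this frequency triple, hoisted out of the orbital loops.
        dcomplex* g = &G2[((w * d.nn + n1) * d.nn + n2) * nb4];
        const dcomplex* a = &M_il[(n1 * d.nm + n2) * nb2];
        const dcomplex* b = &M_kj[((n2 + w) * d.nm + (n1 + w)) * nb2];
        for (std::size_t i = 0; i < nb; ++i)
          for (std::size_t j = 0; j < nb; ++j)
            for (std::size_t k = 0; k < nb; ++k)
              for (std::size_t l = 0; l < nb; ++l)
                g[((i * nb + j) * nb + k) * nb + l] -= s * a[i * nb + l] * b[k * nb + j];
      }
}
```

Timing this kernel against the mesh-based loop gives one rough way to estimate how much of the cost comes from mesh-point handling rather than from the arithmetic itself.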
Using C++17-style product ranges for the loops looks nicer. However, it comes with a factor of 2 runtime overhead, and 75% of the time is spent in the first loop over the mesh. This is rather far from a zero-cost abstraction...
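A minimal product range makes it easier to see where such overhead can come from. The sketch below is a hypothetical three-dimensional, odometer-style range over integer indices (`product3` is invented here, not the TRIQS implementation). Every increment compares and possibly resets inner indices, and the end-of-loop test compares the full index state, whereas nested for loops test one counter per level. Whether the optimizer removes that bookkeeping depends on the compiler and on the iterator design. So this illustrates a plausible cause, not a diagnosis of the measured factor of 2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical minimal product range over [0, n0) x [0, n1) x [0, n2),
// iterated like an odometer: the last index varies fastest.
class product3 {
 public:
  product3(std::size_t n0, std::size_t n1, std::size_t n2) : n_{n0, n1, n2} {}

  class iterator {
   public:
    iterator(const std::array<std::size_t, 3>& n, const std::array<std::size_t, 3>& idx)
        : n_(n), idx_(idx) {}

    // Yields the current index triple, usable with structured bindings.
    std::tuple<std::size_t, std::size_t, std::size_t> operator*() const {
      assert(idx_[0] < n_[0]);  // must not dereference the end iterator
      return std::make_tuple(idx_[0], idx_[1], idx_[2]);
    }

    // Odometer increment: bump the innermost index and carry outward on overflow.
    // This compare-and-reset work runs on every step of the flattened loop.
    iterator& operator++() {
      if (++idx_[2] < n_[2]) return *this;
      idx_[2] = 0;
      if (++idx_[1] < n_[1]) return *this;
      idx_[1] = 0;
      ++idx_[0];  // {n0, 0, 0} is the end state
      return *this;
    }

    // The end test compares the whole index state, not just one counter.
    bool operator!=(const iterator& other) const { return idx_ != other.idx_; }

   private:
    std::array<std::size_t, 3> n_;
    std::array<std::size_t, 3> idx_;
  };

  iterator begin() const {
    // Any empty dimension makes the whole product empty.
    if (n_[0] == 0 || n_[1] == 0 || n_[2] == 0) return end();
    return iterator(n_, {0, 0, 0});
  }
  iterator end() const { return iterator(n_, {n_[0], 0, 0}); }

 private:
  std::array<std::size_t, 3> n_;
};
```

It is used as `for (auto [w, n1, n2] : product3(nw, nn, nn)) { ... }` and visits the same index triples, in the same order, as three nested loops.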
Can something be done to get us closer to "zero cost"?
Best, Hugo