Hi Professor Shcherbakov,
I tried to accelerate the algorithm by converting all the variables in GratingGSM for GPU operation with A=gpuArray(single(A)), but unfortunately it does not seem to work well: the simulation time actually increased about 10 times compared with the CPU.
In principle, the algorithm involves a large amount of matrix multiplication, so GPU operation should help, and I'm not sure what the problem is. Could you please share some of your experience? Thank you so much.
Best regards
Xianshun
I think a direct transfer to the GPU using high-level MATLAB or Python functions will not help much, since the computations are sensitive to memory transfers and code organization. For GPU calculations I wrote separate C++ CUDA code.
And btw, a factor of 10 for double-precision simulations may be a good result, depending on the model of your GPU. Please look carefully at your graphics card's specifications and note the huge performance difference between data types.
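To see how much of the cost comes from memory transfers and how much from the data type, one option is a small stand-alone benchmark like the sketch below. It is not taken from GratingGSM: the matrix size N is an arbitrary assumption, and a plain random matrix product stands in for the real computation. It needs the Parallel Computing Toolbox and a supported GPU.

```matlab
% Sketch: time a matrix multiply on CPU and GPU in single and double precision.
% N is an arbitrary test size chosen for illustration, not a GratingGSM value.
N = 2000;
dev = gpuDevice;                       % query the currently selected GPU
fprintf('GPU: %s\n', dev.Name);

for prec = {'single', 'double'}
    cls = prec{1};
    A = rand(N, cls);                  % host (CPU) matrices
    B = rand(N, cls);

    tCpu = timeit(@() A * B);          % CPU multiply

    Ag = gpuArray(A);                  % transfer once, outside the timed call
    Bg = gpuArray(B);
    tGpu = gputimeit(@() Ag * Bg);     % GPU multiply, data already on the device

    % Same multiply, but paying the host-to-device and device-to-host copies
    tGpuXfer = gputimeit(@() gather(gpuArray(A) * gpuArray(B)));

    fprintf('%s: CPU %.4f s, GPU %.4f s, GPU incl. transfers %.4f s\n', ...
            cls, tCpu, tGpu, tGpuXfer);
end
```

If the GPU only wins when the data already sits on the device, the transfers are likely the bottleneck. If the double-precision GPU time is far worse than the single-precision one, the card's double-precision throughput is probably the limit.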
Thank you so much for your advice. Advanced techniques like GPU computing are really complicated for beginners like me who are unfamiliar with C++ CUDA code. But I also tried the MATLAB code in single precision (no GPU involved): the algorithm is sped up a little, with little impact on the diffraction efficiencies.
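A quick way to check both the speed-up and the accuracy cost of single precision on the CPU is something like the sketch below. It uses a random matrix product as a stand-in, so the size N and the error measure are assumptions for illustration only; for the real code, the quantity to compare would be the diffraction efficiencies themselves.

```matlab
% Sketch: compare CPU speed and accuracy of single vs double precision.
% N and the random test matrices are illustrative, not GratingGSM data.
N = 1000;
A = rand(N);                           % double-precision reference inputs
B = rand(N);
As = single(A);                        % single-precision copies
Bs = single(B);

tDouble = timeit(@() A * B);
tSingle = timeit(@() As * Bs);

Cd = A * B;                            % double-precision result
Cs = As * Bs;                          % single-precision result

% Relative error of the single-precision product in the Frobenius norm
relErr = norm(double(Cs) - Cd, 'fro') / norm(Cd, 'fro');

fprintf('double: %.4f s, single: %.4f s, relative error: %.2e\n', ...
        tDouble, tSingle, relErr);
```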