TinyProfiler: shorten output into "Other" section #3885

@AlexanderSinn AlexanderSinn commented Apr 10, 2024

Summary

This PR adds an option to shorten the output that TinyProfiler prints at the end of a simulation by combining short-running functions into a single "Other" entry.

tiny_profiler.print_threshold = 1.

With the current approach, tiny_profiler.print_threshold specifies the maximum inclusive runtime that the "Other" section may take, in percent of the total runtime. The default value is 1 (i.e., 1%), so at least 99% of the total inclusive and exclusive time is still reported outside "Other". A value of 0 turns the feature off.
The exclusive section combines the same functions into "Other" as the inclusive section, so a given function shows up either in both sections or in neither. As a consequence, functions such as "main()" with a long inclusive but short exclusive runtime still show up in the exclusive section, even though functions with a longer exclusive runtime may have been moved into "Other".
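
To illustrate the selection rule described above, here is a minimal Python sketch. It is not the AMReX implementation: the names `Row` and `fold_into_other` are hypothetical, and the greedy order (folding the functions with the smallest inclusive time first) and the summing of per-function maxima are assumptions made for illustration. The sample rows are taken from the print_threshold = 0 output further down.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    ncalls: int
    excl_max: float  # exclusive time, max over processes [s]
    incl_max: float  # inclusive time, max over processes [s]


def fold_into_other(rows, total_time, print_threshold):
    """Split rows into (kept, other); other is None if nothing was folded."""
    if print_threshold <= 0.0:  # 0 switches the feature off
        return list(rows), None
    budget = total_time * print_threshold / 100.0  # threshold is in percent
    folded, used = [], 0.0
    # Assumption: the cheapest functions by inclusive time are folded first,
    # stopping as soon as the next one would exceed the budget.
    by_incl = sorted(rows, key=lambda r: r.incl_max)
    for row in by_incl:
        if used + row.incl_max > budget:
            break
        folded.append(row)
        used += row.incl_max
    kept = by_incl[len(folded):]
    if not folded:
        return kept, None
    # The same set of functions is folded for both tables, so the exclusive
    # total of "Other" is whatever those functions happen to add up to.
    other = Row("Other",
                sum(r.ncalls for r in folded),
                sum(r.excl_max for r in folded),
                used)
    return kept, other


rows = [
    Row("main()", 1, 0.001266, 20.98),
    Row("hpmg::MultiGrid::solve1()", 1000, 6.58, 6.58),
    Row("DepositCurrentSlice_BeamParticleContainer()", 2000, 0.04532, 0.04532),
    Row("Hipace::InitData()", 1, 0.0304, 0.0477),
    Row("shiftSlippedParticles()", 678, 0.03845, 0.03907),
    Row("FabArray::setVal()", 4, 0.008111, 0.008111),
]

kept, other = fold_into_other(rows, total_time=20.98, print_threshold=1.0)
print([r.name for r in kept])
print(other)
```

In this reduced example "main()" is kept because of its large inclusive time, although its exclusive time is shorter than that of several functions that end up in "Other". This is the asymmetry mentioned above.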

Additional background

HiPACE++ TinyProfiler output with tiny_profiler.print_threshold = 1.:

TinyProfiler total time across processes [min...avg...max]: 19.95 ... 20.37 ... 20.7

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
hpmg::MultiGrid::solve1()                            1000      6.532      6.542       6.57  31.73%
AnyDST::Execute()                                    6000      3.853      3.869      3.892  18.80%
AdvanceBeamParticlesSlice()                          1000        2.7      2.716      2.736  13.21%
ExplicitDeposition()                                 1000      2.246      2.261      2.277  11.00%
AdvancePlasmaParticles()                             1000      1.249      1.255      1.271   6.14%
DepositCurrent_PlasmaParticleContainer()             1001     0.9996      1.006      1.014   4.90%
MultiBuffer::get_data()                              1000  0.0008554     0.3973     0.9006   4.35%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000     0.4731     0.4752     0.4797   2.32%
Fields::InitializeSlices()                           1000     0.4426     0.4504     0.4597   2.22%
Fields::ShiftSlices()                                1000     0.2872     0.3431     0.4128   1.99%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000     0.3722     0.3751     0.3783   1.83%
Hipace::InitializeSxSyWithBeam()                     1000     0.2212     0.2224     0.2236   1.08%
FillBoundary_nowait()                                4000     0.1168     0.1202      0.124   0.60%
Fields::AddRhoIons()                                 1000    0.09308    0.09395    0.09525   0.46%
MultiBuffer::put_data()                              1000   0.004975    0.05095    0.06177   0.30%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02971    0.03274    0.05097   0.25%
DepositCurrentSlice_BeamParticleContainer()          2000    0.03985    0.04177    0.04291   0.21%
Hipace::SolveOneSlice()                              1000   0.007764   0.008161   0.008495   0.04%
Hipace::ExplicitMGSolveBxBy()                        1000   0.003842   0.003985   0.004245   0.02%
Hipace::Evolve()                                        1  0.0008552   0.001817    0.00225   0.01%
FabArray::FillBoundary()                             4000   0.001446   0.001505   0.001561   0.01%
main()                                                  1   0.001071   0.001176   0.001244   0.01%
Other                                               11832    0.08133     0.1007     0.1464   0.71%
--------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
--------------------------------------------------------------------------------------------------
main()                                                  1      19.95      20.37       20.7 100.00%
Hipace::Evolve()                                        1      19.91      20.33      20.66  99.80%
Hipace::SolveOneSlice()                              1000      19.88      20.31      20.64  99.69%
Hipace::ExplicitMGSolveBxBy()                        1000      6.536      6.546      6.574  31.75%
hpmg::MultiGrid::solve1()                            1000      6.532      6.542       6.57  31.73%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000      4.766      4.784      4.815  23.25%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000      4.326      4.344      4.372  21.11%
AnyDST::Execute()                                    6000      3.853      3.869      3.892  18.80%
AdvanceBeamParticlesSlice()                          1000        2.7      2.716      2.736  13.21%
ExplicitDeposition()                                 1000      2.246      2.261      2.277  11.00%
AdvancePlasmaParticles()                             1000      1.249      1.255      1.271   6.14%
DepositCurrent_PlasmaParticleContainer()             1001     0.9996      1.006      1.014   4.90%
MultiBuffer::get_data()                              1000    0.03689     0.4028     0.9019   4.36%
Fields::InitializeSlices()                           1000     0.4426     0.4504     0.4597   2.22%
Fields::ShiftSlices()                                1000     0.2872     0.3431     0.4128   1.99%
Hipace::InitializeSxSyWithBeam()                     1000     0.2782      0.281     0.2849   1.38%
FabArray::FillBoundary()                             4000     0.1199     0.1233     0.1271   0.61%
FillBoundary_nowait()                                4000     0.1176      0.121     0.1248   0.60%
Fields::AddRhoIons()                                 1000    0.09308    0.09395    0.09525   0.46%
MultiBuffer::put_data()                              1000   0.004975    0.05116    0.06203   0.30%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02971    0.03274    0.05097   0.25%
DepositCurrentSlice_BeamParticleContainer()          2000    0.03985    0.04177    0.04291   0.21%
Other                                               11832     0.1359     0.1483     0.1925   0.93%
--------------------------------------------------------------------------------------------------

HiPACE++ TinyProfiler output with tiny_profiler.print_threshold = 0 (off):

TinyProfiler total time across processes [min...avg...max]: 20.17 ... 20.66 ... 20.98

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
hpmg::MultiGrid::solve1()                            1000      6.531      6.548       6.58  31.37%
AnyDST::Execute()                                    6000      3.855      3.881      3.942  18.79%
AdvanceBeamParticlesSlice()                          1000      2.702      2.718      2.747  13.09%
ExplicitDeposition()                                 1000      2.244      2.262      2.282  10.88%
AdvancePlasmaParticles()                             1000      1.251      1.269      1.371   6.53%
MultiBuffer::get_data()                              1000  0.0008761     0.6194       1.14   5.43%
DepositCurrent_PlasmaParticleContainer()             1001      1.004       1.01       1.02   4.86%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000     0.4723      0.474      0.477   2.27%
Fields::InitializeSlices()                           1000     0.4447     0.4534     0.4741   2.26%
Fields::ShiftSlices()                                1000      0.288     0.3436     0.4096   1.95%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000     0.3727     0.3747     0.3758   1.79%
Hipace::InitializeSxSyWithBeam()                     1000     0.2198     0.2219     0.2233   1.06%
FillBoundary_nowait()                                4000     0.1168     0.1207     0.1259   0.60%
Fields::AddRhoIons()                                 1000    0.09291    0.09412    0.09528   0.45%
MultiBuffer::put_data()                              1000   0.005048    0.05104    0.06124   0.29%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02938    0.03298    0.04933   0.24%
DepositCurrentSlice_BeamParticleContainer()          2000    0.04035    0.04223    0.04532   0.22%
shiftSlippedParticles()                               678    0.03493    0.03694    0.03845   0.18%
BeamParticleContainer::InitBeamFixedWeightSlice()     125          0   0.004304    0.03443   0.16%
Hipace::InitData()                                      1   0.006485    0.02631     0.0304   0.14%
PlasmaParticleContainer::InitParticles()                1    0.02805    0.02865    0.02917   0.14%
BeamParticleContainer::InitBeamFixedWeight3D()          1   9.41e-07   0.002142    0.01713   0.08%
Hipace::SolveOneSlice()                              1000   0.007763    0.00821    0.00885   0.04%
FabArray::setVal()                                      4   0.007318   0.007621   0.008111   0.04%
AnyDST::CreatePlan()                                    1   0.005389   0.006022   0.006552   0.03%
Fields::AllocData()                                     1   0.005303   0.005756    0.00611   0.03%
sortBeamParticlesByBox()                                0          0  0.0005877   0.004702   0.02%
Hipace::ExplicitMGSolveBxBy()                        1000   0.003881   0.003983   0.004134   0.02%
BeamParticleContainer::resize()                      3014   0.002165   0.002377     0.0025   0.01%
Hipace::Evolve()                                        1  0.0008157   0.001788   0.002294   0.01%
FabArray::FillBoundary()                             4000   0.001378   0.001443   0.001546   0.01%
main()                                                  1    0.00106   0.001194   0.001266   0.01%
FillBoundary_finish()                                4000  0.0007684  0.0008594  0.0009376   0.00%
FabArrayBase::getFB()                                4000   0.000709  0.0007543  0.0008294   0.00%
AdaptiveTimeStep::CalculateFromDensity()                1  6.339e-05   7.22e-05  0.0001312   0.00%
FabArrayBase::FB::FB()                                  1  3.079e-05  3.577e-05  3.732e-05   0.00%
AdaptiveTimeStep::CalculateFromMinUz()                  1  2.495e-06   3.89e-06  1.072e-05   0.00%
ParticleContainer::clearParticles()                     1    3.4e-07  4.009e-07   4.81e-07   0.00%
--------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
--------------------------------------------------------------------------------------------------
main()                                                  1      20.17      20.66      20.98 100.00%
Hipace::Evolve()                                        1      20.12      20.61      20.93  99.77%
Hipace::SolveOneSlice()                              1000      20.07      20.56      20.89  99.57%
Hipace::ExplicitMGSolveBxBy()                        1000      6.535      6.552      6.584  31.38%
hpmg::MultiGrid::solve1()                            1000      6.531      6.548       6.58  31.37%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000      4.768      4.794       4.86  23.17%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000      4.327      4.355      4.419  21.06%
AnyDST::Execute()                                    6000      3.855      3.881      3.942  18.79%
AdvanceBeamParticlesSlice()                          1000      2.702      2.718      2.747  13.09%
ExplicitDeposition()                                 1000      2.244      2.262      2.282  10.88%
AdvancePlasmaParticles()                             1000      1.251      1.269      1.371   6.53%
MultiBuffer::get_data()                              1000     0.0366     0.6249      1.141   5.44%
DepositCurrent_PlasmaParticleContainer()             1001      1.004       1.01       1.02   4.86%
Fields::InitializeSlices()                           1000     0.4447     0.4534     0.4741   2.26%
Fields::ShiftSlices()                                1000      0.288     0.3436     0.4096   1.95%
Hipace::InitializeSxSyWithBeam()                     1000     0.2765      0.281     0.2851   1.36%
FabArray::FillBoundary()                             4000     0.1198     0.1238     0.1291   0.62%
FillBoundary_nowait()                                4000     0.1175     0.1215     0.1266   0.60%
Fields::AddRhoIons()                                 1000    0.09291    0.09412    0.09528   0.45%
MultiBuffer::put_data()                              1000   0.005048    0.05126    0.06151   0.29%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02938    0.03298    0.04933   0.24%
Hipace::InitData()                                      1    0.04745    0.04759     0.0477   0.23%
DepositCurrentSlice_BeamParticleContainer()          2000    0.04035    0.04223    0.04532   0.22%
shiftSlippedParticles()                               678    0.03558    0.03747    0.03907   0.19%
BeamParticleContainer::InitBeamFixedWeightSlice()     125          0   0.004465    0.03572   0.17%
PlasmaParticleContainer::InitParticles()                1    0.02805    0.02866    0.02918   0.14%
Fields::AllocData()                                     1    0.01716    0.01854    0.01907   0.09%
BeamParticleContainer::InitBeamFixedWeight3D()          1   9.41e-07   0.002142    0.01713   0.08%
FabArray::setVal()                                      4   0.007318   0.007621   0.008111   0.04%
AnyDST::CreatePlan()                                    1   0.005389   0.006022   0.006552   0.03%
sortBeamParticlesByBox()                                0          0  0.0005877   0.004702   0.02%
BeamParticleContainer::resize()                      3014   0.002165   0.002377     0.0025   0.01%
FillBoundary_finish()                                4000  0.0007684  0.0008594  0.0009376   0.00%
FabArrayBase::getFB()                                4000  0.0007451    0.00079  0.0008653   0.00%
AdaptiveTimeStep::CalculateFromDensity()                1  6.339e-05   7.22e-05  0.0001312   0.00%
FabArrayBase::FB::FB()                                  1  3.079e-05  3.577e-05  3.732e-05   0.00%
AdaptiveTimeStep::CalculateFromMinUz()                  1  2.495e-06   3.89e-06  1.072e-05   0.00%
ParticleContainer::clearParticles()                     1    3.4e-07  4.009e-07   4.81e-07   0.00%
--------------------------------------------------------------------------------------------------

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@WeiqunZhang

LGTM. But there are a few clang-tidy warnings that need to be fixed.

@WeiqunZhang WeiqunZhang merged commit 6fc92e3 into AMReX-Codes:development Apr 16, 2024
69 checks passed
@ax3l

ax3l commented Apr 16, 2024

@AlexanderSinn This is great!

I was wondering why tiny_profiler.print_threshold = 1 is just a boolean switch and not an actual threshold value (e.g., 0.95 would be: the sum of all earlier values is 95% of all runtime)? Would this be a good update?

@WeiqunZhang

It's not bool, it's double.

@ax3l

ax3l commented Apr 16, 2024

Ah, makes sense. Read the PR description details now.
