Optimize beam sensor model runtime performance #200
Conversation
LGTM
@glpuga Left two super minor comments. It is also worth noting that `RelWithDebInfo` reduces the optimization level to `-O2` instead of `-O3`, so flamegraphs might not be as good at comparing micro-optimization changes like these. Microbenchmarks using Google Benchmark compiled in `Release` mode may be better at detecting true progress.
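For context, these are roughly the default flags CMake applies for each build type on GCC/Clang (a sketch of typical defaults; the exact flags can vary by toolchain and CMake version, and can be inspected via `CMAKE_CXX_FLAGS_<CONFIG>`):

```cmake
# Typical single-config defaults on GCC/Clang.
# RelWithDebInfo: optimizes at -O2 and keeps debug symbols, which is what
# makes perf/flamegraph output readable.
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O2 -g -DNDEBUG")
# Release: optimizes at -O3 with no debug info.
set(CMAKE_CXX_FLAGS_RELEASE "-O3 -DNDEBUG")
```

Building benchmarks with `-DCMAKE_BUILD_TYPE=Release` therefore measures the fully optimized code, at the cost of less readable profiles.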
Signed-off-by: Gerardo Puga <glpuga@ekumenlabs.com>
Adds an updated report including the following changes from the last:
- Includes the changes merged in #195 #199 #200 #207
- Measured using the 1x replay speed to prevent distortions to the CPU results
- Fixes typos in configuration files, found during review

Signed-off-by: Gerardo Puga <glpuga@ekumenlabs.com>
Proposed changes
Minor optimizations to the tightest execution loops in the beam sensor model.
While these changes do improve performance a bit, they barely dent the performance disadvantage we have against Nav2 AMCL. We are still missing something much bigger than these optimizations.
Type of change
Checklist
Put an `x` in the boxes that apply. This is simply a reminder of what we will require before merging your code.

Additional comments

These changes alter the performance profile, as seen through `perf`, from this:

(flamegraph before the changes)

to this:

(flamegraph after the changes)
Note: These flamegraphs assume the changes in #199 have already been merged, since there are grid optimizations that are common to both Likelihood and Beam.
Notice that, relative to the unmodified `beluga::Bresenham2i::Line` block of code (which had no changes made to it, neither in performance per execution nor in total number of executions), the overall time spent in the `importance_weight` function seems to have been reduced significantly.

Notice also the removal of the second stack call tower on the right, which appears to be related to queuing in the `MessageFilter`.

However, these changes barely changed the CPU usage profile:

(CPU usage chart)
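For reference, the kind of integer line traversal being profiled here is the classic error-accumulating Bresenham walk. The sketch below is a generic, hypothetical version for illustration only, not Beluga's actual `Bresenham2i::Line` implementation:

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical all-octant integer Bresenham walk over grid cells.
// Illustrative only; not the actual beluga::Bresenham2i code.
using Cell = std::pair<int, int>;

std::vector<Cell> bresenham_line(Cell from, Cell to) {
  std::vector<Cell> cells;
  int x = from.first, y = from.second;
  const int dx = std::abs(to.first - x), dy = std::abs(to.second - y);
  const int sx = to.first > x ? 1 : -1;
  const int sy = to.second > y ? 1 : -1;
  int err = dx - dy;  // accumulated error between the ideal line and the grid
  while (true) {
    cells.emplace_back(x, y);
    if (x == to.first && y == to.second) break;
    const int e2 = 2 * err;
    if (e2 > -dy) { err -= dy; x += sx; }  // step along x
    if (e2 < dx)  { err += dx; y += sy; }  // step along y
  }
  return cells;
}
```

The inner loop is just integer additions and comparisons, which is part of why micro-optimizations to the surrounding code show up only faintly in whole-system profiles and are easier to detect in isolated microbenchmarks.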
And the change is barely noticeable in the comparison against Nav2 AMCL:

(CPU usage comparison against Nav2 AMCL)

While it's still possible that our implementation of Bresenham performs worse than Nav2's, even if we somehow reduced the tracing runtime cost to zero with some magic implementation, that would still only get us to perform at basically the same level as Nav2 AMCL. To me that indicates that the problem is somewhere else. I suspect we are actually doing more work than Nav2, but I haven't been able to find any proof of that.
Further work is still needed.