
Parallel Transform Propagation #17840

Merged — 39 commits merged into bevyengine:main on Feb 23, 2025
Conversation

aevyrie
Member

@aevyrie aevyrie commented Feb 13, 2025

Objective

  • Make transform propagation faster.

Solution

  • Work-sharing worker threads
  • Parallel tree traversal, excluding leaves
  • A second, cache-friendly wide pass over all leaves
  • 3-10x faster than main
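The two-pass shape described above can be sketched in a few lines. This is a minimal illustration, not the PR's actual code: it uses per-root scoped threads from the standard library in place of Bevy's task-pool work sharing, a flat arena of nodes in place of ECS queries, and 1-D offsets in place of real `Transform` matrices. The `Node` struct and `propagate` function are hypothetical names for this sketch.

```rust
use std::thread;

// Stand-in for an entity with a local transform (here just a 1-D offset).
#[derive(Clone, Copy)]
struct Node {
    local: f32,
    parent: Option<usize>,
    is_leaf: bool,
}

// Compute global offsets: parallel traversal of internal nodes,
// then one linear sweep over all leaves.
fn propagate(nodes: &[Node]) -> Vec<f32> {
    let n = nodes.len();
    let mut global = vec![0.0f32; n];

    // Pass 1: walk the internal nodes of each root's subtree on its own
    // thread. (Real work sharing balances subtrees across a thread pool;
    // one thread per root keeps the sketch short.)
    let roots: Vec<usize> = (0..n).filter(|&i| nodes[i].parent.is_none()).collect();
    let results: Vec<Vec<(usize, f32)>> = thread::scope(|s| {
        let handles: Vec<_> = roots
            .iter()
            .map(|&root| {
                s.spawn(move || {
                    let mut out = Vec::new();
                    let mut stack = vec![(root, 0.0f32)];
                    while let Some((i, parent_global)) = stack.pop() {
                        if nodes[i].is_leaf {
                            continue; // leaves are handled in pass 2
                        }
                        let g = parent_global + nodes[i].local;
                        out.push((i, g));
                        for (j, c) in nodes.iter().enumerate() {
                            if c.parent == Some(i) {
                                stack.push((j, g));
                            }
                        }
                    }
                    out
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    for chunk in results {
        for (i, g) in chunk {
            global[i] = g;
        }
    }

    // Pass 2: a single wide, cache-friendly pass over every leaf. Each leaf
    // only reads its parent's already-computed global value.
    for i in 0..n {
        if nodes[i].is_leaf {
            let pg = nodes[i].parent.map_or(0.0, |p| global[p]);
            global[i] = pg + nodes[i].local;
        }
    }
    global
}

fn main() {
    // root (1.0) -> internal child (2.0) -> leaf (3.0)
    let nodes = vec![
        Node { local: 1.0, parent: None, is_leaf: false },
        Node { local: 2.0, parent: Some(0), is_leaf: false },
        Node { local: 3.0, parent: Some(1), is_leaf: true },
    ];
    let g = propagate(&nodes);
    println!("{:?}", g);
}
```

Separating the leaf pass matters because leaves typically dominate the entity count: they carry no further traversal state, so a straight linear sweep over them avoids the branching and synchronization costs of the tree walk.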

Testing

  • Tracy
  • Caldera hotel is showing 3-7x faster on my M4 Max. Timing for Bevy's existing transform system shifts wildly run to run, so I don't know that I would advertise a particular number. But this implementation is faster in a... statistically significant way.
    [screenshot: Tracy timing comparison]

@aevyrie aevyrie requested a review from Victoronz February 13, 2025 06:36
@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Feb 13, 2025
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times A-Transform Translations, rotations and scales M-Needs-Release-Note Work that should be called out in the blog due to impact S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 13, 2025
@aevyrie aevyrie marked this pull request as draft February 13, 2025 06:49
@aevyrie
Member Author

aevyrie commented Feb 13, 2025

Getting this up now that it seems to function, marking as draft because it still needs cleanup.

@alice-i-cecile
Member

FYI, I'm merging #17815. The methods there are probably worth benching here.

@alice-i-cecile
Member

I'd love a before/after Tracy histogram, BTW. I've seen enough tests on Discord to be convinced that this is markedly (3x or so) faster, but it would be lovely to show it in the release notes.

@superdump superdump requested a review from james7132 February 17, 2025 21:22
Contributor

@hymm hymm left a comment


Nice work. Overall this makes sense to me, but I'm going to give it another pass tomorrow and try to understand the logic a little better.

Contributor

@pcwalton pcwalton left a comment


Looks good, just had a few questions. Overall thanks a ton for doing this work, it really helps!

@NthTensor
Contributor

Alright, I am finally going to be able to give this a look. Right out of the gate I find the increase in run variance concerning, but I'm in favor of merging and leaving that as follow up if we can't easily identify the cause.

@aevyrie
Member Author

aevyrie commented Feb 22, 2025

> Alright, I am finally going to be able to give this a look. Right out of the gate I find the increase in run variance concerning, but I'm in favor of merging and leaving that as follow up if we can't easily identify the cause.

If you are referring to my comment, that was about Bevy's existing system: it swings wildly, ±1 ms. Variance (the timing distribution) within a run is consistent for both. Variance between runs in this PR seems better than main. My guess is that this is caused by E-cores vs. P-cores: the old implementation usually ended up spending most of its time on a single core, whereas the new one is spread across all available cores in the thread pool. I think this explains why inter-run variance has improved; the work is spread across a mix of P- and E-cores.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 22, 2025
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Feb 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 22, 2025
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Feb 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 23, 2025
@aevyrie
Member Author

aevyrie commented Feb 23, 2025

The CI failure was not caused by this PR; it seems to be some sort of Mesa failure, according to Discord.

@mockersf mockersf enabled auto-merge February 23, 2025 20:28
@mockersf mockersf added this pull request to the merge queue Feb 23, 2025
Merged via the queue into bevyengine:main with commit dba1f7a Feb 23, 2025
33 checks passed