
Trajopt Profiling #47

Draft · wants to merge 44 commits into main

Conversation

alberthli (Contributor) commented Dec 5, 2023

Tips on profiling

  • update the README with some profiling examples
  • add a benchmark script for a simple predictive sampling example (a rough sketch of this kind of trace capture is included below)
  • add tensorboard-related dependencies to pyproject.toml

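For context, a minimal sketch of the kind of trace capture meant above is below; `predictive_sampling_step` here is a hypothetical stand-in for the actual solver call, and the real benchmark script may be organized differently:

```
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the jitted predictive-sampling update; the real
# benchmark script calls the actual solver instead.
@jax.jit
def predictive_sampling_step(key, params):
    noise = jax.random.normal(key, params.shape)
    return params + 0.1 * noise

params = jnp.zeros(32)
key = jax.random.PRNGKey(0)

# Warm up once so compilation time does not show up in the trace.
jax.block_until_ready(predictive_sampling_step(key, params))

# Everything inside this context is written to a TensorBoard-readable trace.
with jax.profiler.trace("/tmp/trajopt-profile"):
    for _ in range(10):
        key, subkey = jax.random.split(key)
        params = predictive_sampling_step(subkey, params)
    jax.block_until_ready(params)
```
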
Profiling results:
[screenshot: TensorBoard trace-viewer profiling results]

Takeaways from the above results:

  • The solver takes about 52% of the total runtime.
    • About 19% of the total runtime is spent on 4 line searches per solver iteration. The biggest "bang for buck" in terms of speeding up `step` is simply to use fewer solver iterations and line-search iterations.
    • About 5.9% of the total runtime is in Cholesky factorizations and solves.
  • The `_position` subroutine of the forward dynamics takes about 26% of the total runtime!
    • About 4.5% of the total runtime is in this line, which is an example of inefficient multiplication of structured matrices (a generic toy illustration is sketched right after this list).
  • The `_acceleration` subroutine takes about 11% of the total runtime.
  • About 11% of the total runtime is in semi-implicit Euler integration (a generic form of the update is sketched further below).
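
To make the structured-matrix point concrete, here is a generic toy illustration (not mjx code, and not the actual line referenced above) of why multiplying a block-diagonal operator as if it were dense is wasteful:

```
import jax
import jax.numpy as jnp
from jax.scipy.linalg import block_diag

# Toy setup: B independent blocks of size k on the diagonal (e.g., per-body terms).
B, k = 64, 6
blocks = jax.random.normal(jax.random.PRNGKey(0), (B, k, k))
vecs = jax.random.normal(jax.random.PRNGKey(1), (B, k))

# Dense route: materialize the (B*k, B*k) block-diagonal matrix (mostly zeros)
# and do one big matmul -- O((B*k)^2) work.
dense = block_diag(*blocks)
y_dense = dense @ vecs.reshape(-1)

# Structured route: multiply each block by its own segment -- O(B*k^2) work.
y_block = jnp.einsum("bij,bj->bi", blocks, vecs).reshape(-1)

assert jnp.allclose(y_dense, y_block, atol=1e-5)
```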

Essentially 100% of the runtime of `forward` is captured by the above operations, which should give some insight into code-optimization priorities on the mjx side if we want to give some feedback. We should remember that the contact engine is DISABLED for this benchmark.
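
For reference, a semi-implicit (symplectic) Euler step in its generic textbook form looks like the sketch below; this is not mjx's actual code and ignores the quaternion/position-manifold handling mjx has to do:

```
import jax.numpy as jnp

def semi_implicit_euler(q, v, a, h):
    # Generic semi-implicit Euler step: the velocity is updated first, and the
    # position update then uses the *new* velocity (unlike explicit Euler).
    v_next = v + h * a
    q_next = q + h * v_next
    return q_next, v_next

# Tiny usage example with a point mass under gravity.
q, v = jnp.zeros(3), jnp.zeros(3)
a, h = jnp.array([0.0, 0.0, -9.81]), 0.01
q_next, v_next = semi_implicit_euler(q, v, a, h)
```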

@alberthli alberthli changed the base branch from main to trajopt-api December 5, 2023 04:18
@alberthli alberthli marked this pull request as ready for review December 5, 2023 04:24
@alberthli alberthli requested a review from pculbertson December 5, 2023 04:25
@alberthli alberthli marked this pull request as draft December 5, 2023 04:44
@alberthli alberthli marked this pull request as ready for review December 5, 2023 05:02
@alberthli alberthli marked this pull request as draft December 5, 2023 05:18
@alberthli alberthli marked this pull request as ready for review December 5, 2023 05:18
@alberthli alberthli marked this pull request as draft December 5, 2023 05:22
@alberthli alberthli marked this pull request as ready for review December 5, 2023 05:23
@alberthli alberthli marked this pull request as draft December 5, 2023 05:36
@alberthli alberthli marked this pull request as ready for review December 5, 2023 05:36
@alberthli alberthli marked this pull request as draft December 5, 2023 05:43
"torch>=1.13.1",
"tensorboard>=2.15.1",
# "torch>=1.13.1",
"tensorboard>=2.13.0", # [Dec. 4, 2023] https://github.com/tensorflow/tensorflow/issues/62075#issuecomment-1808652131
alberthli (Contributor, Author) commented Dec 5, 2023

[NOTE] As it turns out, if I pin the version to 2.13.0, the code check workflow fails. However, we know that things are broken with the most recent tensorflow version, so we should probably not merge this yet.

Comment on lines +50 to +51

```
"tensorflow>=2.13.0", # [Dec. 4, 2023] https://github.com/tensorflow/tensorflow/issues/62075#issuecomment-1808652131
"tensorboard-plugin-profile>=2.13.0",
```

alberthli (Contributor, Author)

See above comment about pinning versions.

@alberthli alberthli changed the base branch from main to trajopt-api December 5, 2023 07:19

pculbertson (Collaborator) left a comment

LGTM overall -- I have a couple of ideas for minor improvements to the documentation, but we can do those later. The bigger issue is the compatibility problem you already noted -- not sure how to fix it, but it would be nice to have profiling functional eventually.

```
fn1()

# stuff we do want to specifically profile
with jax.named_scope("name_of_your_choice"):
    ...  # code to profile goes here
```

pculbertson (Collaborator)

Quick question: I know TraceAnnotation didn't work here, but it might be good to note that here, since the JAX profiling docs recommend it.
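
For reference, the TraceAnnotation variant the JAX docs describe would look roughly like the following (noting again that it reportedly didn't work in this setup):

```
import jax

# The annotation label shows up as a named block in the trace viewer.
with jax.profiler.TraceAnnotation("name_of_your_choice"):
    ...  # code to profile
```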

```
tensorboard --logdir=/dir/to/profiling/results --port <port>
```
where `--port` should be some open port like `8008`. In the top-right dropdown menu, which should say "Inactive", scroll down and select "Profile". Select the run you'd like to analyze; under Tools, the most useful tab is usually "trace_viewer".

pculbertson (Collaborator)

You might consider adding some screenshots for folks who are unfamiliar with flame charts

Base automatically changed from trajopt-api to main December 5, 2023 21:06