Improve performance of memoized validator functions. #122
Conversation
The cache key was based on a hash of the entire graph, and this hash was being regenerated each time a link was validated. On my system, validating each link took around 100 ms with a graph of around 2000 links. With this change, validating each link appears to take less than 1 ms.
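For context, here is a minimal sketch of the pattern being described, assuming a lodash-style memoize with a custom resolver; the types, hashGraph, and validateLink below are illustrative placeholders rather than the actual crdx code:

```ts
import { memoize } from 'lodash'

type Link = { hash: string; body: unknown }
type Graph = { root: string; links: Record<string, Link> }

// Placeholder for an expensive hash over the whole graph (O(number of links)).
const hashGraph = (graph: Graph): string =>
  Object.values(graph.links)
    .map(link => link.hash)
    .sort()
    .join('|')

// The validation work itself is cached, but the resolver below re-hashes the
// entire graph on every call just to build the cache key, so each link
// validation still pays the full O(graph) cost.
const validateLink = memoize(
  (link: Link, graph: Graph): boolean => graph.links[link.hash] !== undefined,
  (link: Link, graph: Graph) => `${hashGraph(graph)}:${link.hash}`
)
```

With around 2000 links, every call to the resolver walks the whole graph before the cache can even be consulted, which is consistent with the roughly 100 ms per link measured above.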
Hi, Lucas - thanks for all these PRs! Sorry for not responding earlier - I was afk for the past month or so. I'll go through these now. I'd love to see your benchmarks and maybe add them to the repo - would that code be easy for you to pull up?
You're welcome. I've been testing end-to-end by adding console.log statements in the code and then running a scenario. Would that test code setup be something you are interested in? I could probably come up with some unit tests as well.
vitest has built-in benchmarking support using tinybench - could be cool to simulate some of these scenarios within the test suite as a way of documenting these performance improvements, and also providing a baseline for further optimization work.
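A hypothetical benchmark file along those lines (the graph setup below is a stand-in for whatever scenario the real suite would build):

```ts
// validate.bench.ts
import { bench, describe } from 'vitest'

// Stand-in graph construction; a real benchmark would build a crdx graph
// with a few thousand links instead.
const makeGraph = (size: number) => {
  const links: Record<string, { hash: string }> = {}
  for (let i = 0; i < size; i++) links[`hash-${i}`] = { hash: `hash-${i}` }
  return { root: 'root-hash', links }
}

const graph = makeGraph(2000)

describe('link validation', () => {
  bench('validate every link in a 2000-link graph', () => {
    for (const link of Object.values(graph.links)) {
      // call the real validator here, e.g. validateLink(link, graph)
      void link
    }
  })
})
```

Running vitest bench would then report throughput for the before and after versions of the validator.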
Sure, I'll see what I can do.
I just added a benchmark. How does that look to you?
Before PR:
After PR:
It looks like some tests are failing due to the mutable nature of the graph. For example, in this test (auth/packages/crdx/src/validator/test/validate.test.ts, lines 49 to 50 in 6ab7e9c):
We modify the root entry, but the entry's hash field doesn't change. Given this, the memoization changes in this PR won't work. One option is to remove memoization on validation. It looks like validation is only used when creating the Team and then when receiving a message:
And upon receiving a message, if the graph changes at all, the original memoization approach isn't going to help, because the cache key is derived from a hash of the entire graph, which changes with every new link.
Another option is to keep the original memoization, but compute the hash of the graph only once before we call the individual validators:
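(A rough sketch of that approach, reusing the placeholder Link, Graph, and hashGraph definitions from the earlier example; the validator signature here is illustrative, not the actual crdx API.)

```ts
type Validator = (link: Link, graph: Graph, graphHash: string) => boolean

// Each validator can still be memoized, but on a key such as
// `${graphHash}:${link.hash}` that it receives ready-made.
const validators: Validator[] = [
  (link, graph, graphHash) => Boolean(graphHash) && link.hash in graph.links,
]

const validateGraph = (graph: Graph): boolean => {
  // Hash the graph once per validation pass instead of once per link.
  const graphHash = hashGraph(graph)
  return Object.values(graph.links).every(link =>
    validators.every(validator => validator(link, graph, graphHash))
  )
}
```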
That way, we are not computing the hash of the graph when validating each link. In this scenario we would still need to compute the hash of each link. But if the graph ever changes because of a new message, from what I can tell, the memoization will not help and will only increase memory usage. If you have any thoughts on the direction you'd like to go, I'm happy to code something up.
Yeah - I've done some experiments and I think we should memoize the
Can you set this PR to allow changes by maintainers?
Sounds good. Unfortunately, since this PR is using the
Of course, we don't want cache key collisions. Do you think using link.hash and graph.root would be unique enough for the expected use cases?
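If that direction pans out, the resolver could look something like the sketch below (again assuming a lodash-style memoize and the placeholder types from the first example); note that graph.root by itself would not reflect later changes to the graph's contents, which is the mutability concern raised earlier:

```ts
const validateLinkKeyed = memoize(
  (link: Link, graph: Graph): boolean => graph.links[link.hash] !== undefined,
  // Cache key built only from values that are already at hand,
  // so no whole-graph hashing is needed per call.
  (link: Link, graph: Graph) => `${graph.root}:${link.hash}`
)
```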