Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

formatter will stack overflow when formatting a long chain of or #4712

Closed
onelson opened this issue Apr 29, 2022 · 5 comments · Fixed by #4876
Closed

formatter will stack overflow when formatting a long chain of or #4712

onelson opened this issue Apr 29, 2022 · 5 comments · Fixed by #4876

Comments

@onelson
Copy link
Contributor

onelson commented Apr 29, 2022

TL;DR: long or chains can cause the formatter to stack overflow. To mitigate, either we need to reduce allocations or make the flow less recursive.


At time of writing, given a flux program like:

x = false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false or false

will cause fluxcore::formatter::format() to stack overflow. By my count, today's threshold is around 193 or being the breaking point.

Reducing Allocations

There are a few places where we could reduce allocations, for example in this specific case types like ast::LogicalOperator are cast to strings at each level of recursion.

We could add an impl for AsRef<str> which returns the &'static str instead. Better might be to use only the enum variants themselves rather than dropping down to String/&str, but there might be a polymorphic concern -- are two or more enums we feed into this function?

These strings are going to be a small savings, but not really enough to make an impact. A POC to move towards using &'static str for this got us a single additional level of recursion only.

Refactoring to reduce recursion

This is more likely the way we want to go if we really want to mitigate. Ideally, flux (the formatter, and the parser) should be able to happily consume programs that perhaps are rejected by limits downstream (such as in storage). We would prefer to not be the limiting factor.

In order to avoid the stack overflow in this situation, we'd need to refactor the flow to be more flat with plain iteration rather than recursion.


Probably need to discuss design options for refactors if we want to tackle this. DOD will depend on the discussion/proposal.

@onelson
Copy link
Contributor Author

onelson commented May 9, 2022

I just read https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html which describes the Pratt parsing algorithm.

Pratt combines recursive and looping flows to help avoid some of the weaknesses of recursive descent style parsers. I'm not entirely sure the weaknesses cited are "too much recursion" as is our problem case today, but certainly switching to loops where possible should help.

I know the formatter is not actually parsing anything (it only consumes an already parsed AST), but it seems like a similar approach could be applied.

I have not yet read https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ but it's next on my list.

@onelson
Copy link
Contributor Author

onelson commented May 9, 2022

Discussed as a quicker-fix in the standup today: we could add a depth counter to the recursion which would allow us to produce a user-facing error that is phrased deliberately for this case.

It might be we also want to survey for other recursive paths at the same time.

N.b. the stack limits are different (lower) for wasm so we should probably use its limits as our ceiling.

@onelson
Copy link
Contributor Author

onelson commented May 9, 2022

Looks like #4721 ran with the idea of cutting those operator allocations!

@nathanielc
Copy link
Contributor

nathanielc commented May 12, 2022

NOTE: If we add a limit we should also support pushdown of contains. This way we can point people at using contains to avoid long chains of ors. See #3546. For this we need to know what limit InfluxQL has because we have to rewrite contains as a chain of or in the storage predicate.

@Marwes
Copy link
Contributor

Marwes commented Jun 14, 2022

Ran into this when working #4776 and to avoid having having my program crash when running into queries exhibiting this I added an error for deeply nested expressions. Extracted it in #4876 if we want to merge it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants