Improve error handling in controller #14951

Closed
wants to merge 3 commits into from

gortiz
Contributor

@gortiz gortiz commented Jan 30, 2025

This PR tries to fix the first two controller errors listed in #14950:

  • On errors, Pinot controller query endpoints inconsistently return either broker-like JSON error payloads (if the broker detected the error) or plain text (if the controller itself detected the error).
  • Errors detected by the Pinot controller were not logged.
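
To make the two bullets concrete, here is a minimal sketch of rendering a controller-side error in a broker-like JSON shape while also logging it. Everything in it is an assumption for illustration: the class `ControllerErrorPayload`, its method names, and the `exceptions` / `errorCode` / `message` field layout are not taken from this PR's diff.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not Pinot's actual code.
final class ControllerErrorPayload {
  private static final Logger LOGGER = Logger.getLogger(ControllerErrorPayload.class.getName());

  private ControllerErrorPayload() {
  }

  // Escapes the characters that would otherwise break a JSON string literal.
  static String escapeJson(String raw) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < raw.length(); i++) {
      char c = raw.charAt(i);
      switch (c) {
        case '"':
          sb.append("\\\"");
          break;
        case '\\':
          sb.append("\\\\");
          break;
        case '\n':
          sb.append("\\n");
          break;
        case '\r':
          sb.append("\\r");
          break;
        case '\t':
          sb.append("\\t");
          break;
        default:
          if (c < 0x20) {
            // Remaining control characters are emitted as 4-digit hex escapes.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  // Builds the assumed broker-like payload: one entry in an "exceptions" array.
  static String toJson(int errorCode, String message) {
    return "{\"exceptions\":[{\"errorCode\":" + errorCode
        + ",\"message\":\"" + escapeJson(message) + "\"}]}";
  }

  // Logs the error first, so it is never silently dropped, then renders it as JSON.
  static String logAndRender(int errorCode, Throwable cause) {
    LOGGER.log(Level.WARNING, "Controller detected a query error, code " + errorCode, cause);
    String message = cause.getMessage() == null ? cause.getClass().getSimpleName() : cause.getMessage();
    return toJson(errorCode, message);
  }
}
```

With a helper like this, a client would see the same payload shape whether the broker or the controller detected the problem.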

It also optimizes the way controllers pipe query results from brokers. The current version should be faster (see https://schoeffm.github.io/posts/response-streaming-between-jaxrs-and-webcomponents-part1/) and, most importantly, zero-copy, whereas the previous code copied the query response onto the heap not once but twice!
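
A rough illustration of the difference, using only `java.io` rather than the JAX-RS types the controller actually uses. The class `ResponsePiping` and its method names are hypothetical. Note that `InputStream.transferTo` still goes through a small fixed-size buffer, so this sketch shows "no full copy of the response on the heap" rather than true zero-copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting two ways of relaying a broker response.
final class ResponsePiping {
  private ResponsePiping() {
  }

  // Buffered style: the whole response is read into a byte[] and then
  // copied again when the String is built.
  static String bufferThenReturn(InputStream brokerResponse) {
    try {
      byte[] all = brokerResponse.readAllBytes();
      return new String(all, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Streaming style: bytes flow from the broker stream to the client stream
  // chunk by chunk; the full response is never materialized.
  static long pipe(InputStream brokerResponse, OutputStream clientResponse) {
    try {
      return brokerResponse.transferTo(clientResponse);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In a JAX-RS resource, the streaming variant would typically be wrapped in a `StreamingOutput`, which hands the resource method the client-side `OutputStream` to write into.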

Introduce a new PinotRuntimeException and modify other SPI exceptions to extend from it.
@codecov-commenter

Codecov Report

Attention: Patch coverage is 31.93277% with 81 lines in your changes missing coverage. Please review.

Project coverage is 63.71%. Comparing base (59551e4) to head (1fc53f3).
Report is 1652 commits behind head on master.

Files with missing lines Patch % Lines
...t/controller/api/resources/PinotQueryResource.java 20.27% 58 Missing and 1 partial ⚠️
...ava/org/apache/pinot/spi/exception/QException.java 40.00% 15 Missing ⚠️
.../pinot/spi/exception/BadQueryRequestException.java 22.22% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14951      +/-   ##
============================================
+ Coverage     61.75%   63.71%   +1.96%     
- Complexity      207     1469    +1262     
============================================
  Files          2436     2712     +276     
  Lines        133233   151943   +18710     
  Branches      20636    23463    +2827     
============================================
+ Hits          82274    96807   +14533     
- Misses        44911    47865    +2954     
- Partials       6048     7271    +1223     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.68% <31.93%> (+1.97%) ⬆️
java-21 63.58% <31.93%> (+1.96%) ⬆️
skip-bytebuffers-false 63.69% <31.93%> (+1.94%) ⬆️
skip-bytebuffers-true 63.57% <31.93%> (+35.84%) ⬆️
temurin 63.71% <31.93%> (+1.96%) ⬆️
unittests 63.70% <31.93%> (+1.96%) ⬆️
unittests1 56.21% <31.11%> (+9.32%) ⬆️
unittests2 34.03% <25.21%> (+6.29%) ⬆️

   public BadQueryRequestException(String message) {
-    super(message);
+    super(SQL_RUNTIME_ERROR_CODE, message);
Contributor

This class seems to be used for user errors - why do we want to use SQL_RUNTIME_ERROR_CODE for that?

Contributor Author

@gortiz gortiz Feb 5, 2025

This class is used in tons of places to report errors at runtime. In some places it is caught and converted into a different error type depending on the context where it was thrown, for example in BaseSingleBlockCombineOperator.

Here I'm assigning a default error code, which is generally difficult to do precisely. Callers can still pass an explicit error code.
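
A minimal sketch of the "default code, overridable by callers" idea. The class names carry a `Sketch` suffix because they are stand-ins, not the PR's actual `QException` and `BadQueryRequestException`, and the numeric values are placeholders chosen for the example.

```java
// Hypothetical base exception that always carries an error code.
class QExceptionSketch extends RuntimeException {
  static final int SQL_RUNTIME_ERROR_CODE = 160; // placeholder value
  static final int UNKNOWN_ERROR_CODE = 1000;    // placeholder value

  private final int _errorCode;

  QExceptionSketch(int errorCode, String message) {
    super(message);
    _errorCode = errorCode;
  }

  int getErrorCode() {
    return _errorCode;
  }
}

// Hypothetical subclass: one constructor supplies a default code,
// the other lets the caller be more precise.
class BadQueryRequestExceptionSketch extends QExceptionSketch {
  BadQueryRequestExceptionSketch(String message) {
    super(SQL_RUNTIME_ERROR_CODE, message);
  }

  BadQueryRequestExceptionSketch(int errorCode, String message) {
    super(errorCode, message);
  }
}
```

A call site that knows the failure is a user error could use the two-argument constructor, while legacy call sites keep compiling against the one-argument form and inherit the default.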

Comment on lines +232 to +239
+    } catch (QException e) {
+      if (e.getErrorCode() != QueryException.UNKNOWN_ERROR_CODE) {
+        throw e;
+      } else {
+        throw new QException(QException.SQL_PARSING_ERROR_CODE, e);
+      }
     } catch (Exception e) {
-      return QueryException.getException(QueryException.SQL_PARSING_ERROR,
-          new Exception("Unable to find table for this query", e)).toString();
+      throw new QException(QException.SQL_PARSING_ERROR_CODE, e);
Contributor

This part is a little confusing: why are we treating an unknown error as a parsing error but not handling other types of QException here?

Contributor Author

Because we assume other error types will be more precise. The error codes in use are pretty chaotic. It looks like we added them without much consideration, and the fact that we cannot create hierarchies is not a good design. For example, is an UNKNOWN_COLUMN_ERROR_CODE a QUERY_VALIDATION_ERROR_CODE?
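
To spell out the rule in isolation: a specific code is kept, and only an unknown code (or a non-coded exception) is reclassified as a parsing error. The sketch below uses hypothetical names (`CodedException`, `TableLookup.withParsingFallback`) and placeholder numeric values; it is not the controller's code.

```java
import java.util.function.Supplier;

// Hypothetical coded exception, standing in for the PR's exception type.
class CodedException extends RuntimeException {
  static final int SQL_PARSING_ERROR_CODE = 150; // placeholder value
  static final int UNKNOWN_ERROR_CODE = 1000;    // placeholder value

  private final int _errorCode;

  CodedException(int errorCode, String message, Throwable cause) {
    super(message, cause);
    _errorCode = errorCode;
  }

  int getErrorCode() {
    return _errorCode;
  }
}

final class TableLookup {
  private TableLookup() {
  }

  // Runs the lookup and normalizes failures:
  //  - a coded exception with a specific code is rethrown unchanged;
  //  - a coded exception with the unknown code is rewrapped as a parsing error;
  //  - any other runtime exception is wrapped as a parsing error.
  static <T> T withParsingFallback(Supplier<T> lookup) {
    try {
      return lookup.get();
    } catch (CodedException e) {
      if (e.getErrorCode() != CodedException.UNKNOWN_ERROR_CODE) {
        throw e;
      }
      throw new CodedException(CodedException.SQL_PARSING_ERROR_CODE, e.getMessage(), e);
    } catch (RuntimeException e) {
      throw new CodedException(CodedException.SQL_PARSING_ERROR_CODE, e.getMessage(), e);
    }
  }
}
```

The original cause is preserved in both rewrap paths, so the log still shows where the failure started even after the code is changed.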

@gortiz
Contributor Author

gortiz commented Feb 5, 2025

@yashmayya I've created #14994, which is built on top of this PR and addresses most of the issues reported here.

@gortiz
Contributor Author

gortiz commented Mar 19, 2025

This has been superseded by #15277.

@gortiz gortiz closed this Mar 19, 2025