PPL fieldsummary command #766
- antlr syntax - ast expression builder - ast node builder - catalyst ast builder Signed-off-by: YANGDB <yang.db.dev@gmail.com>
- antlr syntax - ast expression builder - ast node builder - catalyst ast builder Signed-off-by: YANGDB <yang.db.dev@gmail.com>
fix scala style format Signed-off-by: YANGDB <yang.db.dev@gmail.com>
# Conflicts: # ppl-spark-integration/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
…ng table identifier only has 2 parts) Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
# Conflicts: # ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Would you mind changing the PR status to DRAFT if you are going to refactor?
@LantaoJin please review
@ToString
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class NamedExpression extends UnresolvedExpression {
Could you remove this definition or find an alternative, for example reusing `Argument`? Because:
- `NamedExpression` should have a name.
- `NamedExpression` should generally be an abstract class and the parent of `Attribute` or `Alias`. We would have to refactor a lot of code if we really need it.
- It is easy to confuse with Spark's `NamedExpression`. IMO it is not worth introducing a new expression for the fieldsummary command. At least, not `NamedExpression`.
@@ -39,7 +42,8 @@
  */
 public interface DataTypeTransformer {
   static <T> Seq<T> seq(T... elements) {
-    return seq(List.of(elements));
+    return seq(Arrays.stream(elements).filter(Objects::nonNull)
Could you comment on this change? What case did you see?
Yes, it's not relevant any more. Thanks for pointing it out.
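For context on what the two lines in the snippet above do differently, here is a minimal sketch in plain Java. It assumes nothing about the project's `seq` helper: the class `SeqNullHandling` and the methods `strict` and `lenient` are hypothetical names, and `java.util.List` stands in for the Scala `Seq`. `List.of` throws a `NullPointerException` when any element is null, while the stream version drops nulls before collecting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class SeqNullHandling {
    // Mirrors the removed line: List.of rejects null elements.
    @SafeVarargs
    static <T> List<T> strict(T... elements) {
        return List.of(elements);
    }

    // Mirrors the added line: null elements are filtered out first.
    @SafeVarargs
    static <T> List<T> lenient(T... elements) {
        return Arrays.stream(elements)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lenient("a", null, "b"));
        try {
            strict("a", null, "b");
        } catch (NullPointerException e) {
            System.out.println("List.of rejected a null element");
        }
    }
}
```

Silently dropping nulls changes the element count, which may be why the reviewer asked which case motivated the change.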
SUM(FunctionName.of("sum")),
COUNT(FunctionName.of("count")),
COUNT_DISTINCT(FunctionName.of("count_distinct")),
We already have a `DISTINCT_COUNT` (did we miss it here?). Also, it seems you are not using it as a built-in function name in the following code; instead, `count_distinct` is an alias name.
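One way to read this suggestion is to keep a single canonical entry and resolve the alternative spelling through an alias table. The sketch below is only an illustration of that idea, not the project's code: `FunctionAliasSketch`, `Builtin`, `ALIASES`, and `resolve` are all hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class FunctionAliasSketch {
    // Hypothetical canonical function names; only DISTINCT_COUNT exists for distinct counting.
    enum Builtin { SUM, COUNT, DISTINCT_COUNT }

    // Hypothetical alias table: alternative spellings map to one canonical entry.
    private static final Map<String, Builtin> ALIASES =
            Map.of("count_distinct", Builtin.DISTINCT_COUNT);

    static Optional<Builtin> resolve(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        Builtin alias = ALIASES.get(key);
        if (alias != null) {
            return Optional.of(alias);
        }
        // Fall back to a case-insensitive match on the canonical names.
        for (Builtin b : Builtin.values()) {
            if (b.name().toLowerCase(Locale.ROOT).equals(key)) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, `resolve("count_distinct")` and `resolve("DISTINCT_COUNT")` return the same entry, so no second enum constant is needed.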
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
FIELDSUMMARY: 'FIELDSUMMARY';
INCLUDEFIELDS: 'INCLUDEFIELDS';
NULLS: 'NULLS';
We should add these keywords to `keywordsCanBeId` too.
os> source = t | fieldsummary includefields= id, status_code, request_path nulls=true
+------------------+-------------+------------+------------+------------+------------+------------+------------+----------------|
| Fiels | COUNT | COUNT_DISTINCT | MIN | MAX | AVG | MEAN | STDDEV | NUlls | TYPEOF |
ditto
os> source = t | where status_code != 200 | fieldsummary includefields= status_code nulls=true
+------------------+-------------+------------+------------+------------+------------+------------+------------+----------------|
| Fiels | COUNT | COUNT_DISTINCT | MIN | MAX | AVG | MEAN | STDDEV | NUlls | TYPEOF |
`Fiels` -> `Fields`?
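To make the columns in the documented table header concrete, here is a rough sketch of the per-field statistics they name, computed in plain Java for one numeric column. It is not how the PR implements the command. The names `FieldSummarySketch`, `Summary`, and `summarize` are hypothetical, and two choices are assumptions the thread does not confirm: COUNT counts only non-null values, and STDDEV is the sample standard deviation (n - 1 denominator).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class FieldSummarySketch {
    // Hypothetical holder for the statistics named in the table header.
    static final class Summary {
        final long count, countDistinct, nulls;
        final double min, max, avg, stddev;

        Summary(long count, long countDistinct, double min, double max,
                double avg, double stddev, long nulls) {
            this.count = count;
            this.countDistinct = countDistinct;
            this.min = min;
            this.max = max;
            this.avg = avg;
            this.stddev = stddev;
            this.nulls = nulls;
        }
    }

    static Summary summarize(List<Integer> values) {
        long nulls = values.stream().filter(Objects::isNull).count();
        double[] present = values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Integer::doubleValue)
                .toArray();
        long count = present.length; // assumption: nulls are not counted
        long distinct = Arrays.stream(present).distinct().count();
        double min = Arrays.stream(present).min().orElse(Double.NaN);
        double max = Arrays.stream(present).max().orElse(Double.NaN);
        double avg = Arrays.stream(present).average().orElse(Double.NaN);
        // Assumption: sample standard deviation, undefined for fewer than two values.
        double sumSq = Arrays.stream(present).map(v -> (v - avg) * (v - avg)).sum();
        double stddev = count > 1 ? Math.sqrt(sumSq / (count - 1)) : Double.NaN;
        return new Summary(count, distinct, min, max, avg, stddev, nulls);
    }
}
```

For example, `summarize(Arrays.asList(200, 404, 200, null, 500))` yields a count of 4, 3 distinct values, a minimum of 200, a maximum of 500, an average of 326, and 1 null.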
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* add support for FieldSummary
  - antlr syntax
  - ast expression builder
  - ast node builder
  - catalyst ast builder
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* add support for FieldSummary
  - antlr syntax
  - ast expression builder
  - ast node builder
  - catalyst ast builder
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update sample query
  fix scala style format
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* support spark prior to 3.5 with its extended table identifier (existing table identifier only has 2 parts)
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update union queries based summary
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update scala fmt style
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update scala fmt style
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update query with where clause predicate
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update command and remove the topvalues
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update command docs
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update with comments feedback
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
* update `FIELD SUMMARY` symbols to the keywordsCanBeId bag of words
  Signed-off-by: YANGDB <yang.db.dev@gmail.com>
---------
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Description
This PR implements the `fieldsummary` PPL command.
Issues Resolved
#662
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.