Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join side aliases should be optional #862

Merged
merged 4 commits into from
Nov 6, 2024

Conversation

LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Nov 1, 2024

Description

Imaging a case as following: Assume table1, table2, and table3 all contain a column id.

select
  *
from
  table1 t1,
  table2 t2,
  table3 t3
where
 t1.id = t2.id
 and t1.id = t3.id

To rewrite above SQL query to PPL query, we will get

source = table1
| join left = t1 right = t2 ON t1.id = t2.id table2
| join left = l1 right = t3 ON t1.id = t3.id table3 // <------ issue here! 

The PPL query throws an exception with message:

t1.id cannot be resolved, Did you mean one of the following? [l1.id, l1.id, t3.id].

It because the required left side alias l1 overrides the table alias t1 and t2.

Its logical plan looks

'Project [*]
+- 'Filter (('t1.id = 't2.id) AND ('t1.id = 't3.id)) <------ t1.id cannot be resolved
   +- 'Join Inner
      :- 'SubqueryAlias l1  <------ issue root cause
      :  +- 'Join Inner
      :    :- 'SubqueryAlias t1
      :    :  +- 'UnresolvedRelation [table1], [], false
      :    +- 'SubqueryAlias t2
      :       +- 'UnresolvedRelation [table2], [], false
      +- 'SubqueryAlias t3
         +- 'UnresolvedRelation [table3], [], false

Check #857 for details

Related Issues

Resolves #857

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
}

/** Optional alias name for the relation. */
@Setter @Getter private String alias;
Copy link
Member Author

@LantaoJin LantaoJin Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alias in Relation could be removed since we have the SubqueryAlias in AST: https://github.com/opensearch-project/opensearch-spark/blob/main/ppl-spark-integration/src/main/java/org/opensearch/sql/ast/tree/SubqueryAlias.java

Remaining this member causes potential issue if conflicts. All table with alias in PPL should add a SubqueryAlias such as #862 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin can u give an example for a union query ?
source = a as table_a, b as table_b, c as table_c | ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin can u give an example for a union query ? source = a as table_a, b as table_b, c as table_c | ...

No, we didn't support this pattern, see

: tableSource (COMMA tableSource)* (AS alias = qualifiedName)?

But we supported the pattern:
source = table1, table2, table3 as t | where t.name = 'Molly'
The plan of above is

'Union true, true
:- 'Project [*]
:  +- 'Filter ('t.name = Molly)
:     +- 'SubqueryAlias t
:        +- 'UnresolvedRelation [table1], [], false
:- 'Project [*]
:  +- 'Filter ('t.name = Molly)
:     +- 'SubqueryAlias t
:        +- 'UnresolvedRelation [table2], [], false
+- 'Project [*]
   +- 'Filter ('t.name = Molly)
      +- 'SubqueryAlias t
         +- 'UnresolvedRelation [table3], [], false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add an UT and IT of the above case in latest commit.

*
* @return table name
*/
public List<String> getTableName() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never used.

Comment on lines +274 to +276
| | inner JOIN left = l right = r ON l.id = r.id $testTable2
| | left JOIN left = l right = r ON l.name = r.name $testTable3
| | cross JOIN left = l right = r $testTable4
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comma is optional, I am trying to reduce this usage. But keep the comma won't fail query.

? new Relation(ctx.tableSource().stream().map(this::internalVisitExpression).collect(Collectors.toList()))
: new Relation(ctx.tableSource().stream().map(this::internalVisitExpression).collect(Collectors.toList()), ctx.alias.getText());
Relation relation = new Relation(ctx.tableSource().stream().map(this::internalVisitExpression).collect(Collectors.toList()));
return ctx.alias != null ? new SubqueryAlias(ctx.alias.getText(), relation) : relation;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add SubqueryAlias ast node instead of adding the alias string into Relation.

*
* @return table name
*/
public List<String> getTableName() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never used.

@LantaoJin
Copy link
Member Author

LantaoJin commented Nov 1, 2024

Without this fixing, q21 in TPCH couldn't be rewritten to PPL. Link #830

@LantaoJin LantaoJin marked this pull request as ready for review November 1, 2024 10:14
@LantaoJin LantaoJin added 0.6 Lang:PPL Pipe Processing Language support labels Nov 1, 2024
@ToString
@Getter
@EqualsAndHashCode(callSuper = false)
@RequiredArgsConstructor
public class Relation extends UnresolvedPlan {
private static final String COMMA = ",";

private final List<UnresolvedExpression> tableName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin lates change the name to indicate it may contain multi-part

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

/** Optional alias name for the relation. */
@Setter @Getter private String alias;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin can u give an example for a union query ?
source = a as table_a, b as table_b, c as table_c | ...

public Expression resolveJoinCondition(
UnresolvedExpression expr,
BiFunction<UnresolvedExpression, CatalystPlanContext, Expression> transformFunction) {
isResolvingJoinCondition = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u make this an atomic bool ?

Copy link
Member Author

@LantaoJin LantaoJin Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CatalystPlanContext is generated by each query. All members in CatalystPlanContext should be thread-safe. There is no multi-threads need to check and change the boolean and other structures of CatalystPlanContext.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add this note on the CatalystPlanContext class description for better clarity

UnresolvedPlan rightRelation = visit(ctx.tableOrSubqueryClause());
UnresolvedPlan right = new SubqueryAlias(rightAlias, rightRelation);
UnresolvedPlan right;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some comment explaining the condition

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@YANG-DB
Copy link
Member

YANG-DB commented Nov 5, 2024

@LantaoJin also plz resolve conflicts...

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin merged commit cfd41a3 into opensearch-project:main Nov 6, 2024
4 checks passed
@LantaoJin
Copy link
Member Author

Conflicts is related to imports. CI tests succeed with the related files. Merging to main.

kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* Join side aliases should be optional

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 Lang:PPL Pipe Processing Language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Side aliases in JOIN command should be optional
2 participants