Merge branch-24.04 into main [skip ci] #46

Closed
wants to merge 101 commits into from

Conversation

NvTimLiu
Owner

Change version to 24.04.0

Note: merge this PR using the "Create a merge commit" option.

nvauto and others added 30 commits January 24, 2024 17:20
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
To fix: NVIDIA#10256

Bump up dependency version to 24.04.0-SNAPSHOT

Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
revans2 and others added 28 commits February 28, 2024 08:27
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
…#10466)

* remove leading space for json path in GetJsonObject

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Update comments

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Use JsonPathParser to normalize path

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Update compatibility doc

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* clean up

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Fallback json paths containing  in GetJsonObject

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* cache normalizeJsonPath and prevent memory leak

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* clean up

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* ready to merge

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Use parser to check whether to fallback

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Add a special case

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

---------

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Jim Brennan <jimb@nvidia.com>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
* Move 351 shims into noSnapshot buildvers

Move 351 shims into noSnapshot buildvers, as Spark has released it.

Follow up of NVIDIA#10465 (comment)

Signed-off-by: Tim Liu <timl@nvidia.com>

* 351 shim for scala 2.13

Signed-off-by: Tim Liu <timl@nvidia.com>

---------

Signed-off-by: Tim Liu <timl@nvidia.com>
…0500)

Fixes NVIDIA#8208.

This commit adds support for `WindowGroupLimitExec` to run on GPU.  This optimization was added in Apache Spark 3.5, to reduce the number of rows that participate in shuffles, for queries that contain filters on the result of ranking functions. For example:

```sql
SELECT foo, bar FROM (
  SELECT foo, bar, 
         RANK() OVER (PARTITION BY foo ORDER BY bar) AS rnk
  FROM mytable )
WHERE rnk < 10
```

Such a query requires a shuffle to bring all rows of a window group into the same task.
In Spark 3.5, an optimization was added in [SPARK-37099](https://issues.apache.org/jira/browse/SPARK-37099) to take advantage of the `rnk < 10` predicate to reduce shuffle load.
Specifically, since only ranks 1 through 9 (i.e. 10-1 ranks) can satisfy the predicate, only rows with those ranks need to be shuffled into the task, per input batch.  Pre-filtering rows that cannot possibly satisfy the condition reduces the number of shuffled records.

The GPU implementation (i.e. `GpuWindowGroupLimitExec`) differs slightly from the CPU implementation, because it needs to execute on the entire input column batch.  As a result, `GpuWindowGroupLimitExec` runs the rank scan on each input batch, and then filters out ranks that exceed the limit specified in the predicate (`rnk < 10`). After the shuffle, the `RANK()` is calculated again by `GpuRunningWindowExec`, to produce the final result.

The current implementation addresses the `RANK()` and `DENSE_RANK()` window functions.  Other ranking functions (like `ROW_NUMBER()`) can be added at a later date.
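
The two-phase idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the plugin's implementation: the function names (`rank_rows`, `prefilter_batch`, `query_with_group_limit`, `query_without_group_limit`) are hypothetical, lists of `(foo, bar)` tuples stand in for column batches, and concatenating the surviving rows stands in for the shuffle. It relies on the fact that a row's rank within one batch can never exceed its rank within the whole window group, so dropping rows whose batch-local rank already fails `rnk < limit` cannot change the final answer.

```python
from collections import defaultdict


def rank_rows(rows):
    """Compute RANK() OVER (PARTITION BY foo ORDER BY bar) for (foo, bar) rows."""
    groups = defaultdict(list)
    for foo, bar in rows:
        groups[foo].append(bar)
    ranked = []
    for foo, bars in groups.items():
        bars.sort()
        rank = 0
        for i, bar in enumerate(bars):
            # Ties share a rank; a new value gets its 1-based position.
            if i == 0 or bar != bars[i - 1]:
                rank = i + 1
            ranked.append((foo, bar, rank))
    return ranked


def prefilter_batch(batch, limit):
    """Phase 1 (hypothetical): drop rows whose batch-local rank is >= limit."""
    return [(foo, bar) for foo, bar, rnk in rank_rows(batch) if rnk < limit]


def query_with_group_limit(batches, limit):
    """Pre-filter each batch, 'shuffle' the survivors, then rank again."""
    survivors = [row for batch in batches for row in prefilter_batch(batch, limit)]
    return sorted(r for r in rank_rows(survivors) if r[2] < limit)


def query_without_group_limit(batches, limit):
    """Baseline: 'shuffle' every row, then rank and filter once."""
    all_rows = [row for batch in batches for row in batch]
    return sorted(r for r in rank_rows(all_rows) if r[2] < limit)


if __name__ == "__main__":
    batches = [
        [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 7)],
        [("a", 0), ("a", 2), ("b", 5), ("b", 6), ("b", 8)],
    ]
    print(query_with_group_limit(batches, 3))
    print(query_without_group_limit(batches, 3))
```

Both calls print the same rows; in this toy input the pre-filter passes 7 of the 10 rows on to the second ranking step.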

Signed-off-by: MithunR <mythrocks@gmail.com>
This PR adds a new metric for the preprojection in GpuExpand.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…ory oom work (NVIDIA#10519)

Signed-off-by: Jim Brennan <jimb@nvidia.com>
* Update rapids jni and private dependency version to 24.02.1 (NVIDIA#10511)

Signed-off-by: Tim Liu <timl@nvidia.com>

* Add missed shims for scala2.13 (NVIDIA#10465)

* Add missed shims for scala2.13

Signed-off-by: Tim Liu <timl@nvidia.com>

* Add 351 snapshot shim for the scala2.13 version of plugin jar

Signed-off-by: Tim Liu <timl@nvidia.com>

* Remove 351 snapshot shim as spark 3.5.1 has been released

Signed-off-by: Tim Liu <timl@nvidia.com>

* Remove scala2.13 351 snapshot shim

Signed-off-by: Tim Liu <timl@nvidia.com>

* Remove 351 shim's json string

Ran `mvn generate-sources -Dshimplify=true -Dshimplify.move=true -Dshimplify.remove.shim=351`

to remove the 351 shim's json string and fix some unnecessary empty lines that were introduced

Signed-off-by: Tim Liu <timl@nvidia.com>

* Update Copyright 2024

Copyright years updated automatically with the script below:
```
export SPARK_RAPIDS_AUTO_COPYRIGHTER=ON

./scripts/auto-copyrighter.sh $(git diff --name-only origin/branch-24.04..HEAD)
```

Signed-off-by: Tim Liu <timl@nvidia.com>

* Revert "Update Copyright 2024"

This reverts commit 8482847.

* Revert "Remove 351 shim's json string"

This reverts commit 78d1f00.

* skip 351 from strict checking

* Align scala2.13/pom.xml with the scala2.12 one

Run the script `bash build/make-scala-version-build-files.sh 2.13`

Signed-off-by: Tim Liu <timl@nvidia.com>

* pretend 351 is a snapshot in 24.02

Signed-off-by: Gera Shegalov <gera@apache.org>

* pretend 351 is a SNAPSHOT version

* Revert change of build/shimplify.py

Signed-off-by: Tim Liu <timl@nvidia.com>

---------

Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Raza Jafri <rjafri@nvidia.com>
Co-authored-by: Gera Shegalov <gera@apache.org>

* Update changelog for v24.02.0 release (NVIDIA#10525)

Signed-off-by: Tim Liu <timl@nvidia.com>

---------

Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Raza Jafri <rjafri@nvidia.com>
Co-authored-by: Gera Shegalov <gera@apache.org>
)

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Co-authored-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Fix merge conflict from branch-24.02
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Update to latest branch-24.02 [skip ci]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
* Distinct inner join

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Distinct left join

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Update to new API

* Fix test

---------

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
…ment (NVIDIA#10564)

* WIP

Signed-off-by: Gera Shegalov <gera@apache.org>

* WIP

Signed-off-by: Gera Shegalov <gera@apache.org>

* Enable specifying the pytest using file_or_dir args

```bash
TEST_PARALLEL=0 \
SPARK_HOME=~/dist/spark-3.1.1-bin-hadoop3.2 \
TEST_FILE_OR_DIR=~/gits/NVIDIA/spark-rapids/integration_tests/src/main/python/arithmetic_ops_test.py::test_addition  \
./integration_tests/run_pyspark_from_build.sh --collect-only

<Module src/main/python/arithmetic_ops_test.py>
  <Function test_addition[Byte]>
  <Function test_addition[Short]>
  <Function test_addition[Integer]>
  <Function test_addition[Long]>
  <Function test_addition[Float]>
  <Function test_addition[Double]>
  <Function test_addition[Decimal(7,3)]>
  <Function test_addition[Decimal(12,2)]>
  <Function test_addition[Decimal(18,0)]>
  <Function test_addition[Decimal(20,2)]>
  <Function test_addition[Decimal(30,2)]>
  <Function test_addition[Decimal(36,5)]>
  <Function test_addition[Decimal(38,10)]>
  <Function test_addition[Decimal(38,0)]>
  <Function test_addition[Decimal(7,7)]>
  <Function test_addition[Decimal(7,-3)]>
  <Function test_addition[Decimal(36,-5)]>
  <Function test_addition[Decimal(38,-10)]>
```

Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Raza Jafri <rjafri@nvidia.com>

* Changing to TESTS=module::method

Signed-off-by: Gera Shegalov <gera@apache.org>

---------

Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Raza Jafri <rjafri@nvidia.com>
…VIDIA#10562)

* Fix test_spark_from_json_date_with_format when run in a non-UTC TZ

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

* Copyright year

---------

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
…0542)

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Co-authored-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: jenkins <jenkins@localhost>
@NvTimLiu NvTimLiu closed this Mar 19, 2024
@NvTimLiu NvTimLiu deleted the branch-24.04-to-main branch March 19, 2024 14:34