forked from NVIDIA/spark-rapids
Merge branch-24.10 into main #121
Closed
Keep dependencies (JNI + private) at 24.06-SNAPSHOT until the 24.08 versions are available. Filed NVIDIA#10867 to remind us to bump dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Signed-off-by: Zach Puller <zpuller@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build
* Signing off
* Removed unused import

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…IA#10871) Add classloader diagnostics to initShuffleManager error message

Signed-off-by: Zach Puller <zpuller@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@gmail.com>
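The classloader diagnostics mentioned above can be sketched as follows. This is a minimal illustration of the general idea (walking the classloader chain and appending it to an error message); the object and method names are hypothetical, not the actual plugin code.

```scala
// Hypothetical sketch: enrich an initialization error message with the
// classloader chain, which helps diagnose shuffle-manager loading issues.
object ClassLoaderDiag {
  // Walk from the given classloader up through its parents.
  def describeChain(cl: ClassLoader): String =
    Iterator.iterate(cl)(_.getParent)
      .takeWhile(_ != null)
      .map(_.getClass.getName)
      .mkString(" -> ")

  // Append the chain description to a base error message.
  def initErrorMessage(base: String, cl: ClassLoader): String =
    s"$base (classloader chain: ${describeChain(cl)})"
}
```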
…ricks] (NVIDIA#10945)

* Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…"" (reverts commit bb05b17)
* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Closes NVIDIA#10875. Contributes to NVIDIA#10773.

Unjar, cache, and share the test jar content among all test suites from the same jar.

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```

Signed-off-by: Gera Shegalov <gera@apache.org>
…A#10944)

* Added shim for BatchScanExec to support Spark 4.0
* Fixed the failing shim

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…hange. (NVIDIA#10863)

* Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).)
* Removed unnecessary base class.

Signed-off-by: MithunR <mithunr@nvidia.com>
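Shims like the one above typically hide a version-specific upstream call behind one stable interface, with each Spark version providing its own implementation. A minimal illustration of that pattern follows; all names are made up for illustration and do not reflect the real `CommandUtils` API.

```scala
// Hypothetical illustration of the shim pattern used to absorb an upstream
// signature change: callers depend on one trait, and each Spark version
// ships its own implementation selected at build time.
trait UncacheShim {
  def uncacheTableOrView(table: String): String
}

// Pre-4.0 style: the upstream call took an extra parameter.
object PreSpark40Shim extends UncacheShim {
  def uncacheTableOrView(table: String): String =
    s"uncache($table, cascade=true)"
}

// Spark 4.0 style: the unused parameter was removed upstream.
object Spark40Shim extends UncacheShim {
  def uncacheTableOrView(table: String): String =
    s"uncache($table)"
}
```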
This is a new feature adding Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…ange. (NVIDIA#10857)

* Account for `PartitionedFileUtil.splitFiles` signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This caused the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change.
* Common base for PartitionFileUtilsShims.
* Reusing existing PartitionedFileUtilsShims.
* More refactoring, for pre-3.5 compilation.
* Updated copyright date.
* Fixed style error.
* Re-fixed the copyright year.
* Added missing import.

Signed-off-by: MithunR <mithunr@nvidia.com>
To fix NVIDIA#10867: change the rapids private and JNI dependency versions to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow
* Signing off
* Removed the unnecessary base class from 400
* Addressed review comments

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
…itten [skip ci] (NVIDIA#10966)

* DO NOT REVIEW
* Add default value for REF to avoid it being overwritten by an unexpected manual trigger

Signed-off-by: Peixin Li <pxLi@nyu.edu>
* AnalysisException child class
* Use errorClass for reporting AnalysisException
* POM changes
* Reuse the RapidsErrorUtils to throw the AnalysisException
* Revert "POM changes" (reverts commit 0f765c9)
* Updated copyrights
* Added the TrampolineUtil method back to handle cases which don't use errorClass
* Add doc to the RapidsAnalysisException
* Addressed review comments
* Fixed imports
* Moved the RapidsAnalysisException out of TrampolineUtil
* Removed unused import
* Removed the TrampolineUtil method for throwing RapidsAnalysisException

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…icks] (NVIDIA#10970)

* Incomplete impl of RaiseError for 400
* Removed RaiseError from 400
* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
This is a bug fix for the Hive write tests. In some of the tests on Spark 351, ProjectExec falls back to the CPU because the GPU version of the MapFromArrays expression is missing. This PR adds ProjectExec to the allowed fallback list for Spark 351 and later. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
…s] (NVIDIA#10994)

* POM changes for Spark 4.0.0
* Validate buildver and Scala versions
* More POM changes
* Fixed the scala-2.12 comment
* More fixes for the scala-2.13 POM
* Addressed comments
* Add shim check to account for 400
* Add 400 for premerge tests against JDK 17
* Temporarily remove 400 from snapshotScala213
* Fixed 2.13 POM
* Remove 400 from jdk17 as it will compile with Scala 2.12
* GitHub workflow changes
* Added quotes to pom-directory
* Update version defs to include Scala 2.13 JDK 17
* Cross-compile all shims from JDK17 to JDK8; eliminate Logging inheritance to prevent shimming of unshimmable API classes
* Dummy
* Undo api pom change
* Add preview1 to the allowed shim versions
* Scala 2.13 to require JDK17
* Removed unused import left over from razajafri#3
* Set up JAVA_HOME before caching
* Only upgrade the Scala plugin for Scala 2.13
* Regenerate Scala 2.13 POMs
* Remove 330 from JDK17 builds for Scala 2.12
* Revert "Remove 330 from JDK17 builds for Scala 2.12" (reverts commit 1faabd4)
* Downgrade scala.plugin.version for Cloudera
* Updated comment to include the issue
* Upgrade the scala.maven.plugin version to 4.9.1, matching Spark 4.0.0
* Downgrade scala-maven-plugin for Cloudera
* Revert mvn verify changes
* Avoid cache for JDK 17
* Removed cache dependency from Scala 213
* Added Scala 2.13 specific checks
* Handle UnaryPositive now extending RuntimeReplaceable
* Remove 330 from jdk17.buildvers as we only support Scala 2.13, and fix the environment variable in version-defs.sh that we read for building against JDK17 with Scala 213
* Update Scala 2.13 POMs
* Fixed scala2.13 verify to actually use scala2.13/pom.xml
* Added missing CSV files
* Skip Opcode tests: there is a bytecode incompatibility, so these are skipped until support is added; for details see NVIDIA#11174 and NVIDIA#10203
* Upmerged and fixed the newly introduced compile error
* Addressed review comments
* Removed the jdk17 Cloudera check and moved it inside the 321, 330, and 332 Cloudera profiles
* Fixed upmerge conflicts
* Reverted renaming of id
* Fixed HiveGenericUDFShim
* Reverted the debugging code
* Generated Scala 2.13 POMs

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…11197) Signed-off-by: Chong Gao <res_life@163.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
To fix NVIDIA#11114: to support the Spark 3.3+ and 4.0+ shims, build the Scala 2.13 nightly dist jar with JDK17. Signed-off-by: Tim Liu <timl@nvidia.com>
…arquet IDs (NVIDIA#11202) Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…huffleThreadedWriterBase (NVIDIA#11180)

* Exclude the processing time in records.hasNext from the serialization time estimation
* Exclude the wait time on the limiter
* Exclude batch size computing time as well
* Fix outdated comment; add more comments
* Add a function that takes a TimeTrackingIterator
* Make members private

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
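The `TimeTrackingIterator` referenced above suggests a wrapper that measures the time spent pulling records from upstream, so a writer can subtract that from its serialization-time estimate. A minimal sketch of that idea follows; the class shape and names are assumptions, not the plugin's actual implementation.

```scala
// Hypothetical sketch: wrap an iterator and accumulate the time spent inside
// hasNext/next, so a shuffle writer can exclude record-production time from
// its own serialization-time metric.
class TimeTrackingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var accumulatedNanos = 0L

  // Run body, adding its wall-clock duration to the accumulator.
  private def timed[A](body: => A): A = {
    val start = System.nanoTime()
    try body
    finally accumulatedNanos += System.nanoTime() - start
  }

  override def hasNext: Boolean = timed(underlying.hasNext)
  override def next(): T = timed(underlying.next())

  // Total time spent waiting on the upstream iterator.
  def elapsedNanos: Long = accumulatedNanos
}
```

A writer would then compute `totalWriteTime - iterator.elapsedNanos` to estimate pure serialization time.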
* Added handling for invalid data in from_json on rapids
* Added a logging message when parsing invalid JSON
* Removed unwanted test
* Changed settings to catch one more exception
* Formatting and style changes
* Changed exception-catch logic
* Added new exception class
* Removed logging
* Restored a line

Signed-off-by: fejiang <fejiang@nvidia.com>
…DIA#11219)

* Fix hash-aggregate tests failing in ANSI mode

Fixes NVIDIA#11018. This commit fixes the tests in `hash_aggregate_test.py` to run correctly with ANSI enabled. This is essential for running the tests with Spark 4.0, where ANSI mode is on by default.

The vast majority of the tests here exercise aggregations like `SUM`, `COUNT`, and `AVG`, which fall back to the CPU on account of NVIDIA#5114. These tests have been marked with `@disable_ansi_mode` so that they run to completion correctly, and may be revisited after NVIDIA#5114 has been addressed. In cases where NVIDIA#5114 does not apply, the tests have been modified to run with ANSI both on and off.

Signed-off-by: MithunR <mithunr@nvidia.com>
Support MapFromArrays on GPU

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
…ted. [databricks] (NVIDIA#11129)

Fixes NVIDIA#11031. This PR addresses tests that fail on Spark 4.0 in the following files:

1. `integration_tests/src/main/python/datasourcev2_read_test.py`
2. `integration_tests/src/main/python/expand_exec_test.py`
3. `integration_tests/src/main/python/get_json_test.py`
4. `integration_tests/src/main/python/hive_delimited_text_test.py`
5. `integration_tests/src/main/python/logic_test.py`
6. `integration_tests/src/main/python/repart_test.py`
7. `integration_tests/src/main/python/time_window_test.py`
8. `integration_tests/src/main/python/json_matrix_test.py`
9. `integration_tests/src/main/python/misc_expr_test.py`
10. `integration_tests/src/main/python/orc_write_test.py`

Signed-off-by: MithunR <mithunr@nvidia.com>
…-11212 Fix auto merge conflict 11212
…s] (NVIDIA#11220)

* Avoid hitting Spark bug SPARK-44242 while generating run_dir
* Update integration_tests/run_pyspark_from_build.sh to apply suggestion

Signed-off-by: Peixin Li <pxLi@nyu.edu>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…IDIA#11230) Signed-off-by: Jason Lowe <jlowe@nvidia.com>
* Add cache-dependencies step for Scala 213
* Add populate script
* Move yml
* Fix script shell error
* Hardcode buildvers
* Update .github/workflows/mvn-verify-check.yml to remove an extra new line
* Update .github/workflows/mvn-verify-check.yml to differentiate the cache key clearly
* Fix nit

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Co-authored-by: Peixin <pxli@nyu.edu>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Maybe addresses NVIDIA#11225 Signed-off-by: Gera Shegalov <gera@apache.org>
* Remove the unused var CUDF_VER from the CI script
* Update for the review comment

Signed-off-by: Tim Liu <timl@nvidia.com>
* Cleaned up the regex logic
* Local change of substring index
* stringFunctions scala
* Resolved stringFunctions import conflict
* doColumnar calling
* Changed delimiter to scalar type
* Removed comment
* Removed unwanted test case
* Added IT test
* Removed RapidExpressionsSuite
* Added evaluation logic when using GpuScalar
* Formatting
* Removed the single-delim note in GpuOverrides
* Generated docs

Signed-off-by: fejiang <fejiang@nvidia.com>
* Test
* Reviews: fix typo

Signed-off-by: Gera Shegalov <gera@apache.org>
Keep the rapids JNI and private dependency versions until the nightly CI for branch-24.10 is done. Track the dependency update at: https://gitlab-master.nvidia.com/timl/spark-rapids-private/-/issues/14 Signed-off-by: NVTIMLIU <70000568+nvauto@users.noreply.github.com>
Signed-off-by: NVTIMLIU <70000568+nvauto@users.noreply.github.com>
Change version to 24.10.0
Note: merge this PR using the "Create a merge commit" option.