forked from NVIDIA/spark-rapids
Merge branch-24.10 into main #121
Closed
Keep dependencies (JNI + private) at 24.06-SNAPSHOT until the 24.08 versions are available. Filed NVIDIA#10867 to remind us to bump dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Signed-off-by: Zach Puller <zpuller@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build
* Signing off
* Removed unused import

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…IA#10871) Add classloader diagnostics to initShuffleManager error message

Signed-off-by: Zach Puller <zpuller@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@gmail.com>
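The classloader diagnostics mentioned above can be sketched as follows. This is a minimal illustration of the general idea (walking the classloader chain and appending it to an error message); the object and method names are hypothetical, not the actual plugin code.

```scala
// Hypothetical sketch: enrich an initialization error message with the
// classloader chain, which helps diagnose shuffle-manager loading issues.
object ClassLoaderDiag {
  // Walk from the given classloader up through its parents.
  def describeChain(cl: ClassLoader): String =
    Iterator.iterate(cl)(_.getParent)
      .takeWhile(_ != null)
      .map(_.getClass.getName)
      .mkString(" -> ")

  // Append the chain description to a base error message.
  def initErrorMessage(base: String, cl: ClassLoader): String =
    s"$base (classloader chain: ${describeChain(cl)})"
}
```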
…ricks] (NVIDIA#10945)

* Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…"" (reverts commit bb05b17)
* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Closes NVIDIA#10875. Contributes to NVIDIA#10773.

Unjar, cache, and share the test jar content among all test suites from the same jar.

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```

Signed-off-by: Gera Shegalov <gera@apache.org>
…A#10944)

* Added shim for BatchScanExec to support Spark 4.0
* Fixed the failing shim

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…hange. (NVIDIA#10863)

* Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).)
* Removed unnecessary base class.

Signed-off-by: MithunR <mithunr@nvidia.com>
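Shims like the one above typically hide a version-specific upstream call behind one stable interface, with each Spark version providing its own implementation. A minimal illustration of that pattern follows; all names are made up for illustration and do not reflect the real `CommandUtils` API.

```scala
// Hypothetical illustration of the shim pattern used to absorb an upstream
// signature change: callers depend on one trait, and each Spark version
// ships its own implementation selected at build time.
trait UncacheShim {
  def uncacheTableOrView(table: String): String
}

// Pre-4.0 style: the upstream call took an extra parameter.
object PreSpark40Shim extends UncacheShim {
  def uncacheTableOrView(table: String): String =
    s"uncache($table, cascade=true)"
}

// Spark 4.0 style: the unused parameter was removed upstream.
object Spark40Shim extends UncacheShim {
  def uncacheTableOrView(table: String): String =
    s"uncache($table)"
}
```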
This is a new feature adding Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…ange. (NVIDIA#10857)

* Account for `PartitionedFileUtil.splitFiles` signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This caused the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change.
* Common base for PartitionFileUtilsShims.
* Reusing existing PartitionedFileUtilsShims.
* More refactoring, for pre-3.5 compilation.
* Updated copyright date.
* Fixed style error.
* Re-fixed the copyright year.
* Added missing import.

Signed-off-by: MithunR <mithunr@nvidia.com>
To fix NVIDIA#10867: change the rapids private and JNI dependency versions to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <timl@nvidia.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow
* Signing off
* Removed the unnecessary base class from 400
* Addressed review comments

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
…itten [skip ci] (NVIDIA#10966)

* DO NOT REVIEW
* Add default value for REF to avoid it being overwritten by an unexpected manual trigger

Signed-off-by: Peixin Li <pxLi@nyu.edu>
* AnalysisException child class
* Use errorClass for reporting AnalysisException
* POM changes
* Reuse the RapidsErrorUtils to throw the AnalysisException
* Revert "POM changes" (reverts commit 0f765c9)
* Updated copyrights
* Added the TrampolineUtil method back to handle cases which don't use errorClass
* Add doc to the RapidsAnalysisException
* Addressed review comments
* Fixed imports
* Moved the RapidsAnalysisException out of TrampolineUtil
* Removed unused import
* Removed the TrampolineUtil method for throwing RapidsAnalysisException

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…icks] (NVIDIA#10970)

* Incomplete impl of RaiseError for 400
* Removed RaiseError from 400
* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
This is a bug fix for the Hive write tests. In some of the tests on Spark 351, ProjectExec falls back to the CPU because the GPU version of the MapFromArrays expression is missing. This PR adds ProjectExec to the allowed fallback list for Spark 351 and later. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
…s] (NVIDIA#10994)

* POM changes for Spark 4.0.0
* Validate buildver and Scala versions
* More POM changes
* Fixed the scala-2.12 comment
* More fixes for the scala-2.13 POM
* Addressed comments
* Add shim check to account for 400
* Add 400 for premerge tests against JDK 17
* Temporarily remove 400 from snapshotScala213
* Fixed 2.13 POM
* Remove 400 from jdk17 as it will compile with Scala 2.12
* GitHub workflow changes
* Added quotes to pom-directory
* Update version defs to include Scala 2.13 JDK 17
* Cross-compile all shims from JDK17 to JDK8; eliminate Logging inheritance to prevent shimming of unshimmable API classes
* Dummy
* Undo api pom change
* Add preview1 to the allowed shim versions
* Scala 2.13 to require JDK17
* Removed unused import left over from razajafri#3
* Set up JAVA_HOME before caching
* Only upgrade the Scala plugin for Scala 2.13
* Regenerate Scala 2.13 POMs
* Remove 330 from JDK17 builds for Scala 2.12
* Revert "Remove 330 from JDK17 builds for Scala 2.12" (reverts commit 1faabd4)
* Downgrade scala.plugin.version for Cloudera
* Updated comment to include the issue
* Upgrade the scala.maven.plugin version to 4.9.1, matching Spark 4.0.0
* Downgrade scala-maven-plugin for Cloudera
* Revert mvn verify changes
* Avoid cache for JDK 17
* Removed cache dependency from Scala 213
* Added Scala 2.13 specific checks
* Handle UnaryPositive now extending RuntimeReplaceable
* Remove 330 from jdk17.buildvers as we only support Scala 2.13, and fix the environment variable in version-defs.sh that we read for building against JDK17 with Scala 213
* Update Scala 2.13 POMs
* Fixed scala2.13 verify to actually use scala2.13/pom.xml
* Added missing CSV files
* Skip Opcode tests: there is a bytecode incompatibility, so these are skipped until support is added; for details see NVIDIA#11174 and NVIDIA#10203
* Upmerged and fixed the newly introduced compile error
* Addressed review comments
* Removed the jdk17 Cloudera check and moved it inside the 321, 330, and 332 Cloudera profiles
* Fixed upmerge conflicts
* Reverted renaming of id
* Fixed HiveGenericUDFShim
* Reverted the debugging code
* Generated Scala 2.13 POMs

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…11197) Signed-off-by: Chong Gao <res_life@163.com>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
To fix NVIDIA#11114: to support the Spark 3.3+ and 4.0+ shims, build the Scala 2.13 nightly dist jar with JDK17. Signed-off-by: Tim Liu <timl@nvidia.com>
…arquet IDs (NVIDIA#11202) Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…huffleThreadedWriterBase (NVIDIA#11180)

* Exclude the processing time in records.hasNext from the serialization time estimation
* Exclude the wait time on the limiter
* Exclude batch size computing time as well
* Fix outdated comment; add more comments
* Add a function that takes a TimeTrackingIterator
* Make members private

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
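The `TimeTrackingIterator` referenced above suggests a wrapper that measures the time spent pulling records from upstream, so a writer can subtract that from its serialization-time estimate. A minimal sketch of that idea follows; the class shape and names are assumptions, not the plugin's actual implementation.

```scala
// Hypothetical sketch: wrap an iterator and accumulate the time spent inside
// hasNext/next, so a shuffle writer can exclude record-production time from
// its own serialization-time metric.
class TimeTrackingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var accumulatedNanos = 0L

  // Run body, adding its wall-clock duration to the accumulator.
  private def timed[A](body: => A): A = {
    val start = System.nanoTime()
    try body
    finally accumulatedNanos += System.nanoTime() - start
  }

  override def hasNext: Boolean = timed(underlying.hasNext)
  override def next(): T = timed(underlying.next())

  // Total time spent waiting on the upstream iterator.
  def elapsedNanos: Long = accumulatedNanos
}
```

A writer would then compute `totalWriteTime - iterator.elapsedNanos` to estimate pure serialization time.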
* Added handling for invalid data in from_json on rapids
* Added a logging message when parsing invalid JSON
* Removed unwanted test
* Changed settings to catch one more exception
* Formatting and style changes
* Changed exception-catch logic
* Added new exception class
* Removed logging
* Restored a line

Signed-off-by: fejiang <fejiang@nvidia.com>
…DIA#11219)

* Fix hash-aggregate tests failing in ANSI mode

Fixes NVIDIA#11018. This commit fixes the tests in `hash_aggregate_test.py` to run correctly with ANSI enabled. This is essential for running the tests with Spark 4.0, where ANSI mode is on by default.

The vast majority of the tests here exercise aggregations like `SUM`, `COUNT`, and `AVG`, which fall back to the CPU on account of NVIDIA#5114. These tests have been marked with `@disable_ansi_mode` so that they run to completion correctly, and may be revisited after NVIDIA#5114 has been addressed. In cases where NVIDIA#5114 does not apply, the tests have been modified to run with ANSI both on and off.

Signed-off-by: MithunR <mithunr@nvidia.com>
Support MapFromArrays on GPU

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
…ted. [databricks] (NVIDIA#11129)

Fixes NVIDIA#11031. This PR addresses tests that fail on Spark 4.0 in the following files:

1. `integration_tests/src/main/python/datasourcev2_read_test.py`
2. `integration_tests/src/main/python/expand_exec_test.py`
3. `integration_tests/src/main/python/get_json_test.py`
4. `integration_tests/src/main/python/hive_delimited_text_test.py`
5. `integration_tests/src/main/python/logic_test.py`
6. `integration_tests/src/main/python/repart_test.py`
7. `integration_tests/src/main/python/time_window_test.py`
8. `integration_tests/src/main/python/json_matrix_test.py`
9. `integration_tests/src/main/python/misc_expr_test.py`
10. `integration_tests/src/main/python/orc_write_test.py`

Signed-off-by: MithunR <mithunr@nvidia.com>
…-11212 Fix auto merge conflict 11212
…s] (NVIDIA#11220)

* Avoid hitting Spark bug SPARK-44242 while generating run_dir
* Update integration_tests/run_pyspark_from_build.sh to apply suggestion

Signed-off-by: Peixin Li <pxLi@nyu.edu>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…IDIA#11230) Signed-off-by: Jason Lowe <jlowe@nvidia.com>
* Add cache-dependencies step for Scala 213
* Add populate script
* Move yml
* Fix script shell error
* Hardcode buildvers
* Update .github/workflows/mvn-verify-check.yml to remove an extra new line
* Update .github/workflows/mvn-verify-check.yml to differentiate the cache key clearly
* Fix nit

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Co-authored-by: Peixin <pxli@nyu.edu>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Maybe addresses NVIDIA#11225 Signed-off-by: Gera Shegalov <gera@apache.org>
* Remove the unused var CUDF_VER from the CI script
* Update for the review comment

Signed-off-by: Tim Liu <timl@nvidia.com>
* Cleaned up the regex logic
* Local change of substring index
* stringFunctions scala
* Resolved stringFunctions import conflict
* doColumnar calling
* Changed delimiter to scalar type
* Removed comment
* Removed unwanted test case
* Added IT test
* Removed RapidExpressionsSuite
* Added evaluation logic when using GpuScalar
* Formatting
* Removed the single-delim note in GpuOverrides
* Generated docs

Signed-off-by: fejiang <fejiang@nvidia.com>
* Test
* Reviews: fix typo

Signed-off-by: Gera Shegalov <gera@apache.org>
Keep the rapids JNI and private dependency versions until the nightly CI for branch-24.10 is done. Track the dependency update at: https://gitlab-master.nvidia.com/timl/spark-rapids-private/-/issues/14 Signed-off-by: NVTIMLIU <70000568+nvauto@users.noreply.github.com>
Signed-off-by: NVTIMLIU <70000568+nvauto@users.noreply.github.com>
Change version to 24.10.0
Note: merge this PR using the "Create a merge commit" option.