update the main branch for 2412 release #480

Merged
merged 23 commits into NVIDIA:main
Dec 24, 2024

Conversation

nvliyuan
Collaborator

update the main branch for v2412 release.
Please create a merge commit, not squash.

nvauto and others added 23 commits September 24, 2024 07:36
Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
1. Add the variables 'SPARK_MASTER_URL' and 'DATA_ROOT' to support automated testing from CI/CD jobs.

2. 'output_prefix' is NOT referenced in the lines below.

Signed-off-by: timl <timl@nvidia.com>
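
For readers unfamiliar with the pattern, here is a minimal sketch of how a notebook might pick up 'SPARK_MASTER_URL' and 'DATA_ROOT' from the environment so a CI/CD job can override them. The helper name and both fallback values are assumptions for illustration, not taken from the commit.

```python
import os


def resolve_notebook_settings(env=None):
    """Return (spark_master_url, data_root) from the environment.

    The defaults below are hypothetical placeholders; the actual
    notebook may use different fallbacks.
    """
    env = os.environ if env is None else env
    # A CI/CD job can export these two variables before running the notebook.
    spark_master_url = env.get("SPARK_MASTER_URL", "local[*]")
    data_root = env.get("DATA_ROOT", "/tmp/data")
    return spark_master_url, data_root


if __name__ == "__main__":
    master, root = resolve_notebook_settings()
    print(f"Spark master: {master}, data root: {root}")
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real process environment.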
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Add a TPC-DS SF 10 Notebook for local Jupyter or Google Colab

Signed-off-by: Gera Shegalov <gera@apache.org>

* Update link to the current blob

Signed-off-by: Gera Shegalov <gera@apache.org>

---------

Signed-off-by: Gera Shegalov <gera@apache.org>
* Update "Open in Colab" link

* Update README.md

Signed-off-by: Gera Shegalov <gera@apache.org>

---------

Signed-off-by: Gera Shegalov <gera@apache.org>
* Add Tools Notebooks for EMR

* Update README

* Sign-off commit

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

---------

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
* add notebook

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* change default kernel

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* add test

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* print

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* change spark master

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* change path of data path

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* remove old notebook

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* optimize default value

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* clear all output

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* remove id

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

---------

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
* change data path

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* change data path

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* save result to file

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

* change format of output

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>

---------

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
* Support running optuna on Spark

* play around with optuna + xgboost + joblibspark

* update

* update

* update

* Optuna: Add how optuna + xgboost + spark works

* deploy optuna examples on databricks

* Cleanup and prepare for open source

* Update README.md

* Update run-optuna-spark-xgboost.sh

* Update README.md with chmod for run scripts

* update/add copyright

* Replace RDD example with Dataframe example, cleanups to README and repo structure

* Move files to separate dir for merge to spark-rapids-examples

* separate dir

* Merge branch-24.10

* Remove gitignore

Signed-off-by: Rishi Chandra <rishic@nvidia.com>

* remove username

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix run-sparkrapids-xgboost.sh implementation path

* Fix corrupt mysql installation in init_optuna.sh

* Fix corrupt mysql installation in init_optuna_xgboost.sh

* Update run-joblibspark-xgboost.sh

* Update run-sparkrapids-xgboost.sh

* Update run-joblibspark-simple.sh

* Update run-joblibspark-xgboost.sh

* Update run-sparkrapids-xgboost.sh

* Update to 24.10, include apt updates

* use gpu_hist for older xgb versions

* use gpu_hist for older xgb versions

* Update sparkrapids-xgboost-read-per-worker.py

* Update init script with mysql installation fixes

* Address comments

* Add cluster startup script

* Move around files, add notebooks

* Repo renovations

* Update README, remove run scripts, fix PCA to use 24.10.1

* minor updates

* Final cleanups, runs passed on databricks

* README updates

* Updates to comments, cleanup outputs

* Address comments, minor reordering, update README

* comment fix

* remove unnecessary imports

* tuning max bins and n_estimators bug fixes

* 'max_bin' != 'max_bins' 🤦

* ensure QDM and XGB bins are the same

* typos

* cleanup

* Address comments

* Note about sampler serialization

* Add link

* Add link

* Undo benchmark commit

* typo

---------

Signed-off-by: Rishi Chandra <rishic@nvidia.com>
Co-authored-by: Bobby Wang (SW-TEGRA) <bobwang@nvidia.com>
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Erik Ordentlich <eordentlich@nvidia.com>
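
Several of the commit messages above ("use gpu_hist for older xgb versions", "'max_bin' != 'max_bins'", "ensure QDM and XGB bins are the same") concern XGBoost GPU parameters. The following is a rough sketch of the idea, not the code from this PR: XGBoost 2.0 and later select the GPU with `device="cuda"` plus `tree_method="hist"`, while earlier releases use `tree_method="gpu_hist"`, and the core parameter is spelled `max_bin`. The helper name and the default of 256 bins are assumptions.

```python
def gpu_tree_params(xgb_version, max_bin=256):
    """Build GPU training parameters for a given XGBoost version string.

    Hypothetical helper for illustration; the examples in the PR may
    structure this differently.
    """
    major = int(xgb_version.split(".")[0])
    # Core XGBoost spells this parameter 'max_bin' (not 'max_bins').
    params = {"max_bin": max_bin}
    if major >= 2:
        # XGBoost 2.0+ selects the GPU via the 'device' parameter.
        params.update({"tree_method": "hist", "device": "cuda"})
    else:
        # Older releases use the dedicated 'gpu_hist' tree method.
        params["tree_method"] = "gpu_hist"
    return params


if __name__ == "__main__":
    print(gpu_tree_params("1.7.6"))
    print(gpu_tree_params("2.1.0", max_bin=64))
```

When a `QuantileDMatrix` is used, passing it the same `max_bin` value as the training parameters keeps the two binnings consistent.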
Signed-off-by: liyuan <yuali@nvidia.com>
@nvliyuan nvliyuan merged commit e863522 into NVIDIA:main Dec 24, 2024
2 of 3 checks passed
9 participants