[FEA] Profiling tool auto-tuner should keep reducing the maxPartitionBytes if a table scan stage has heavy spilling and have OOM tasks #1565

Open
viadea opened this issue Feb 27, 2025 · 0 comments
Labels
autotuner core_tools Scope the core module (scala) feature request New feature or request

viadea commented Feb 27, 2025

The Profiling tool's auto-tuner should keep reducing maxPartitionBytes when a table scan stage spills heavily and has tasks that fail with OOM.
This approach has been shown to work for at least one customer job.
My proposal is to check the following conditions:

  • The stage is a table scan stage.
  • The stage spills heavily (above some threshold).
  • Tasks in the stage fail with com.nvidia.spark.rapids.jni.CpuSplitAndRetryOOM: CPU OutOfMemory (or any other type of OOM, such as GPU OOM or heap OOM).

When all three hold, the auto-tuner should halve maxPartitionBytes each time.
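
The rule proposed above could be sketched roughly as follows. This is only an illustration, not the auto-tuner's actual code: `StageSummary`, `next_max_partition_bytes`, the 1 GiB spill threshold, and the 16 MiB lower bound are all hypothetical names and values chosen for the example. The proposal leaves the spill threshold open and does not mention a lower bound; one is assumed here only so that repeated halving eventually stops.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class StageSummary:
    """Hypothetical per-stage summary; not the tool's real data model."""
    is_table_scan: bool
    spill_bytes: int        # total bytes spilled by the stage
    oom_failed_tasks: int   # tasks that failed with any kind of OOM


def next_max_partition_bytes(stage, current_bytes,
                             spill_threshold_bytes=1024 * MIB,
                             floor_bytes=16 * MIB):
    """Return the maxPartitionBytes value to recommend for the next run.

    Halves the current value only when all three proposed conditions hold.
    Both the threshold and the floor are assumed values, not from the issue.
    """
    should_reduce = (
        stage.is_table_scan
        and stage.spill_bytes > spill_threshold_bytes
        and stage.oom_failed_tasks > 0
    )
    if not should_reduce or current_bytes <= floor_bytes:
        return current_bytes
    # Halve, but never go below the assumed floor.
    return max(current_bytes // 2, floor_bytes)


if __name__ == "__main__":
    stage = StageSummary(is_table_scan=True,
                         spill_bytes=4096 * MIB,
                         oom_failed_tasks=3)
    value = 128 * MIB
    # Simulate several tuning rounds on a job that keeps spilling and failing.
    for _ in range(4):
        value = next_max_partition_bytes(stage, value)
        print(value // MIB, "MiB")
```

Each call represents one auto-tuner pass over a profiled run, so the halving continues across runs only while the stage still meets all three conditions.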