AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements #1539

Merged: 5 commits, Feb 11, 2025
10 changes: 10 additions & 0 deletions core/src/main/resources/bootstrap/tuningTable.yaml
@@ -18,6 +18,16 @@ tuningDefinitions:
enabled: true
level: job
category: tuning
- label: spark.dataproc.enhanced.execution.enabled
description: 'Enables enhanced execution. It is recommended to turn it on for better performance on Dataproc.'
enabled: true
level: job
category: tuning
- label: spark.dataproc.enhanced.optimizer.enabled
description: 'Enables enhanced optimizer. It is recommended to turn it on for better performance on Dataproc.'
enabled: true
level: job
category: tuning
- label: spark.executor.cores
description: 'The number of cores to use on each executor. It is recommended to be set to 16'
enabled: true
@@ -590,6 +590,11 @@ class DataprocPlatform(gpuDevice: Option[GpuDevice],
clusterProperties: Option[ClusterProperties]) extends Platform(gpuDevice, clusterProperties) {
override val platformName: String = PlatformNames.DATAPROC
override val defaultGpuDevice: GpuDevice = T4Gpu
override val recommendationsToInclude: Seq[(String, String)] = Seq(
"spark.dataproc.enhanced.optimizer.enabled" -> "true",
"spark.dataproc.enhanced.execution.enabled" -> "true"
)

override def isPlatformCSP: Boolean = true
override def maxGpusSupported: Int = 4
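
As a rough illustration of the override above, the sketch below shows how a per-platform `recommendationsToInclude` can carry Dataproc-only settings. `PlatformSketch`, `OnPremSketch`, and `DataprocSketch` are hypothetical stand-ins, not the tool's real classes, and the empty default in the base class is an assumption.

```scala
// Hypothetical, simplified stand-ins for the tool's Platform hierarchy.
abstract class PlatformSketch {
  def platformName: String
  // Assumed default: a platform contributes no extra recommendations.
  val recommendationsToInclude: Seq[(String, String)] = Seq.empty
}

// A platform that keeps the (assumed) empty default.
class OnPremSketch extends PlatformSketch {
  override val platformName: String = "onprem"
}

// Mirrors the Dataproc override from the diff: two properties, both "true".
class DataprocSketch extends PlatformSketch {
  override val platformName: String = "dataproc"
  override val recommendationsToInclude: Seq[(String, String)] = Seq(
    "spark.dataproc.enhanced.optimizer.enabled" -> "true",
    "spark.dataproc.enhanced.execution.enabled" -> "true"
  )
}
```

Keeping the list on the platform class means the tuner itself needs no Dataproc-specific branch; it only iterates over whatever the active platform supplies.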

@@ -1117,8 +1117,9 @@ class AutoTuner(
calculateClusterLevelRecommendations()

// add all platform specific recommendations
platform.recommendationsToInclude.foreach {
case (property, value) => appendRecommendation(property, value)
platform.recommendationsToInclude.collect {
Review comment (Collaborator):
This change prioritizes any user-specified platform config over the values coming from AutoTuner. We can update the comment to mention this if we plan to keep this priority.

Reply (Collaborator, Author):
Thanks @sayedbilalbari. Updated the PR description with the comment.

case (property, value) if getPropertyValue(property).isEmpty =>
appendRecommendation(property, value)
}
}
recommendFromDriverLogs()
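
The priority discussed in the review thread can be sketched in isolation. This is a minimal sketch, assuming the user-set properties are available as a plain `Map`; `PrioritySketch` and `platformRecommendations` are hypothetical names, and the `Map` lookup only stands in for `getPropertyValue`.

```scala
object PrioritySketch {
  // Hypothetical helper: keeps only the platform defaults that the user
  // has not already set, so a user-specified value always wins.
  def platformRecommendations(
      userProps: Map[String, String],
      platformDefaults: Seq[(String, String)]): Seq[(String, String)] =
    platformDefaults.collect {
      // The guard plays the role of getPropertyValue(property).isEmpty in the diff.
      case (property, value) if userProps.get(property).isEmpty => property -> value
    }
}
```

If the user has already set `spark.dataproc.enhanced.optimizer.enabled` (to any value), only `spark.dataproc.enhanced.execution.enabled` is returned from the two Dataproc defaults. With the earlier `foreach`, both would have been appended regardless.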
@@ -71,8 +71,12 @@ abstract class BaseAutoTunerSuite extends FunSuite with BeforeAndAfterEach with

// Spark runtime version used for testing
def testSparkVersion: String = ToolUtils.sparkRuntimeVersion
// Databricks version used for testing
def testDatabricksVersion: String = "12.2.x-aarch64-scala2.12"
// RapidsShuffleManager version used for testing
def testSmVersion: String = testSparkVersion.filterNot(_ == '.')
// RapidsShuffleManager version used for testing Databricks
def testSmVersionDatabricks: String = "332db"

val defaultDataprocProps: mutable.Map[String, String] = {
mutable.LinkedHashMap[String, String](
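
For reference, the `filterNot` call used by `testSmVersion` in the test suite simply drops the dots from a version string. A small sketch, with `ShuffleManagerVersionSketch` and `smVersion` as hypothetical names and `"3.3.2"` as an illustrative input only; the Databricks value in the diff is a hard-coded string and is not derived this way.

```scala
object ShuffleManagerVersionSketch {
  // Removes every '.' from a dotted version string, as testSmVersion does.
  def smVersion(sparkVersion: String): String =
    sparkVersion.filterNot(_ == '.')
}
```

For example, `ShuffleManagerVersionSketch.smVersion("3.3.2")` returns `"332"`.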