Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements #1539

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

parthosa
Copy link
Collaborator

@parthosa parthosa commented Feb 7, 2025

Fixes #1538.

This PR updates AutoTuner/Bootstrapper to recommend the following Dataproc Spark performance enhancements

spark.dataproc.enhanced.optimizer.enabled=true
spark.dataproc.enhanced.execution.enabled=true

Note:

  • For these configurations, any values explicitly set by the user will take precedence over AutoTuner's recommendations.

Reference - https://cloud.google.com/dataproc/docs/guides/performance-enhancements

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added feature request New feature or request core_tools Scope the core module (scala) labels Feb 7, 2025
@parthosa parthosa self-assigned this Feb 7, 2025
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa marked this pull request as ready for review February 8, 2025 02:25
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa !
nit: the test does not verify the behavior of the output in the following cases:

  • The properties were already set by the user either true/false: just confirming that we want to override the value set by the user even when if they set it to false?
  • The combined and the bootstrap output. It is fine to skip this since the implementation of those 2 parts is not completed anyway.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa
Copy link
Collaborator Author

The properties were already set by the user either true/false: just confirming that we want to override the value set by the user even when if they set it to false?

Thanks for catching that. Added unit test for this.

The combined and the bootstrap output. It is fine to skip this since the implementation of those 2 parts is not completed anyway.

Yes, we might need a framework to test these two files. Then, we could test all the recommendations. I think we should do that separately.

@parthosa parthosa requested a review from amahussein February 10, 2025 20:22
Copy link
Collaborator

@sayedbilalbari sayedbilalbari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment. Rest everythin looks good !

@@ -1117,8 +1117,9 @@ class AutoTuner(
calculateClusterLevelRecommendations()

// add all platform specific recommendations
platform.recommendationsToInclude.foreach {
case (property, value) => appendRecommendation(property, value)
platform.recommendationsToInclude.collect {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change prioritizes any user specified platform config over the ones coming from AutoTuner. We can update the comment to mention it if we plan to keeping this priority.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sayedbilalbari. Updated the PR description with the comment.

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa !
LGTME!

@parthosa parthosa merged commit 9f854ea into NVIDIA:dev Feb 11, 2025
13 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1538 branch February 11, 2025 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements configs
3 participants