Generate a detailed report for the write ops #1544
No changes suggested. Just some clarifying questions.
Thanks @sayedbilalbari
I will add samples of the Node content to the code.
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Thanks @sayedbilalbari for the offline discussion.
I addressed all your questions and comments.
LGTM!
Signed-off-by: Ahmed Hussein (amahussein) a@ahussein.me
Fixes #1536
This commit adds a new report containing the write operations.
The generated report is mostly functional for `InsertIntoHadoopFsRelationCommand`; more incremental improvements need to follow.

New report:
- No text format is generated for this report: rendering the path and the node description as text would not be readable and would add unnecessary overhead.
- The generated report is called `write_operations.csv`. Notes on its fields:
  - `sqlPlanVersion`: the number of the plan version. This is helpful if we want to generate the write operations for all the plans when AQE is enabled.
  - `fromFinalPlan`: True when the plan is final.
  - `unknown`
  - `fullDescription`: contains the actual node description truncated at 500 characters.
  - `outputColumns`: lists the columns separated by semicolons. In case of a truncated schema, the column will include something like `... and 24 more fields`.
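
The formatting rules described for `fullDescription` and `outputColumns` can be sketched roughly as follows. This is only an illustration, not the code in this PR: `WriteOpRow`, `WriteOpsReportSketch`, and all helper names are hypothetical, the CSV quoting is an assumption, and only the four fields mentioned above are modeled (the actual report has more columns).

```scala
// Hypothetical record: field names mirror the report columns named in the
// description; everything else in this sketch is an assumption.
final case class WriteOpRow(
    sqlPlanVersion: Int,
    fromFinalPlan: Boolean,
    fullDescription: String,
    outputColumns: Seq[String])

object WriteOpsReportSketch {
  // The description states that the node description is truncated at 500 characters.
  val MaxDescriptionLength = 500

  // Keep at most the first 500 characters of the node description.
  def truncateDescription(desc: String): String = desc.take(MaxDescriptionLength)

  // Join column names with semicolons so the list fits into a single CSV cell.
  def joinColumns(cols: Seq[String]): String = cols.mkString(";")

  // Quote a CSV cell and double any embedded quotes (assumed quoting style).
  def csvCell(value: String): String = "\"" + value.replace("\"", "\"\"") + "\""

  // Render one row of the sketch report.
  def toCsvLine(row: WriteOpRow): String =
    Seq(
      row.sqlPlanVersion.toString,
      row.fromFinalPlan.toString,
      csvCell(truncateDescription(row.fullDescription)),
      csvCell(joinColumns(row.outputColumns))
    ).mkString(",")
}
```

Quoting the two free-text cells keeps commas inside a path or node description from being read as column separators, which is one reason a CSV-only report is easier to handle than a text table.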
What is missing in this PR:
Detailed descriptions of the changes.
In addition to the new report, this pull request includes several changes to enhance the functionality and maintainability of the codebase, particularly in the `planparser` and `profiling` modules. The most important changes involve the addition of new metadata extraction methods, the introduction of a new utility class, and updates to existing classes to accommodate these changes.

Enhancements in metadata extraction and utility usage:
- `core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/DataWritingCommandExecParser.scala`: Added a new method `extractWriteOpRecord` to extract metadata from write operation nodes and updated the `getWriteOpMetaFromNode` method to utilize this new method.
- `core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/DeltaLakeHelper.scala`: Introduced a new method `getWriteCMDWrapper` to retrieve the write command wrapper for Delta Lake write operations.

Introduction of `StringUtils` utility class:
- `core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/DataWritingCommandExecParser.scala`: Replaced hardcoded "unknown" strings with `StringUtils.UNKNOWN_EXTRACT` for consistency.
- `core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/HiveParseHelper.scala`: Updated the method `getHiveFormatFromSimpleStr` to use `StringUtils.UNKNOWN_EXTRACT`.
- `core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/ReadParser.scala`: Replaced `UNKNOWN_METAFIELD` with `StringUtils.UNKNOWN_EXTRACT`.
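
As a rough illustration of why a single shared constant helps, the sketch below routes a failed extraction through one named fallback instead of a string literal repeated in each parser. Only the names `StringUtils` and `UNKNOWN_EXTRACT` come from the description above; the `Sketch` objects, the constant's value, and the format pattern are assumptions and do not reflect the actual implementation.

```scala
// Hypothetical stand-in for the utility class named in the description.
object StringUtilsSketch {
  // Assumed value: the description says the constant replaces hardcoded "unknown" strings.
  val UNKNOWN_EXTRACT: String = "unknown"
}

object FormatExtractorSketch {
  // Hypothetical pattern that looks for a data format name in a node description.
  private val formatPattern = """\b(Parquet|ORC|CSV|JSON|Text)\b""".r

  // Return the first format name found, or the shared fallback when nothing matches.
  def extractFormat(nodeDescr: String): String =
    formatPattern.findFirstIn(nodeDescr).getOrElse(StringUtilsSketch.UNKNOWN_EXTRACT)
}
```

With this shape, every parser that cannot extract a field reports the same marker, so downstream consumers of the report only need to check for one value.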
Enhancements in profiling:
- `core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ApplicationSummaryInfo.scala`: Added a new field `writeOpsInfo` to store write operation profile results.
- `core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/Profiler.scala`: Updated the `Profiler` class to include write operation information in the profiling output.

These changes collectively improve the code's robustness, maintainability, and functionality, particularly in handling write operations and profiling.