forked from NVIDIA/spark-rapids-tools
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add safeguards to prevent older attempts from generating metrics output in Scala Tool (NVIDIA#1324)

* Add safeguards to prevent older attempts from generating qual summary output
* Adding synchronization on the reporting and autotuner level (#54)
* Fix failing scala unit tests by providing unique app IDs in test (#53)
* Ignore test for event logs with same app Id and attempt Id

---------

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Co-authored-by: Ahmed Hussein (amahussein) <a@ahussein.me>
1 parent: 277e951, commit: 4f2d6e0

Showing 23 changed files with 321 additions and 109 deletions.
72 changes: 72 additions & 0 deletions
core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/AppSubscriber.scala
@@ -0,0 +1,72 @@
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.nvidia.spark.rapids.tool.qualification

class AppSubscriber(val appId: String) {
  val lock = new Object()
  private var attemptID: Option[Int] = None

  def unsafeSetAttemptId(newAttempt: Int): Boolean = {
    attemptID match {
      case Some(a) =>
        if (newAttempt > a) {
          attemptID = Some(newAttempt)
        }
      case None => attemptID = Some(newAttempt)
    }
    newAttempt == attemptID.get
  }

  def safeSetAttemptId(newAttempt: Int): Boolean = {
    lock.synchronized {
      unsafeSetAttemptId(newAttempt)
    }
  }
}

object AppSubscriber {
  private val APP_SUBSCRIBERS = new java.util.concurrent.ConcurrentHashMap[String, AppSubscriber]()

  def getOrCreate(appId: String): AppSubscriber = {
    APP_SUBSCRIBERS.computeIfAbsent(appId, _ => new AppSubscriber(appId))
  }

  def subscribeAppAttempt(appId: String, newAttemptId: Int): Boolean = {
    val subscriber = getOrCreate(appId)
    subscriber.safeSetAttemptId(newAttemptId)
  }

  def withSafeValidAttempt[T](appId: String, currAttempt: Int)(f: () => T): Option[T] = {
    val subscriber = getOrCreate(appId)
    subscriber.lock.synchronized {
      if (subscriber.unsafeSetAttemptId(currAttempt)) {
        Option(f())
      } else {
        None
      }
    }
  }

  def withUnsafeValidAttempt[T](appId: String, currAttempt: Int)(f: () => T): Option[T] = {
    val subscriber = getOrCreate(appId)
    if (subscriber.unsafeSetAttemptId(currAttempt)) {
      Option(f())
    } else {
      None
    }
  }
}
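The ordering rule that `AppSubscriber` enforces can be illustrated with a small standalone sketch. It uses a hypothetical, trimmed-down copy named `AttemptGate` so that it compiles on its own; `AttemptGate`, `AttemptGateDemo`, the app ID `"app-1"`, and the report strings are illustrative assumptions, not part of the tool. It feeds attempts 2, 1, 3, 3 of one application through `withSafeValidAttempt` and records which calls were allowed to run their callback.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, trimmed-down copy of AppSubscriber, reproduced only so this
// example is self-contained. It is not the tool's actual class.
final class AttemptGate(val appId: String) {
  val lock = new Object()
  private var attemptId: Option[Int] = None

  // Same rule as AppSubscriber.unsafeSetAttemptId: keep the highest attempt seen
  // so far and report whether the caller's attempt is that highest attempt.
  // Not thread-safe on its own; callers must hold `lock`.
  def unsafeSetAttemptId(newAttempt: Int): Boolean = {
    attemptId match {
      case Some(a) => if (newAttempt > a) attemptId = Some(newAttempt)
      case None    => attemptId = Some(newAttempt)
    }
    newAttempt == attemptId.get
  }
}

object AttemptGate {
  // One gate per application ID, shared by every caller for that application.
  private val gates = new ConcurrentHashMap[String, AttemptGate]()

  def getOrCreate(appId: String): AttemptGate =
    gates.computeIfAbsent(appId, _ => new AttemptGate(appId))

  // Runs `f` only if `attempt` is the highest attempt registered for `appId`;
  // the check and the callback execute under the same lock.
  def withSafeValidAttempt[T](appId: String, attempt: Int)(f: () => T): Option[T] = {
    val gate = getOrCreate(appId)
    gate.lock.synchronized {
      if (gate.unsafeSetAttemptId(attempt)) Option(f()) else None
    }
  }
}

object AttemptGateDemo {
  // Feeds the attempts of one application in the given order and returns, per
  // call, the callback's result or None when the attempt was rejected.
  def run(appId: String, attempts: Seq[Int]): Seq[Option[String]] =
    attempts.map { a =>
      AttemptGate.withSafeValidAttempt(appId, a)(() => s"report for attempt $a")
    }

  def main(args: Array[String]): Unit =
    run("app-1", Seq(2, 1, 3, 3)).foreach(println)
}
```

In this run, attempt 1 is rejected because attempt 2 was already registered, while the repeated attempt 3 passes again: the check is `newAttempt == attemptID.get`, not a strict comparison. Because the check and the callback share one lock, a newer attempt cannot be registered between the check and the callback.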
14 changes: 7 additions & 7 deletions
.../src/test/resources/spark-events-qualification/cluster_information/eventlog_2nodes_8cores
Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions
...urces/spark-events-qualification/cluster_information/eventlog_3nodes_12cores_exec_removed
Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions
...ces/spark-events-qualification/cluster_information/eventlog_3nodes_12cores_variable_cores
Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions
core/src/test/resources/spark-events-qualification/cluster_information/platform/dataproc
Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions
core/src/test/resources/spark-events-qualification/cluster_information/platform/emr
Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions
core/src/test/resources/spark-events-qualification/cluster_information/platform/onprem
Large diffs are not rendered by default.
Binary file added: +26.4 KB (not shown)
core/src/test/resources/spark-events-qualification/eventlog_same_app_id_1.zstd

Binary file added: +26.4 KB (not shown)
core/src/test/resources/spark-events-qualification/eventlog_same_app_id_2.zstd

Binary file added: +52.2 KB (not shown)
core/src/test/resources/spark-events-qualification/multiple_attempts/attempt_1_eventlog.zstd

Binary file added: +52.2 KB (not shown)
core/src/test/resources/spark-events-qualification/multiple_attempts/attempt_2_eventlog.zstd

Binary file added: +52.2 KB (not shown)
core/src/test/resources/spark-events-qualification/multiple_attempts/attempt_3_eventlog.zstd

Binary file added: +52.5 KB (not shown)
core/src/test/resources/spark-events-qualification/multiple_attempts/attempt_4_eventlog.zstd
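The `multiple_attempts` resources add four attempt event logs (`attempt_1_eventlog.zstd` to `attempt_4_eventlog.zstd`). As a hedged illustration of what a lock-plus-highest-attempt rule guarantees when such logs are analyzed concurrently, the sketch below uses a hypothetical `LatestAttemptGate` (a minimal stand-in for `AppSubscriber`, not the tool's API) and a fixed thread pool. The `ConcurrentAttemptsDemo.run` helper and the thread counts are assumptions made for the example.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}

// Hypothetical minimal stand-in for AppSubscriber: one lock (the instance monitor)
// guards both the highest-attempt bookkeeping and the guarded callback.
final class LatestAttemptGate {
  private var latest: Option[Int] = None

  def withValidAttempt[T](attempt: Int)(f: () => T): Option[T] = synchronized {
    // Register the attempt if it is newer than anything seen so far.
    if (latest.forall(_ < attempt)) latest = Some(attempt)
    // Only the current highest attempt may run the callback.
    if (latest.contains(attempt)) Option(f()) else None
  }
}

object ConcurrentAttemptsDemo {
  // Processes the given attempts of one application on a fixed thread pool and
  // returns the set of attempts whose callback actually ran.
  def run(attempts: Seq[Int], threads: Int = 4): Set[Int] = {
    val gate = new LatestAttemptGate
    val reported = new ConcurrentLinkedQueue[Int]()
    val pool = Executors.newFixedThreadPool(threads)
    try {
      attempts.foreach { a =>
        pool.execute(new Runnable {
          override def run(): Unit = {
            gate.withValidAttempt(a)(() => reported.add(a))
            ()
          }
        })
      }
    } finally {
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
    // Copy the Java queue into an immutable Scala set.
    var result = Set.empty[Int]
    val it = reported.iterator()
    while (it.hasNext) result += it.next()
    result
  }

  def main(args: Array[String]): Unit = {
    println(run(Seq(4, 3, 2, 1), threads = 1)) // single worker: tasks run in submission order
    println(run(Seq(1, 2, 3, 4)))              // interleaving varies between runs
  }
}
```

Whatever the interleaving, the highest attempt always reports, and once it is registered no lower attempt can. In this sketch a lower attempt can still report if it is processed before any higher attempt, as the single-threaded `Seq(1, 2, 3, 4)` ordering shows, so the rule orders output by attempt rather than suppressing every older attempt on its own.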