[BUG] Profiler tool does not catch the RAPIDS jars correctly #734

Closed
amahussein opened this issue Jan 19, 2024 · 0 comments · Fixed by #736
Labels: bug (Something isn't working), core_tools (Scope the core module (scala))

amahussein (Collaborator) commented Jan 19, 2024

Describe the bug

Currently, CollectInformation.scala checks the "Classpath Entries" map of the environment details to verify that the RAPIDS jars are included in the classpath.

This check is not always sufficient. Some GPU eventlogs do not list the RAPIDS jars under "Classpath Entries". Instead, the jars show up in other properties, for example:

{
   "Event":"SparkListenerEnvironmentUpdate",
   "JVM Information":{
      "Java Home":"/usr/lib/jvm/temurin-8-jdk-amd64/jre",
      "Java Version":"1.8.0_345 (Temurin)",
      "Scala Version":"version 2.12.14"
   },
   "Spark Properties":{
      .....
      "spark.yarn.dist.jars":"file:///tmp/232c28608dc64730b2c0f0cefaa32a05/rapids-4-spark_2.12-22.10.0.jar",
      "spark.yarn.secondary.jars":"rapids-4-spark_2.12-22.10.0.jar",
      "spark.repl.local.jars":"file:///tmp/232c28608dc64730b2c0f0cefaa32a05/rapids-4-spark_2.12-22.10.0.jar",
   },
   "System Properties":{
      "sun.java.command":"org.apache.spark.deploy.SparkSubmit --conf spark.executor.memory=16G  --jars /tmp/232c28608dc64730b2c0f0cefaa32a05/rapids-4-spark_2.12-22.10.0.jar /tmp/232c28608dc64730b2c0f0cefaa32a05/sample_sanity.py gs://spark-directory",
   }
}

Other properties that can hold the RAPIDS jars:

spark.driver.extraClassPath
spark.executor.extraClassPath
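
To illustrate the kind of lookup this issue asks for, here is a minimal sketch that searches the classpath entries, the Spark properties listed above, and the `sun.java.command` system property for RAPIDS jar names. It is not the actual `CollectInformation` code: the object name `RapidsJarLookup`, the method `findRapidsJars`, and the three plain `Map[String, String]` inputs are hypothetical, and it assumes a RAPIDS jar can be recognized by the `rapids-4-spark` file name prefix and that the keys of "Classpath Entries" are the jar paths.

```scala
object RapidsJarLookup {
  // Spark property keys named in this issue that may reference the RAPIDS jars.
  val jarPropertyKeys: Seq[String] = Seq(
    "spark.yarn.dist.jars",
    "spark.yarn.secondary.jars",
    "spark.repl.local.jars",
    "spark.driver.extraClassPath",
    "spark.executor.extraClassPath")

  // Assumption: a RAPIDS jar is any file whose name starts with "rapids-4-spark".
  // The character class stops at path, list, and whitespace separators.
  private val rapidsJarName = """rapids-4-spark[^/,:;\s]*\.jar""".r

  /** Returns the distinct RAPIDS jar file names found in any of the three maps. */
  def findRapidsJars(
      classpathEntries: Map[String, String],
      sparkProperties: Map[String, String],
      systemProperties: Map[String, String]): Seq[String] = {
    // Assumption: the keys of "Classpath Entries" are the jar paths.
    val fromClasspath = classpathEntries.keys.toSeq
    val fromSparkProps = jarPropertyKeys.flatMap(sparkProperties.get)
    // The submit command may carry the jar through the --jars argument.
    val fromCommand = systemProperties.get("sun.java.command").toSeq
    (fromClasspath ++ fromSparkProps ++ fromCommand)
      .flatMap(value => rapidsJarName.findAllIn(value).toList)
      .distinct
  }
}
```

With the eventlog excerpt above, the classpath map would contribute nothing, while the Spark properties and the submit command would all resolve to the same jar file name.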

This means that the Profiler will pass wrong information to the AutoTuner:

  • AutoTuner will mistakenly consider the path as loaded.
  • AutoTuner won't make the recommendation based on the jars version.
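
For the version-based recommendation, the jar version has to be recovered from the jar name first. The following is a rough sketch under the assumption that the jar follows the naming seen in the eventlog above (`rapids-4-spark_<scala version>-<plugin version>.jar`); `RapidsJarVersion` and `parse` are hypothetical names, not part of the tools code.

```scala
object RapidsJarVersion {
  // Assumption: jar names look like rapids-4-spark_2.12-22.10.0.jar, i.e.
  // a Scala binary version followed by a three-part plugin version.
  private val versionPattern = """rapids-4-spark_(\d+\.\d+)-(\d+\.\d+\.\d+)""".r

  /** Returns (scalaVersion, pluginVersion) when the name or path matches. */
  def parse(jarNameOrPath: String): Option[(String, String)] =
    versionPattern
      .findFirstMatchIn(jarNameOrPath)
      .map(m => (m.group(1), m.group(2)))
}
```

Because the pattern is not anchored, the same call works on a bare file name and on a full `file:///...` path.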
@amahussein amahussein added bug Something isn't working ? - Needs Triage core_tools Scope the core module (scala) labels Jan 19, 2024
@amahussein amahussein self-assigned this Jan 22, 2024
amahussein added a commit to amahussein/spark-rapids-tools that referenced this issue Jan 22, 2024
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

Fixes NVIDIA#713, Fixes NVIDIA#734

This PR changes the behavior of the AutoTuner to display a comment when
the file-encoding of an application is set to a value that is not "utf-8".

The changes also improve the extraction of the RAPIDS jars values.

*Changes for 713*:

- Added a new field in `ApplicationSummaryInfo` that represents the
  systemProperties
- Capture SystemProperties in the App in order to be able to check the
  file-encoding
- Moved map properties to `CacheableProps` so that it can be used by the
  Qualification as well.
- Moved String-conversion methods from AutoTuner to StringUtils
- Moved `getEventFromJsonMethod` to EventUtils object
- Added a new UnitTest for the AutoTuner
- Updated ApplicationInfoSuite unitTests

*Changes for 734*:

- Fixed the implementation of `CollectInformation.getRapidsJARInfo`
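
As a rough illustration of the file-encoding check described for 713, the sketch below produces a comment when the captured system properties report an encoding other than UTF-8. It is not the code from the PR: `FileEncodingCheck` and `warning` are hypothetical names, the lookup assumes the encoding is stored under the standard JVM `file.encoding` key, and returning nothing when the key is absent is an assumption as well.

```scala
object FileEncodingCheck {
  /** Returns a comment when file.encoding is present and is not UTF-8. */
  def warning(systemProperties: Map[String, String]): Option[String] =
    systemProperties
      .get("file.encoding")
      // Assumption: only the spelling "utf-8" (any letter case) counts as UTF-8.
      .filterNot(_.equalsIgnoreCase("utf-8"))
      .map(enc => s"file.encoding is set to '$enc'; " +
        "some GPU expressions may not be supported with a non-UTF-8 encoding.")
}
```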
amahussein added a commit that referenced this issue Jan 23, 2024
…ons (#736)

* [FEA] AutoTuner warns that non-utf8 may not support some GPU expressions
