Events lost during JFR dumping #165
Thanks for the careful report!
The problem is more complex - the JFR proper does double-buffering for this reason, and in order to achieve that it introduced 'tagging' of constant pool artifacts (classes, symbols, etc.). When a recording is finalized these constant pool artifacts need to be written out, but they must stay consistent with events that are still being recorded while the dump is in progress. This is not a trivial task, hence the tagging and epochs in the JDK JFR engine (if you check the sources, the implementation is anything but simple, to handle all the edge cases). We are using the JFR writer coming from async-profiler (obviously), and that one, for the sake of simplicity, does not do any double buffering, which leads to the observed data loss. So far the reasoning has been that the actual data loss is pretty minimal and did not really warrant a full rewrite of the trace storage and JFR writer to support overlapping data collection and recording finalization.
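For illustration, here is a minimal sketch of the double-buffering/epoch idea being described (hypothetical code with invented names, not the JDK JFR or ddprof-lib implementation): new events keep landing in the current epoch's buffer while the dump serializes the previous epoch's buffer together with the constant pool entries tagged during that epoch.

```cpp
// Hypothetical illustration of epoch-based double buffering; NOT the actual
// JDK JFR or ddprof-lib code, all identifiers are invented. Synchronization
// of concurrent writers is omitted to keep the core idea visible.
#include <atomic>
#include <mutex>
#include <vector>

struct Event { /* serialized sample payload */ };

class DoubleBufferedStorage {
    std::vector<Event> _buffers[2];   // one event buffer per epoch
    std::atomic<int> _epoch{0};       // index of the buffer receiving new events
    std::mutex _dump_lock;            // held only by the dumping thread

public:
    // Hot path: always appends to the current epoch's buffer, so recording
    // never has to wait for a dump to finish.
    void record(const Event& e) {
        _buffers[_epoch.load(std::memory_order_acquire)].push_back(e);
    }

    // Dump path: flip the epoch first, then serialize the now-idle buffer
    // together with the constant pool entries tagged during its epoch.
    template <typename Serializer>
    void dump(Serializer serialize) {
        std::lock_guard<std::mutex> guard(_dump_lock);
        int old_epoch = _epoch.exchange(1 - _epoch.load(std::memory_order_acquire));
        serialize(_buffers[old_epoch]);   // new events keep flowing into the other buffer
        _buffers[old_epoch].clear();
    }
};
```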
Hi,
During dumping, we convert jmethodIds to real class names and method names. If the number of methods recorded in the last minute is very large, we spend a lot of time in JVMTI, sometimes several seconds. This may lead to an unacceptable number of events being lost.
Every time a method is written to JFR, the following code paths are involved:
java-profiler/ddprof-lib/src/main/cpp/flightRecorder.cpp, lines 1161 to 1168 in 4f4d70a
java-profiler/ddprof-lib/src/main/cpp/flightRecorder.cpp, lines 279 to 284 in 4f4d70a
I also noticed these structures are never cleared; in extreme scenarios they may consume a lot of memory:
java-profiler/ddprof-lib/src/main/cpp/flightRecorder.h, lines 265 to 268 in 4f4d70a
Any thoughts?
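To make the cost concrete, here is a rough sketch of what resolving a single jmethodID via JVMTI typically involves (standard JVMTI calls used for illustration; this is not the exact code in flightRecorder.cpp, and local reference management is omitted):

```cpp
// Illustrative sketch of per-method resolution through JVMTI. Each method is
// several JVMTI calls plus string copies, which is why a large number of
// unique methods can add up to seconds during a dump.
#include <jvmti.h>
#include <string>

struct ResolvedMethod {
    std::string class_signature;
    std::string method_name;
    std::string method_signature;
};

static bool resolveMethod(jvmtiEnv* jvmti, jmethodID method, ResolvedMethod* out) {
    jclass declaring_class;
    if (jvmti->GetMethodDeclaringClass(method, &declaring_class) != JVMTI_ERROR_NONE) {
        return false;
    }

    char* class_sig = nullptr;
    char* name = nullptr;
    char* sig = nullptr;
    jvmti->GetClassSignature(declaring_class, &class_sig, nullptr);
    jvmti->GetMethodName(method, &name, &sig, nullptr);

    // Copy into memory we own before releasing the JVMTI-allocated buffers.
    if (class_sig != nullptr) out->class_signature = class_sig;
    if (name != nullptr) out->method_name = name;
    if (sig != nullptr) out->method_signature = sig;

    jvmti->Deallocate((unsigned char*)class_sig);
    jvmti->Deallocate((unsigned char*)name);
    jvmti->Deallocate((unsigned char*)sig);
    return true;
}
```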
The marking works as follows: each jmethodID is looked up at most once per dump, and the result is cached for the duration of that dump. So we are really not re-resolving all jmethodIDs within a single dump.
Agreed, this could be more efficient, but spending seconds on serialization sounds a bit extreme. How large are your recordings?
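In other words, the per-dump behaviour is roughly the following memoization pattern (a simplified sketch building on the resolveMethod example above; the names do not match the real lookup structures in flightRecorder.cpp):

```cpp
// Simplified illustration of per-dump caching of jmethodID resolution.
// One cache instance lives only for a single dump, so every new dump starts
// empty and re-resolves whatever methods it encounters.
#include <jvmti.h>
#include <map>

class PerDumpMethodCache {
    std::map<jmethodID, ResolvedMethod> _cache;   // ResolvedMethod as sketched above

public:
    const ResolvedMethod& lookup(jvmtiEnv* jvmti, jmethodID method) {
        auto it = _cache.find(method);
        if (it == _cache.end()) {
            ResolvedMethod resolved;
            resolveMethod(jvmti, method, &resolved);  // the JVMTI round-trip
            it = _cache.emplace(method, std::move(resolved)).first;
        }
        return it->second;                            // subsequent hits in this dump are cheap
    }
};
```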
I know that within a single dump each jmethodId is only resolved once. Is it possible to resolve each jmethodId only once across multiple dumps (i.e. with the same lifecycle as one recording)?
You are right, _packages and _symbols are discarded once we exit from writeCpool.
The dump of the JFR file below took more than 300 milliseconds in JVMTI calls (see the flame graph below). In a real case from our internal application, dumping a 4 MiB (uncompressed) JFR file took more than 4 seconds, and most of the data is jdk.CheckPoint.
This would require a cache shared across dumps. As I said, I agree things can be improved, but it is a rather complex topic because of the intricacies of the lifecycle of the collected information (e.g. you would need to defensively copy the strings returned for method names, signatures, etc.).
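A minimal sketch of what such a recording-scoped cache might look like, under the assumptions above (invented names, not an actual ddprof-lib structure); the key differences from the per-dump version are the defensive string copies, the locking, and an explicit clear tied to the end of the recording:

```cpp
// Hypothetical cache shared across dumps within one recording. Every string
// returned by JVMTI is copied into std::string we own (the JVMTI buffers must
// be deallocated immediately), and the cache lifetime must be managed
// carefully, e.g. cleared when the recording stops and mindful of class
// unloading invalidating cached entries.
#include <jvmti.h>
#include <mutex>
#include <unordered_map>

class RecordingScopedMethodCache {
    std::mutex _lock;
    std::unordered_map<jmethodID, ResolvedMethod> _cache;  // owns copied strings

public:
    const ResolvedMethod& lookupOrResolve(jvmtiEnv* jvmti, jmethodID method) {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _cache.find(method);
        if (it == _cache.end()) {
            ResolvedMethod resolved;
            resolveMethod(jvmti, method, &resolved);   // copies the JVMTI strings
            it = _cache.emplace(method, std::move(resolved)).first;
        }
        return it->second;
    }

    // Must be called when the recording ends, otherwise the cache grows
    // unbounded (the memory concern raised earlier in the thread).
    void clear() {
        std::lock_guard<std::mutex> guard(_lock);
        _cache.clear();
    }
};
```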
Hi,
If the CPU samples of a request cross 2 JFR files, I noticed that the events from the first 100 milliseconds are lost in the 2nd JFR file. Other events, like wall clock samples, have the same problem.
The reason may be that during dumping the lock is held by the dumping thread, preventing other threads from recording samples. The duration of the data loss matches the duration of the dump.
java-profiler/ddprof-lib/src/main/cpp/profiler.cpp, lines 1368 to 1376 in 6e6b454
java-profiler/ddprof-lib/src/main/cpp/profiler.cpp, lines 643 to 646 in 6e6b454
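To make the suspected failure mode concrete, here is a minimal sketch of the contention pattern described above (hypothetical names, not the actual profiler.cpp code): the sampler runs in a signal handler and cannot block, so if it fails to take the lock it drops the sample, and the dump thread holds the lock for the entire serialization.

```cpp
// Illustrative sketch of dump-vs-sampler lock contention; invented names.
#include <atomic>

class SpinLock {
    std::atomic_flag _flag = ATOMIC_FLAG_INIT;
public:
    bool tryLock() { return !_flag.test_and_set(std::memory_order_acquire); }
    void lock()    { while (_flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock()  { _flag.clear(std::memory_order_release); }
};

SpinLock g_storage_lock;
std::atomic<long> g_dropped_samples{0};

// Signal-handler path: must not block, so a failed tryLock means a lost sample.
void recordSample(/* const void* ucontext, ... */) {
    if (!g_storage_lock.tryLock()) {
        g_dropped_samples.fetch_add(1, std::memory_order_relaxed);
        return;                       // sample silently lost while a dump is running
    }
    // ... write the sample into the trace storage ...
    g_storage_lock.unlock();
}

// Dump path: holds the lock for the entire serialization (hundreds of ms or
// more), so every sample that fires in that window is dropped.
void dumpRecording(/* Writer& out */) {
    g_storage_lock.lock();
    // ... resolve jmethodIDs, write constant pools and events ...
    g_storage_lock.unlock();
}
```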