Begin tracing #844
Conversation
To instrument, wrap the code to be traced in a try-with-resources block. To run it, use the `logging.properties` file. The output is written to an /opt/ folder by default, but this can be changed. Once the intermediate trace is written, running `jsonify.py output-trace.json cleaned-trace.json` generates a trace file that can be read in Chromium's chrome://tracing as well as in Eclipse Trace Compass with the Trace Event plugin installed from the incubator. The logger was taken from Eclipse Trace Compass and is EPLv2. It was originally written by Geneviève Bastien and me. Signed-off-by: Matthew Khouzam <matthew.khouzam@ericsson.com>
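The try-with-resources instrumentation described above can be sketched as a minimal AutoCloseable. This is a hypothetical simplification: the PR's real ScopeLog logs begin/end events through java.util.logging, while this sketch records them in a plain list so the shape of the pattern is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the ScopeLog idea: emit a "begin" event on
// construction and an "end" event (with duration) on close, so a
// try-with-resources block brackets the traced region even when an
// exception is thrown. The EVENTS list stands in for the logger sink.
class ScopeLogSketch implements AutoCloseable {
    static final List<String> EVENTS = new ArrayList<>();
    private final String label;
    private final long startNanos = System.nanoTime();

    ScopeLogSketch(String label) {
        this.label = label;
        EVENTS.add("B:" + label);
    }

    @Override
    public void close() {
        long durationNanos = System.nanoTime() - startNanos;
        EVENTS.add("E:" + label + " dur=" + durationNanos + "ns");
    }
}
```

Usage mirrors the diff below: `try (ScopeLogSketch sl = new ScopeLogSketch("/api/-/search")) { /* traced code */ }`.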
fyi: @planger
```java
            return new ResponseEntity<>(json, HttpStatus.BAD_REQUEST);
        }
    ) {
        try (ScopeLog sl = new ScopeLog(LOGGER, Level.FINER, "/api/-/search")) {
```
Using a special class for logging might not be a good idea. For OpenVSX to survive, it will be necessary to introduce some Application Performance Monitoring in the future, with its own monitoring. Instead of adjusting the code for new logging sinks, I would propose instrumenting the code with a standard API such as OpenTelemetry. Take a look at their server example. The code looks quite similar, but it is a standard API that many log/trace implementations can listen to.
This is a fair assessment. I see monitoring and postmortem tracing as two separate issues. What I would like in the future is to add OTel to the scope logs so that the loggers can feed monitoring.
I was tempted to use OTel; my concerns were that
1. the community would not accept such a big change, and
2. since the spans are not auto-closeable, you need to remember to call span.end() on them.
If these concessions are acceptable, we can do it.
@@ -0,0 +1,65 @@
###############################################################################
It was a big surprise for me that the application doesn't have any logging configured yet. I think adding some logging is definitely a good idea, but I am not sure about having a SnapshotHandler always on.
The snapshot handler is only on if we use this logging.properties.
There are techniques to programmatically activate/deactivate it, though. I did not add one, such as an SSH command. The catch is that when you're observing a slowdown, monitoring already needs to be on.
Also, I should update the copyright header. (Happy 2024!)
```java
                && (record.getLevel().intValue() <= Level.FINE.intValue())
                && (pattern.matcher(record.getMessage()).matches()));
        if (logIt) { // add feature switch here
            addToSnapshot(record.getMessage());
```
I would say it is not a good idea at all to call any "command" methods in a "query"-by-design method such as isLoggable(). The method should be read-only according to all the specifications.
Found the solution!

```java
public boolean isLoggable(LogRecord record) {
    return (fIsEnabled
            && super.isLoggable(record)
            && (record.getLevel().intValue() <= Level.FINE.intValue())
            && (pattern.matcher(record.getMessage()).matches()));
}
```

then override publish to put addToSnapshot there
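The suggested split can be sketched as a small Handler: isLoggable() stays a pure query and publish() carries the side effect. This is a simplified, hypothetical version (the field names fIsEnabled, pattern, and addToSnapshot follow the PR, but the surrounding class is reduced to the essentials):

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.regex.Pattern;

// Sketch: keep isLoggable() read-only; do the mutation in publish().
class SnapshotHandler extends Handler {
    private volatile boolean fIsEnabled = true;
    private final Pattern pattern = Pattern.compile(".*");
    private final StringBuilder snapshot = new StringBuilder();

    @Override
    public boolean isLoggable(LogRecord record) {
        // Pure query: no state is mutated here.
        return fIsEnabled
                && super.isLoggable(record)
                && record.getLevel().intValue() <= Level.FINE.intValue()
                && pattern.matcher(record.getMessage()).matches();
    }

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            addToSnapshot(record.getMessage()); // the side effect lives here
        }
    }

    private void addToSnapshot(String message) {
        snapshot.append(message).append('\n');
    }

    @Override public void flush() {}
    @Override public void close() {}

    String snapshotContents() { return snapshot.toString(); }
}
```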
Fixed?
```java
    }

    private boolean addToSnapshot(String message) {
        InnerEvent event = InnerEvent.create(message);
```
I am not sure that formatting parameters into a message and then serializing that message again is a good idea, both from the performance and the design point of view. What do you think about LogRecord.parameters? Will it be able to do the job?
Yes, I will update and check. The caveat: there's no "supplier" method with parameters, which means the string is always built, AFAIK. That is why I made the string with the supplier. In other applications it made more sense; in VSX, not so much.
I looked at both parameters and suppliers. For today, suppliers are quite a bit simpler. But I am not against doing parameters if you have a strong opinion on it. If you want to do parameters later, I am not against that at all either.
I fixed it. Now the parameters are lazily serialized on getMessage, in the drain thread. I think this is as good as it gets. ;)
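The lazy-serialization idea can be sketched with the standard LogRecord parameters API: the hot path only stores the raw objects, and the expensive MessageFormat call is deferred to the thread that drains the snapshot. (A hypothetical simplification; the PR's actual drain code is more involved.)

```java
import java.text.MessageFormat;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Sketch: defer string building from the request thread to the drain thread.
class LazyFormatDemo {
    // Hot path: attach parameters without formatting anything yet.
    static LogRecord record(String template, Object... params) {
        LogRecord r = new LogRecord(Level.FINE, template);
        r.setParameters(params); // cheap: no string concatenation here
        return r;
    }

    // Called later, from the thread that drains the in-memory snapshot.
    static String render(LogRecord r) {
        Object[] params = r.getParameters();
        if (params == null || params.length == 0) {
            return r.getMessage();
        }
        return MessageFormat.format(r.getMessage(), params);
    }
}
```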
```java
     */
    public static class ScopeLog implements AutoCloseable {

        private final long fTime;
```
It looks like the f<name> pattern is not used anywhere else in the code, and even this PR contains another style for naming class members. I would propose choosing a single style and then sticking to it.
```java
     *
     * @author Geneviève Bastien
     */
    public static class FlowScopeLogBuilder {
```
I can't find any usages of the class in the other parts of the code. Is it really necessary here?
For future use; we use it in Trace Compass to have spans with links.
force-pushed from b87853c to 7004c66
Minor update.
This is a way to take a snapshot of any request that takes over n time. The SnapshotHandler consumes every event and keeps it in memory (up to maxEvents), then dumps the full trace to disk when the specified threshold is met. To use the snapshotter, pass the following argument to OpenVSX when starting it up: `-Djava.util.logging.config.file=%gitroot%/logging.properties` (where %gitroot% is the directory tracecompass is checked out to). The logging.properties should be updated to match the current configuration of the Open-VSX server. The performance impact with ONE user on ONE computer was imperceptible; however, this needs to be verified prior to deployment in production. Traces are placed in the folder specified by `logging.properties`. The traces follow the Trace Event standard and can be opened with Chromium or Eclipse Trace Compass. The logger is based on the Trace Compass logger, which is also EPLv2. Signed-off-by: Matthew Khouzam <matthew.khouzam@ericsson.com>
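For illustration, a logging.properties along these lines could wire up the handler. The handler's fully qualified class name, the package prefix, and the threshold key below are assumptions for the sketch, not the PR's actual values:

```properties
# Hypothetical sketch: route fine-grained events to the snapshot handler.
handlers = org.eclipse.openvsx.util.SnapshotHandler

# Keep the root quiet, but let the traced packages emit FINER events.
.level = INFO
org.eclipse.openvsx.level = FINER
```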
force-pushed from 7004c66 to bd11ff3
This is a decent-sized update; it fixes most things except for the code style, which I would leave up to the actual maintainers of the code. I see this patch may help make data accessible in a GDPR-friendly, anonymized way. Right now we only record API calls, not who is calling or what is returned. This is anonymous and can be "public".
Thanks for the contribution @MatthewKhouzam. We use Spring Boot Actuator to manage and monitor the server. The data can be collected by Prometheus and displayed in Grafana, as outlined in this proof of concept: #667. What data are you looking to collect?
The data would be internal. At the moment, you can see that the API is logged, but also some inner for loops. They would be correlated so you can see which query is slow and, more importantly, why it is slow. To be clear, Spring Boot Actuator is good at monitoring; this is for investigating.
There already is LongRunningRequestFilter for investigating.
Either way, I would update this patch to use https://github.com/eclipse-tracecompass/trace-event-logger. Regarding LongRunningRequestFilter, as far as I can tell, it logs the request, not what happens inside the request. This patch would correlate a request to a flame chart and a flame graph. https://www.youtube.com/watch?v=ZlAZy7GjXRE&pp=ygUWb2JzZXJ2YWJpbGl0eSB0aGVpYWNvbg%3D%3D Here is an example of what we can show with this patch: flame-charts.mp4
The point of this patch is not to show when the code is slow, which is handled well by Spring Boot Actuator, but rather what is making it slow. For that you need deeper instrumentation. This is why chrome://tracing exists alongside OTel; both tools have their purpose. This patch allows Chrome tracing, Perfetto, or Trace Compass to analyze the issue.
Interesting, is this from your own or Ericsson's Open VSX instance?
Right, like https://www.baeldung.com/spring-boot-opentelemetry-setup or https://grafana.com/blog/2022/05/04/how-to-capture-spring-boot-metrics-with-the-opentelemetry-java-instrumentation-agent/.
It is from an instance of Trace Compass being traced, as well as an example from an OS-level trace. VSX was not shown, as the data is not super interesting with the Gatling traces. It was just a blob. :(
This is the problem: it is even less interesting because we are dealing with KPIs, which are inferred and aggregated. A lot of information is lost.
That is a fair point; however, the tracer will be abstracted out soon. I have a project in Eclipse for that. :) (Please thank the IP team for their FAST response.) For better observability within VSX, you need to add spans using OpenTelemetry, or something similar in this library. Here is an example of what the end goal would look like. All the manual work is there to add precision beyond what an auto-instrumenting program can supply. Think of it like a JVMTI profiler: if it instruments everything, your program will collapse under the weight of the probes. OTel just does the endpoints. We are going deeper, but you need to define how deep. Here are some references that can also help, though they are a bit deep:
Spring supports that: "We want to have some detailed observation of the MyUserService#userName method. Thanks to having added AOP support, we can use the
Yes, on a per-function level, not on a per-scope level. A way to sidestep that would be to keep all functions simple, though. :)
@MatthewKhouzam, I still think that for an online service the tracing should be online as well, delivering the actual information at any moment in time. Trace Compass is an amazing tool, but I would prefer to have something OpenTelemetry-based, such as Jaeger.
I am OK with anything, as long as there's something. I was proposing the Chromium/Trace Compass solution since it is something we used internally, and there are many features not yet available in Jaeger (flame graph, lookup stats, kernel correlation), but really, we just need something. :) Regarding seeing info at any moment of time, that's more monitoring, which is super important too. Post-mortems and deep dives are where this patch was focusing.
I'm in the process of setting up monitoring with Grafana Cloud. Once monitoring works, I want to add tracing/observability using OpenTelemetry. Grafana offers flame graphs, but I don't know if it "just works" or needs extra configuration.
You can also do it manually:

```java
@Service
public class MyObservableService {

    private final ObservationRegistry observationRegistry;

    public MyObservableService(ObservationRegistry observationRegistry) {
        this.observationRegistry = observationRegistry;
    }

    public void tracedMethod() {
        Observation.createNotStarted("tracedMethod", this.observationRegistry)
                .lowCardinalityKeyValue("locale", "en-US")
                .highCardinalityKeyValue("userId", "77")
                .observe(() -> {
                    // Write business logic here; all this code will now be
                    // traced/observed by this newly created Observation
                    // using the observationRegistry created earlier.
                });
    }
}
```

source: https://www.unlogged.io/post/enhanced-observability-with-java-21-and-spring-3-2
I think #857 will do part of what this patch does. If you want any feature of https://github.com/eclipse-tracecompass/trace-event-logger, you know where to find it.
Introduce a tracer to Open-VSX.
This tracer so far instruments the Query, QueryV2, and Search endpoints. It is used to see what kind of load is being placed on Open-VSX. The output is in the Trace Event format (https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview) and can be viewed using Chromium (https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/) or Eclipse Trace Compass (https://www.youtube.com/watch?v=YCdzmcpOrK4).
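For reference, a minimal trace in that format looks like the following hand-written sketch: a pair of duration events ("B" for begin, "E" for end) on the same process and thread, with timestamps in microseconds. The endpoint name matches the one instrumented in this PR; the numbers are made up for illustration.

```json
[
  {"name": "/api/-/search", "cat": "api", "ph": "B", "ts": 1000, "pid": 1, "tid": 7},
  {"name": "/api/-/search", "cat": "api", "ph": "E", "ts": 4500, "pid": 1, "tid": 7}
]
```

Opened in chrome://tracing or Trace Compass, this renders as a single 3.5 ms slice on thread 7.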
This patch aims to improve the observability of Open-VSX.