-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: include ancestors in process events #2938
Conversation
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Hello. Upd: found even more problems, converting back to draft for now. |
I'll look shortly sorry was travelling and then catching up. Should be able to get to this today or tomorrow thanks! |
e2e9ea1
to
558ee86
Compare
I think i misunderstood the purpose of both So now there seems to be no real reason to return The biggest obstacle now is that due to current implementation of process cleanup, as described in commit 45745a0, it becomes impossible to reconstruct full process ancestry in some cases. Assume the following scenario:
If n > 3, then all processes with id < n-1 would have their refcnt set to 0 and as a result removed from the event cache, breaking the ancestry chain. I don't think i can resolve that without introducing any potentially breaking changes, as this PR already has quite a lot of code to review. And there is still inconsistency in the ancestors field's value across protobuf messages in api/v1/tetragon/tetragon.proto and vendor/github.com/cilium/tetragon/api/v1/tetragon/tetragon.proto. I'm still not sure how to properly handle that. @jrfastab, may I ask you for a code review now? Sorry for the delay. |
558ee86
to
9332a4d
Compare
9332a4d
to
bd44a87
Compare
Hello @olsajiri @tpapagian . I made some changes based on your suggestions:
I also renamed some variables for the sake of consistency.
|
Thanks for the update, I will review that possibly on Monday. |
bd44a87
to
815a7d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to do the updates. I believe that now the code seems simpler.
One thing that I would like to see is to have some tests that check the reference counts for the ancestors. For example in https://github.com/cilium/tetragon/blob/main/pkg/grpc/exec/exec_test_helper.go we define some tests that are called in https://github.com/cilium/tetragon/blob/main/pkg/grpc/exec/exec_test.go. In those cases, we check the reference counts (i.e. here).
As breaking reference counting can cause several issues (i.e. process information to be evicted from the cache earlier than needed and thus events lack important information, or in the opposite case where process information never gets evicted from the cache and this results in excessive memory usage) it would be great to add some tests for catching similar issues. As an example, you can simulate the case that we described in #2938 (comment) and after each event check that all reference counts are what we expect to see.
GetAncestorProcesses/GetAncestorProcessesInternal and NeededAncestors/NeededAncestorsInternal
I believe that in those case there is some code repetition that could be avoided. For example for GetAncestorProcessesInternal
and GetAncestorProcesses
you can keep only GetAncestorProcessesInternal
as from a ProcessInternal
struct we can get the process
. Then, instead of GetAncestorProcesses
you can call GetAncestorProcessesInternal
and when you iterate the result just keep only what you need. Based on that we can possibly think of a similar solution for NeededAncestors
/NeededAncestorsInternal
but I haven't though of all the details.
6e2613f
to
ab1d044
Compare
ab1d044
to
c2904fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've been pulled-in to review for the docs change, they look good to me. I'll let people that followed on this PR to review the actual patches :) Thanks!
Thanks for taking the time to add tests on the reference counting and sorry for the delay on reviewing that again. I have run several manual stress tests to check also that reference counting is correct and everything seems fine. One possibly last comment is that now you add ancestors to all event types. This can easily result in very large events and in a heavy loaded server this would also mean that we would need a large amount of storage to keep those. I believe that adding ancestors only in If you believe that we need ancestors in all events, then I would propose to have a separate command line flag to enable those selectively. i.e. |
I would prefer having separate command line flags then. I mainly use tetragon with SIEM/XDR like systems, and i would like to have an option to have ancestor processes included in all mentioned events, so i can use them in correlation rules logic inside these systems. Additionally, i would like to also have ancestor processes for at least some types of these events to be able to filter out some of them with export-denylist filters. I agree that we can populate ancestors in all other events from corresponding If this is fine, i think i can adjust my code to use separate options, but i'll need some time to implement and test it properly. One more question, though: can i at least combine |
Yes, I am fine with that as long as there is a use case in your side that can be beneficial.
Yes, I think this is fine as well. So
Correct, and this could work in other cases as well (i.e. kprobes). But I want to avoid the overhead of collecting ancestors and then dropping them. In the exit event as you said you have to collect them and do the refDec. So there is not unnecessary overhead there. |
c2904fc
to
815b3c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort that you put on that. I don't have any more significant comments. Please have a look at some final minor comments that I added.
Could you also rebase that PR in order to run the CI as there are conflicts. Once you rebase and CI is green I will approve that.
Thanks again!
815b3c8
to
91dfc98
Compare
Add support for collection of ancestors of the process beyond the immediate parent in process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, process_tracepoint events. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add support for collection of ancestors of the process beyond the immediate parent in process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, process_tracepoint events. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add configuration options to include ancestors of the process beyond the immediate parent in process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, process_tracepoint events. All new options are turned off by default. Additionally, options `enable-process-kprobe-ancestors`, `enable-process-tracepoint-ancestors`, `enable-process-uprobe-ancestors`, `enable-process-lsm-ancestors` are used only if base `enable-process-ancestors` option is set to `true`. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add support for collection of ancestors of the process beyond the immediate parent (up to PID 1 / PID 2) in process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, process_tracepoint events via new configuration options. If the first attempt to collect all process' ancestors fails, all successfully collected ancestors are added to the event anyway, but the event is added to the eventcache for reprocessing. In order to guarantee that we are able to collect all ancestor processes, if required options are enabled, 'exec' and 'clone' events now increase reference counters of all ancestors processes of the given process, 'exit' and 'cleanup' events - decrease reference counters of all ancestors processes of the given process. This is true if, and only if, we are able to successfully collect all ancestors of the given process. If we encounter an error during ancestors collection, reference counters of those ancestors will not be changed. Signed-off-by: t0x01 <T0x01@protonmail.ch>
…abled Add TestGrpcExecAncestorsInOrder and TestGrpcExecAncestorsOutOfOrder tests to pkg/grpc/exec to ensure that process' reference counting is handled correctly when ancestors are enabled. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add information about new configuration flags, event fields and metrics, related to ancestors, to documentation. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add new `enable-process-ancestors`, `enable-process-kprobe-ancestors`, `enable-process-tracepoint-ancestors`, `enable-process-uprobe-ancestors` and `enable-process-lsm-ancestors` options to configuration example. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Implement a new export filter that can filter over ancestor binary names using RE2 regular expressions. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Implement a new export filter that can filter over ancestor binary names using RE2 regular expressions. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Implement a new export filter that can filter over ancestor binary names using RE2 regular expressions. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add TestAncestorBinaryRegexFilter test to pkg/filters to ensure that new ancestor binary export filter works as expected. Signed-off-by: t0x01 <T0x01@protonmail.ch>
Add information about new ancestor binary export filter to documentation. Signed-off-by: t0x01 <T0x01@protonmail.ch>
91dfc98
to
9de4e4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks for the resilience @t0x01: "started on on Sep 19, 2024". And thanks @tpapagian and @olsajiri for the support. :) |
Fixes 2420
Add support for collection of ancestors of the process beyond the immediate parent (up to PID 1 / PID 2) in process_exec, process_exit, process_uprobe, process_kprobe, process_lsm, process_tracepoint events via new configuration options and implement new ancestor binary export filter.
See commits.