Parse nested fields in body using pipelines - JSON parse doesn't work for attributes #6975

Open
fabn opened this issue Jan 29, 2025 · 3 comments

fabn commented Jan 29, 2025

Bug description

We're struggling to extract nested JSON fields from our log files using Pipelines. Here is a sample JSON snippet that our app produces:

{
  "timestamp": "2025-01-29T16:40:19.711907Z",
  "level": "info",
  "instance_type": "t3.large",
  "duration_ms": 1.0162990000098944,
  "duration": "1.016ms",
  "named_tags": {
    "request_id": "5591baa6-d0e8-4e05-9985-dd2dfd13f072",
    "client_ip": "172.31.29.139",
    "user_agent": "curl/8.7.1"
  },
  "name": "TestController",
  "message": "Completed #index",
  "payload": {
    "controller": "TestController",
    "action": "index",
    "format": "*/*",
    "method": "GET",
    "path": "/",
    "status": 200,
    "view_runtime": 0.29,
    "db_runtime": 0.0,
    "allocations": 404,
    "status_message": "OK"
  }
}

We'd like to have all of these values available as fields, but we haven't been able to achieve that.

Expected behavior

Currently, a JSON Parse of body produces this:

{
  "body": "{\"timestamp\":\"2025-01-29T16:35:48.690684Z\",\"level\":\"info\",\"instance_type\":\"t3.large\",\"duration_ms\":1.1547209993004799,\"duration\":\"1.155ms\",\"named_tags\":{\"request_id\":\"11cefd3e-d86d-431b-868a-6ce7bbb135f4\",\"client_ip\":\"172.31.29.139\",\"user_agent\":\"curl/8.7.1\"},\"name\":\"HomeController\",\"message\":\"Completed #index\",\"payload\":{\"controller\":\"HomeController\",\"action\":\"index\",\"format\":\"*/*\",\"method\":\"GET\",\"path\":\"/\",\"status\":200,\"view_runtime\":0.35,\"db_runtime\":0.0,\"allocations\":405,\"status_message\":\"OK\"}}",
  "id": "2sJL5aPf8k1XvKw5iCCaTmbcjk1",
  "timestamp": "2025-01-29T16:35:48.692793518Z",
  "attributes": {
    "duration_ms": 1.1547209993004799,
    "duration": "1.155ms",
    "instance_type": "t3.large",
    "level": "info",
    "log.iostream": "stdout",
    "logtag": "F",
    "message": "Completed #index",
    "name": "HomeController",
    "named_tags": "{\"client_ip\":\"172.31.29.139\",\"request_id\":\"11cefd3e-d86d-431b-868a-6ce7bbb135f4\",\"user_agent\":\"curl/8.7.1\"}",
    "payload": "{\"action\":\"index\",\"allocations\":405,\"controller\":\"HomeController\",\"db_runtime\":0,\"format\":\"*/*\",\"method\":\"GET\",\"path\":\"/\",\"status\":200,\"status_message\":\"OK\",\"view_runtime\":0.35}",
    "time": "2025-01-29T16:35:48.692793518Z",
    "timestamp": "2025-01-29T16:35:48.690684Z"
  }
}

As you can see, named_tags and payload are left as JSON-encoded strings instead of being parsed.

We'd expect a structure with nested fields or with dot syntax, something like named_tags.client_ip, so that we can build queries and dashboards using those attributes.

We tried to use pipelines following this guide, with no success. We can only extract individual fields; it seems that JSON parse is not able to parse values that are already in attributes.
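To make the dot-syntax expectation concrete, here is a minimal Python sketch of the flattening we have in mind. It is only an illustration of the desired output shape, not how SigNoz implements parsing; the `flatten` helper is a hypothetical name introduced for this example.

```python
import json
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level with dot-separated keys (hypothetical helper)."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat: dict[str, Any] = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat


# A reduced version of the sample log line from this issue.
sample = json.loads(
    '{"level": "info", '
    '"named_tags": {"client_ip": "172.31.29.139", "user_agent": "curl/8.7.1"}}'
)
flat = flatten(sample)
# flat now has the keys "level", "named_tags.client_ip" and "named_tags.user_agent".
```

With keys shaped like `named_tags.client_ip`, each nested value could be queried as an ordinary attribute.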

How to reproduce

Given the sample log line, configure this pipeline:

  1. Parse JSON: body => attributes (this creates the named_tags attribute with value "{\"client_ip\":\"172.31.29.139\",\"request_id\":\"11cefd3e-d86d-431b-868a-6ce7bbb135f4\",\"user_agent\":\"curl/8.7.1\"}").
  2. Configure a second step to parse JSON from attributes.named_tags => attributes. This should produce the client_ip, request_id and user_agent attributes.

The only thing we were able to do is follow the steps in the linked guide to extract individual fields. However, the content of our payload is very dynamic, and it would be a pain to list all the attributes we have.
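The two steps above can be sketched in plain Python to show the result we expect. This is a model of the behavior only, under the assumption that step 1 leaves nested objects as JSON strings (as observed); `step_one` and `step_two` are hypothetical names and not part of any SigNoz API.

```python
import json


def step_one(body: str) -> dict:
    """Model the observed result of step 1: nested objects end up as JSON strings."""
    parsed = json.loads(body)
    return {
        key: json.dumps(value) if isinstance(value, dict) else value
        for key, value in parsed.items()
    }


def step_two(attributes: dict, source_key: str) -> dict:
    """Model the expected result of step 2: parse one attribute and merge its keys."""
    result = dict(attributes)
    parsed = json.loads(result[source_key])
    if isinstance(parsed, dict):
        result.update(parsed)
    return result


body = json.dumps({
    "level": "info",
    "named_tags": {
        "client_ip": "172.31.29.139",
        "request_id": "11cefd3e-d86d-431b-868a-6ce7bbb135f4",
        "user_agent": "curl/8.7.1",
    },
})

after_step_one = step_one(body)
after_step_two = step_two(after_step_one, "named_tags")
# after_step_two should now carry client_ip, request_id and user_agent as attributes.
```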

It is also not clear how multiple pipelines interact: if more than one pipeline matches the same log entry, are they all applied?

I ask because I also tried to add a final pipeline that selects logs with named_tags EXISTS and has a single processor that does parse_json attributes.named_tags => attributes. If I simulate that pipeline with the eye icon, it seems to do what I expect, but when I activate it, it doesn't work.

Image

We're using SigNoz Cloud, if that matters.

welcome bot commented Jan 29, 2025

Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.

fabn commented Jan 30, 2025

I can confirm that adding multiple pipelines works. I added this configuration to my pipelines (for the JSON reported in the first comment):

Image

  1. Rule 1 works
  2. Rule 2 works
  3. Rule 3 works and creates a top-level field attributes.named_tags with a JSON string value
  4. Rule 4 is a no-op; it doesn't do anything.
  5. Rule 5 works, but only for a single field, and I don't want to replicate it for all my nested fields
  6. Rule 6 is just a cleanup

Now the funny part: to experiment, I added another pipeline after the previous one. Its matching criterion is named_tags EXISTS, so it selects logs already parsed by the previous pipeline. It does the following:

Image

  1. Rule 1 should do what I'm trying to achieve, and if I preview the rule it seems to work
  2. Rule 2 should do the same for payload; we can ignore it for the moment
  3. Rule 3 adds a marker to confirm that the log entry was processed by this pipeline
  4. Rule 4 performs cleanup after parsing

The funny thing is that if I simulate the processing of this pipeline, it works. Check the user_agent field in this screenshot from the preview: the output includes the attributes from both named_tags and payload at the top level.
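For reference, the intended effect of this second pipeline can be modeled in a few lines of Python. This is a sketch of what I expect to happen, not of SigNoz internals. The marker value `True` and the removal of payload during cleanup are assumptions made for illustration; `run_second_pipeline` is a hypothetical name.

```python
import json


def run_second_pipeline(attributes: dict) -> dict:
    """Apply the four rules described above to one log record's attributes."""
    # Matching criterion: named_tags EXISTS.
    if "named_tags" not in attributes:
        return attributes

    result = dict(attributes)

    # Rules 1 and 2: parse the JSON strings and merge their keys into attributes.
    for key in ("named_tags", "payload"):
        if key in result:
            result.update(json.loads(result[key]))

    # Rule 3: add a marker showing that this pipeline processed the record.
    result["pipelined"] = True  # the value is an assumption

    # Rule 4: clean up the source attributes (dropping payload too is an assumption).
    for key in ("named_tags", "payload"):
        result.pop(key, None)

    return result


record = {
    "level": "info",
    "named_tags": '{"client_ip":"172.31.29.139","user_agent":"curl/8.7.1"}',
    "payload": '{"action":"index","status":200}',
}
processed = run_second_pipeline(record)
```

In the preview, the output has this shape; in live logs, only the marker and the cleanup appear to take effect.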

Image

But when I save the rule, I don't find those attributes in the real logs. I do, however, find the pipelined marker present and named_tags removed.

The call to /api/v1/logs/pipelines/preview shows the following body (part of it):

Image

and it returns the parsed log:

Image

So there's clearly something wrong with the preview endpoint, since live logs show a different result.

fabn commented Jan 30, 2025

OK, the preview discrepancy seems to be a known issue: #5993
