Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AWS cloudfront log parsing #10216

Merged
merged 21 commits into from
Sep 10, 2024

Conversation

ltflb-bgdi
Copy link
Contributor

@ltflb-bgdi ltflb-bgdi commented Jun 21, 2024

  • Bug

Proposed commit message

Fixed AWS cloudfront log parsing

Please explain:

  • WHAT:

    • Completely refactored parsing by replacing grok with csv to better comply to the cloudfront standard log format.
    • Fixed duration name in GROK csv parser, which was parsed into a _tmp field but never used.as not available.
    • Fixed x-forwarded-for handling:
      Make sure only real ip addresses are used. All malformed ip-addresses are listed in error.message without raising a pipeline error.
    • Moved cloudfront.content_type to http.response.mime_type, which is an ecs field.
    • Added event.kind: pipeline_error to error handler.
  • WHY:

    • The fixes above are needed to reliably process cloudfront logs. We have observed a lot of errors and grok timeouts, which lead to congestion and performance issues on our elastic clusters. With the new parser in place, parsing works without issues. Even though the pipeline contains much more processors than the one using grok, we do not see any performance impact.
    • The current, csv based parser lays the foundation of adding support of cloudfront realtime logs in the future, since realtime log fields are configurable and thus using grok is way too complex.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

How to test this PR locally

The PR contains sample logs of the problematic events.
To be tested with elastic-package test pipeline --data-streams cloudfront_logs

Note: Please note that the log producing the error.message is not added to the sample logs since it would let the
tests fail. See: elastic/elastic-package#1392

Related issues

@ltflb-bgdi ltflb-bgdi requested review from a team as code owners June 21, 2024 16:57
Copy link

cla-checker-service bot commented Jun 21, 2024

💚 CLA has been signed

@ltflb-bgdi ltflb-bgdi marked this pull request as draft June 21, 2024 16:58
@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch 3 times, most recently from 91f9492 to c31f466 Compare June 21, 2024 17:08
@ltflb-bgdi ltflb-bgdi marked this pull request as ready for review June 21, 2024 17:10
@ltflb-bgdi
Copy link
Contributor Author

/test

@ltflb-bgdi ltflb-bgdi changed the title Fix pb 561 cloudfront logs Fix AWS cloudfront log parsing Jun 21, 2024
@ltflb-bgdi ltflb-bgdi marked this pull request as draft July 19, 2024 14:18
@andrewkroh andrewkroh added Integration:aws AWS Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] labels Jul 19, 2024
@ltflb-bgdi ltflb-bgdi marked this pull request as ready for review July 26, 2024 14:21
@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from 11fb5fc to 04579e2 Compare August 5, 2024 12:46
@ltflb-bgdi
Copy link
Contributor Author

/test

@@ -1,4 +1,9 @@
# newer versions go on top
- version: "2.23.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- version: "2.23.0"
- version: "2.22.1"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1,7 +1,7 @@
format_version: 3.0.0
name: aws
title: AWS
version: 2.22.0
version: 2.23.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
version: 2.23.0
version: 2.22.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@ltflb-bgdi ltflb-bgdi Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to 2.22.2 due to merge conflict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this. It was intentionally removed in #10223.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- drop:
if: "ctx.event.original.startsWith('#')"
- drop:
if: ctx?.event?.original?.startsWith('#')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if: ctx?.event?.original?.startsWith('#')
if: ctx.event?.original?.startsWith('#')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - -
2019-12-04 21:02:31 LAX1 392 2a02:cf40:add:4002:91f2:a9b2:e09a:6fc6 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - -
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - -
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the change in this PR make us brittle to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing tabs should not have an effect on the csv parsers. Still, according to the AWS specs we should not see this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this doesn't break the test, can we leave it in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. But our vs-code setup is configured to automatically remove trailing spaces after editing.
I have done a test with trailing tabs by adding them with another editor and the parser did not fail.
Given that fact and that trailing tabs should not appear in real life, I vote for letting the trailing tab removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But our vs-code setup is configured to automatically remove trailing spaces after editing.

I don't think you should do that here; there may be tests in other packages that depend on trailing whitespace (note that we do not have an .editorconfig file in this repo so there is no guidance on the expectation of format).

While real data should not have the trailing tab, recent events have again demonstrated that testing with artifacts that match expectations can lead to bad outcomes. I would prefer that we show that we are robust to this case.

Comment on lines 4 to 6
- elastic_package_remove_error_message # Used to solve elastic-package test error on adding error message to event.
dynamic_fields:
'event.ingested': "^.*$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dynamic_fields configuration is not needed since we are not setting it in the pipeline.

I don't like the testing-hack for error.message. Rather than doing this I'd suggest making a host:port parser to handle the cases, and map localhost to its IP (IPv4 should be fine).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the testing-hack either! And to be honest, I really don't get it why there is no fix from elastic yet to allow the error.message field in the test, since is is just a common ecs field, which even can be part of the original message. This is a known problem with an existing issue waiting to be fixed for almost a year.

lang: painless
source: >-
ctx._tmp.timestamp = ctx._tmp.date + 'T' + ctx._tmp.time;
if: ctx?._tmp?.date != null && ctx?._tmp?.time != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if: ctx?._tmp?.date != null && ctx?._tmp?.time != null
if: ctx._tmp?.date != null && ctx._tmp?.time != null

(similar change throughout: ctx is always non-null)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ignore_failure: true
- append:
field: related.ip
value: "{{ source.ip }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value: "{{ source.ip }}"
value: "{{{ source.ip }}}"

(throughout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

description: In rare cases, the field contains hex encoded whitspaces (\20). We replace them by whitespaces again to not break parsing.
field: _tmp.x_forwarded_for
target_field: _tmp.forwarded_ip
pattern: "\\\\x20"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pattern: "\\\\x20"
pattern: '\\x20'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that AWS has analyzed this issue and considered it as a bug, with a fix expected to be rolled out within 4 weeks from now. We could think about removing this workaround already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed gsub processor, since fix is on the way.

separator: ',\s*'
ignore_missing: true
- convert:
description: Creates a target field containing valid ip's if ALL values are valid ip's.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing all IPs if one is invalid seems unpleasant. This in conjunction with the special case you have in the pipeline test config suggest that a painless script over this array to clean it up would be a good thing to have. The only test failure that I saw was a case where the forwarded IP was localhost:8080, so a host port split and a mapping from "localhost" to "127.0.0.1" should do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree that we should not loose valid addresses in case one them is invalid. I implemented it this way due to complexity.

Now I have rewritten it by using a script and it should almost do what you recommended. The only difference is that I ignore the host/port split and just look for a value starting with localhost. I guess this should do the job, since receiving host port is really an edge case.

As you can see, the current implementation is far from ideal. Since I did not find a simple way to check if a string is an ip, I used a regex with patterns copied from logstash. Please let me know if there is a better way of doing this, since I really don't like it at all.

I still add invalid addresses to error.message, but did not add a test log for it in order to not break the test.

@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from 3be4acd to ea0a6e7 Compare August 6, 2024 16:08
@efd6
Copy link
Contributor

efd6 commented Aug 6, 2024

/test

@efd6
Copy link
Contributor

efd6 commented Aug 6, 2024

The system tests will need to be updated to use tabs as well. That file is here.

@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from 94d1c5b to 7de3cf3 Compare August 7, 2024 09:11
@ltflb-bgdi
Copy link
Contributor Author

/test

@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch 2 times, most recently from 48c6f4a to d0f0010 Compare August 7, 2024 09:19
@ltflb-bgdi
Copy link
Contributor Author

/test

@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from d0f0010 to dbbe9e3 Compare August 7, 2024 09:35
@ltflb-bgdi
Copy link
Contributor Author

/test

@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from dbbe9e3 to 945386b Compare August 7, 2024 09:41
@ltflb-bgdi ltflb-bgdi force-pushed the fix_pb-561_cloudfront_logs branch from f8b7aa8 to c2db3bd Compare August 22, 2024 07:24
@andrewkroh
Copy link
Member

/test

@andrewkroh
Copy link
Member

Failure message:

<testcase name="system test: default" classname="aws.cloudfront_logs" time="95.697515564">
<failure>
observed hit count 12 did not match expected hit count 11
</failure>
</testcase>

@andrewkroh
Copy link
Member

/test

@elasticmachine
Copy link

💚 Build Succeeded

History

Copy link

Quality Gate failed Quality Gate failed

Failed conditions
64.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@andrewkroh andrewkroh removed the request for review from a team September 4, 2024 04:58
@andrewkroh
Copy link
Member

Everything looks in order to me now.

@elastic/obs-infraobs-integrations, as codeowner of cloudfront_logs, do you want to handle merging?

@ltflb-bgdi
Copy link
Contributor Author

@elastic/obs-infraobs-integrations Any update?

@anil-elastic anil-elastic merged commit 7821fe3 into elastic:main Sep 10, 2024
4 of 5 checks passed
@elasticmachine
Copy link

Package aws - 2.24.1 containing this change is available at https://epr.elastic.co/search?package=aws

harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 4, 2025
* mimecast: add message release logs data stream (elastic#10732)

* Fix AWS cloudfront log parsing

* Refactored cloudfront parser

* Updated aws cloudfront testlogs

* Updated aws cloudfront logs test config

* Moved cloudfront.content_type to http.response.mime_type

* Added field mapping for aws.cloudfront range fields.

* Fixed edge_detailed_result_type

* Updated cloudfront.md

* Implemented review recommendations

* Updated cloudfront_logs system test log file

* Fixed CI problems

* Update sample_event.json

* Update cloudfront.md

* formatted and build with newest elastic-package version

* Revert changes fixed by formatters

* Update sample_event.json

* Added review recommendations

* Beautified some pipeline conditionals

* Added additional cloudfront test case

* Fixed test-default-config hit count

---------

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 5, 2025
* mimecast: add message release logs data stream (elastic#10732)

* Fix AWS cloudfront log parsing

* Refactored cloudfront parser

* Updated aws cloudfront testlogs

* Updated aws cloudfront logs test config

* Moved cloudfront.content_type to http.response.mime_type

* Added field mapping for aws.cloudfront range fields.

* Fixed edge_detailed_result_type

* Updated cloudfront.md

* Implemented review recommendations

* Updated cloudfront_logs system test log file

* Fixed CI problems

* Update sample_event.json

* Update cloudfront.md

* formatted and build with newest elastic-package version

* Revert changes fixed by formatters

* Update sample_event.json

* Added review recommendations

* Beautified some pipeline conditionals

* Added additional cloudfront test case

* Fixed test-default-config hit count

---------

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
@ltflb-bgdi ltflb-bgdi deleted the fix_pb-561_cloudfront_logs branch February 17, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Pull request that fixes a bug issue Integration:aws AWS Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants