-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AWS cloudfront log parsing #10216
Fix AWS cloudfront log parsing #10216
Conversation
💚 CLA has been signed |
91f9492
to
c31f466
Compare
/test |
11fb5fc
to
04579e2
Compare
/test |
packages/aws/changelog.yml
Outdated
@@ -1,4 +1,9 @@ | |||
# newer versions go on top | |||
- version: "2.23.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- version: "2.23.0" | |
- version: "2.22.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
packages/aws/manifest.yml
Outdated
@@ -1,7 +1,7 @@ | |||
format_version: 3.0.0 | |||
name: aws | |||
title: AWS | |||
version: 2.22.0 | |||
version: 2.23.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version: 2.23.0 | |
version: 2.22.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to 2.22.2 due to merge conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this. It was intentionally removed in #10223.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- drop: | ||
if: "ctx.event.original.startsWith('#')" | ||
- drop: | ||
if: ctx?.event?.original?.startsWith('#') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if: ctx?.event?.original?.startsWith('#') | |
if: ctx.event?.original?.startsWith('#') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 2a02:cf40:add:4002:91f2:a9b2:e09a:6fc6 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the change in this PR make us brittle to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing tabs should not have an effect on the csv parsers. Still, according to the AWS specs we should not see this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this doesn't break the test, can we leave it in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could. But our vs-code setup is configured to automatically remove trailing spaces after editing.
I have done a test with trailing tabs by adding them with another editor and the parser did not fail.
Given that fact and that trailing tabs should not appear in real life, I vote for letting the trailing tab removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But our vs-code setup is configured to automatically remove trailing spaces after editing.
I don't think you should do that here; there may be tests in other packages that depend on trailing whitespace (note that we do not have an .editorconfig file in this repo so there is no guidance on the expectation of format).
While real data should not have the trailing tab, recent events have again demonstrated that testing with artifacts that match expectations can lead to bad outcomes. I would prefer that we show that we are robust to this case.
- elastic_package_remove_error_message # Used to solve elastic-package test error on adding error message to event. | ||
dynamic_fields: | ||
'event.ingested': "^.*$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dynamic_fields
configuration is not needed since we are not setting it in the pipeline.
I don't like the testing-hack for error.message
. Rather than doing this I'd suggest making a host:port parser to handle the cases, and map localhost to its IP (IPv4 should be fine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the testing-hack either! And to be honest, I really don't get it why there is no fix from elastic yet to allow the error.message
field in the test, since is is just a common ecs field, which even can be part of the original message. This is a known problem with an existing issue waiting to be fixed for almost a year.
lang: painless | ||
source: >- | ||
ctx._tmp.timestamp = ctx._tmp.date + 'T' + ctx._tmp.time; | ||
if: ctx?._tmp?.date != null && ctx?._tmp?.time != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if: ctx?._tmp?.date != null && ctx?._tmp?.time != null | |
if: ctx._tmp?.date != null && ctx._tmp?.time != null |
(similar change throughout: ctx
is always non-null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ignore_failure: true | ||
- append: | ||
field: related.ip | ||
value: "{{ source.ip }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value: "{{ source.ip }}" | |
value: "{{{ source.ip }}}" |
(throughout)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
description: In rare cases, the field contains hex encoded whitspaces (\20). We replace them by whitespaces again to not break parsing. | ||
field: _tmp.x_forwarded_for | ||
target_field: _tmp.forwarded_ip | ||
pattern: "\\\\x20" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern: "\\\\x20" | |
pattern: '\\x20' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Note that AWS has analyzed this issue and considered it as a bug, with a fix expected to be rolled out within 4 weeks from now. We could think about removing this workaround already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed gsub processor, since fix is on the way.
separator: ',\s*' | ||
ignore_missing: true | ||
- convert: | ||
description: Creates a target field containing valid ip's if ALL values are valid ip's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Losing all IPs if one is invalid seems unpleasant. This in conjunction with the special case you have in the pipeline test config suggest that a painless script over this array to clean it up would be a good thing to have. The only test failure that I saw was a case where the forwarded IP was localhost:8080, so a host port split and a mapping from "localhost" to "127.0.0.1" should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree that we should not loose valid addresses in case one them is invalid. I implemented it this way due to complexity.
Now I have rewritten it by using a script and it should almost do what you recommended. The only difference is that I ignore the host/port split and just look for a value starting with localhost. I guess this should do the job, since receiving host port is really an edge case.
As you can see, the current implementation is far from ideal. Since I did not find a simple way to check if a string is an ip, I used a regex with patterns copied from logstash. Please let me know if there is a better way of doing this, since I really don't like it at all.
I still add invalid addresses to error.message, but did not add a test log for it in order to not break the test.
3be4acd
to
ea0a6e7
Compare
/test |
The system tests will need to be updated to use tabs as well. That file is here. |
94d1c5b
to
7de3cf3
Compare
/test |
48c6f4a
to
d0f0010
Compare
/test |
d0f0010
to
dbbe9e3
Compare
/test |
dbbe9e3
to
945386b
Compare
f8b7aa8
to
c2db3bd
Compare
/test |
Failure message:
|
packages/aws/data_stream/cloudfront_logs/_dev/test/system/test-default-config.yml
Outdated
Show resolved
Hide resolved
/test |
💚 Build Succeeded
History
|
|
Everything looks in order to me now. @elastic/obs-infraobs-integrations, as codeowner of cloudfront_logs, do you want to handle merging? |
@elastic/obs-infraobs-integrations Any update? |
Package aws - 2.24.1 containing this change is available at https://epr.elastic.co/search?package=aws |
* mimecast: add message release logs data stream (elastic#10732) * Fix AWS cloudfront log parsing * Refactored cloudfront parser * Updated aws cloudfront testlogs * Updated aws cloudfront logs test config * Moved cloudfront.content_type to http.response.mime_type * Added field mapping for aws.cloudfront range fields. * Fixed edge_detailed_result_type * Updated cloudfront.md * Implemented review recommendations * Updated cloudfront_logs system test log file * Fixed CI problems * Update sample_event.json * Update cloudfront.md * formatted and build with newest elastic-package version * Revert changes fixed by formatters * Update sample_event.json * Added review recommendations * Beautified some pipeline conditionals * Added additional cloudfront test case * Fixed test-default-config hit count --------- Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
* mimecast: add message release logs data stream (elastic#10732) * Fix AWS cloudfront log parsing * Refactored cloudfront parser * Updated aws cloudfront testlogs * Updated aws cloudfront logs test config * Moved cloudfront.content_type to http.response.mime_type * Added field mapping for aws.cloudfront range fields. * Fixed edge_detailed_result_type * Updated cloudfront.md * Implemented review recommendations * Updated cloudfront_logs system test log file * Fixed CI problems * Update sample_event.json * Update cloudfront.md * formatted and build with newest elastic-package version * Revert changes fixed by formatters * Update sample_event.json * Added review recommendations * Beautified some pipeline conditionals * Added additional cloudfront test case * Fixed test-default-config hit count --------- Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Proposed commit message
Fixed AWS cloudfront log parsing
Please explain:
WHAT:
Make sure only real ip addresses are used. All malformed ip-addresses are listed in error.message without raising a pipeline error.
event.kind: pipeline_error
to error handler.WHY:
Checklist
changelog.yml
file.Author's Checklist
How to test this PR locally
The PR contains sample logs of the problematic events.
To be tested with
elastic-package test pipeline --data-streams cloudfront_logs
Note: Please note that the log producing the error.message is not added to the sample logs since it would let the
tests fail. See: elastic/elastic-package#1392
Related issues