Fix AWS cloudfront log parsing #10216
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,10 +2,13 @@ | |
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 2a02:cf40:add:4002:91f2:a9b2:e09a:6fc6 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - | ||
2019-12-04 21:02:31 LAX1 392 89.160.20.112 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - | ||
Please revert this change. |
||
2019-12-13 22:36:27 SEA19-C1 900 89.160.20.112 GET d111111abcdef8.cloudfront.net /favicon.ico 502 http://www.example.com/ Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 1pkpNfBQ39sYMnjjUQjmH2w1wdJnbHYTbag21o_3OfcQgPzdL2RSSQ== www.example.com http 675 0.102 - - - Error HTTP/1.1 - - 25260 0.102 OriginDnsError text/html 507 - - | ||
2019-12-13 22:36:26 SEA19-C1 900 89.160.20.112 GET d111111abcdef8.cloudfront.net / 502 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 3AqrZGCnF_g0-5KOvfA7c9XLcf4YGvMFSeFdIetR1N_2y8jSis8Zxg== www.example.com http 735 0.107 - - - Error HTTP/1.1 - - 3802 0.107 OriginDnsError text/html 507 - - | ||
2019-12-13 22:37:02 SEA19-C2 900 89.160.20.112 GET d111111abcdef8.cloudfront.net / 502 - curl/7.55.1 - - Error kBkDzGnceVtWHqSCqBUqtA_cEs2T3tFUBbnBNkB9El_uVRhHgcZfcw== www.example.com http 387 0.103 - - - Error HTTP/1.1 - - 12644 0.103 OriginDnsError text/html 507 - - | ||
2022-04-19 12:29:36 SEA19-C2 10157 81.2.69.143 POST d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/100.0.4896.127%20Safari/537.36 source=global - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.238 - TLSv1.3 TLS_AES_128_GCM_SHA256 Miss HTTP/2.0 - - 4203 0.238 Miss application/json;charset=UTF-8 - - - | ||
2022-04-19 12:29:36 SEA19-C2 10157 81.2.69.143 POST d111111abcdef8.cloudfront.net /getApplications 000 https://test.com/global Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/100.0.4896.127%20Safari/537.36 source=global - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.238 - TLSv1.3 TLS_AES_128_GCM_SHA256 Miss HTTP/2.0 - - 4203 0.238 Miss application/json;charset=UTF-8 - - - | ||
2022-11-15 08:43:04 SEA19-C2 10157 81.2.69.143 GET d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(X11;%20Linux%20x86_64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20HeadlessChrome/100.0.4896.88%20Safari/537.36 - - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.093 81.2.69.142,216.160.83.56 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 - - 33359 0.093 Miss application/javascript - - - | ||
2022-04-19 12:29:36 SEA19-C2 10157 81.2.69.143 POST d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/100.0.4896.127%20Safari/537.36 source=global - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.238 - TLSv1.3 TLS_AES_128_GCM_SHA256 Miss HTTP/2.0 - - 4203 0.238 Miss application/json;charset=UTF-8 - - - | ||
Do the lines that are changed here fail with the new code (I imagine they do)? Can we guarantee that the non-tab-delimited syntax in them is never found in the wild or documented? Adding a link to the documentation for the log format in the proposed commit message would be good.
Yes, they do, since they contain space separators instead of tabs. Since
Please also add the test case below (reference):
2024-07-13 15:29:45 EWR53-C1 198083 127.0.0.1 GET xxxxxxxxxxxxx.cloudfront.net /en(test) 404 https://domain.tld/ User-Agent:%20Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.1;%20360SE) - - Error somevalidbase64== domain.tld https 609 0.318 - TLSv1.3 TLS_AES_128_GCM_SHA256 Error HTTP/1.1 - - 50294 0.318 Error text/html - |
||
2022-04-19 12:29:36 SEA19-C2 10157 81.2.69.143 POST d111111abcdef8.cloudfront.net /getApplications 000 https://test.com/global Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/100.0.4896.127%20Safari/537.36 source=global - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.238 - TLSv1.3 TLS_AES_128_GCM_SHA256 Miss HTTP/2.0 - - 4203 0.238 Miss application/json;charset=UTF-8 - - - | ||
2022-11-15 08:43:04 SEA19-C2 10157 81.2.69.143 GET d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(X11;%20Linux%20x86_64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20HeadlessChrome/100.0.4896.88%20Safari/537.36 - - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.093 81.2.69.142,216.160.83.56 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 - - 33359 0.093 Miss application/javascript - - - | ||
2022-11-15 08:43:04 SEA19-C2 10157 81.2.69.143 GET d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(X11;%20Linux%20x86_64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20HeadlessChrome/100.0.4896.88%20Safari/537.36 - - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.093 81.2.69.142, 216.160.83.56 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 - - 33359 0.093 Miss application/javascript - - - | ||
2022-11-15 08:43:04 SEA19-C2 10157 81.2.69.143 GET d111111abcdef8.cloudfront.net /getApplications 200 https://test.com/global Mozilla/5.0%20(X11;%20Linux%20x86_64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20HeadlessChrome/100.0.4896.88%20Safari/537.36 - - Miss hrsHM5OM6sTIXUleC1G20YtDxMf5Cq0Jbz0pwhVpod2kgEn_W6akCQ== test.com https 1057 0.093 localhost:8080 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 - - 33359 0.093 Miss application/javascript - - - | ||
2024-07-13 15:29:45 EWR53-C1 198083 127.0.0.1 GET xxxxxxxxxxxxx.cloudfront.net /en(test) 404 https://domain.tld/ User-Agent:%20Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.1;%20360SE) - - Error somevalidbase64== domain.tld https 609 0.318 - TLSv1.3 TLS_AES_128_GCM_SHA256 Error HTTP/1.1 - - 50294 0.318 Error text/html - |
Does the change in this PR make us brittle to this?
Trailing tabs should not have an effect on the CSV parsers. Still, according to the AWS specs, we should not see this.
If this doesn't break the test, can we leave it in?
We could, but our VS Code setup is configured to automatically remove trailing whitespace after editing.
I tested with trailing tabs by adding them with another editor, and the parser did not fail.
Given that, and given that trailing tabs should not appear in real life, I vote for leaving the trailing tab removed.
I don't think you should do that here; there may be tests in other packages that depend on trailing whitespace (note that we do not have an .editorconfig file in this repo so there is no guidance on the expectation of format).
While real data should not have the trailing tab, recent events have again demonstrated that testing with artifacts that match expectations can lead to bad outcomes. I would prefer that we show that we are robust to this case.
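To make the robustness question concrete, here is a minimal Python sketch of the behaviour being discussed. It is not the pipeline in this PR; `parse_cloudfront_log` is a hypothetical helper, and it assumes that fields are tab-delimited, that field names come from the `#Fields:` header, and that a trailing tab should be ignored rather than treated as an extra empty field.

```python
def parse_cloudfront_log(text):
    """Parse tab-delimited CloudFront standard log text into a list of dicts.

    Hypothetical helper for illustration only; not the parser used in this PR.
    """
    fields = []
    records = []
    for raw in text.splitlines():
        # Tolerate trailing tabs or spaces left behind by editors or tooling.
        line = raw.rstrip("\t ")
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Field names in the header are whitespace-separated.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            # Other directives such as "#Version:" carry no record data.
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            # Space-delimited lines end up here: they yield a single value.
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}: {line!r}"
            )
        records.append(dict(zip(fields, values)))
    return records


# Shortened header and record, assumed for brevity.
header = "#Fields: date time x-edge-location sc-bytes"
clean = "2019-12-04\t21:02:31\tLAX1\t392"
trailing = clean + "\t"

# A trailing tab does not change the parsed result.
assert parse_cloudfront_log(header + "\n" + clean) == parse_cloudfront_log(
    header + "\n" + trailing
)
```

Keeping a fixture line with a trailing tab exercises the `rstrip` path above, which is the case the last comment asks to keep covered.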