The first one. CRLF can appear only if immediately followed by SP or HT. This is called line folding. The definition was inherited from HTTP/1.1 (RFC 2616), so you may find the explanatory text in section 2.2 of that RFC helpful.
Note that while the WARC standard allows them, in practice line folding and non-UTF-8 encodings are not well supported, so I recommend WARC writers avoid using them. Those two features were also deprecated in the newer HTTP RFC 7230.
Yes. I haven't seen it used in real WARC files in the wild, but a fully compliant parser should support it.
From what I've seen, many (but not all) parsers support line folding, but they vary in how they expose a folded value as a string in their header-reading API. Some include the LWS sequence as is; others replace it with a single space or linefeed. I haven't seen any parser that supports the non-UTF-8 'encoded-word' feature though.
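To make the difference concrete, here is a minimal Python sketch of how a reader might handle folding. It is not taken from any particular WARC library: the function names `split_header_lines` and `unfold` and the `X-Note` header are hypothetical, and it assumes a fold is a CRLF followed by one or more SP or HT characters.

```python
import re
from typing import List, Optional

# Assumption: a fold is CRLF immediately followed by one or more SP/HT.
_FOLD = re.compile(r"\r\n[ \t]+")


def split_header_lines(block: str) -> List[str]:
    """Split a header block into fields, keeping folded continuations attached."""
    # Only split at a CRLF that is NOT followed by SP or HT.
    return [line for line in re.split(r"\r\n(?![ \t])", block) if line]


def unfold(value: str, replacement: Optional[str] = " ") -> str:
    """Replace each fold with `replacement`; pass None to keep the fold as is."""
    if replacement is None:
        return value
    return _FOLD.sub(replacement, value)


# Hypothetical header block with one folded field.
block = (
    "WARC-Type: response\r\n"
    "X-Note: first part\r\n"
    "\tsecond part\r\n"
    "Content-Length: 0\r\n"
)
fields = split_header_lines(block)
```

Here `unfold(fields[1])` yields the single-space interpretation, while `unfold(fields[1], None)` returns the value with the LWS sequence untouched, which mirrors the two behaviours described above.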
I'm confused by this rule in the ABNF provided in The WARC Format 1.1:
Which of these (if any) is the correct interpretation: