About "httpheader" by Deron Meranda
Version 1.1 (February 2013)
Homepage: http://deron.meranda.us/python/httpheader/
Author: Deron Meranda <http://deron.meranda.us/>
This is free software. See the included LICENSE.txt file for details.
====================
About httpheader
====================
Httpheader is a Python module for dealing with HTTP headers and
content negotiation. It provides a set of utility functions and
classes which properly implement all the details and edge cases of the
HTTP 1.1 protocol headers. Httpheader is intended to be used as part
of a larger web framework or any application that must deal with HTTP.
In particular, httpheader can handle:
* Byte range requests (multipart/byteranges)
* Content negotiation (content type, language, and all the Accept-*
  style headers), including full support for priority/qvalue
  handling
* Content/media type parameters
* Conversion to and from HTTP date and time formats
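To make the qvalue point concrete, here is a minimal stdlib-only sketch of how an Accept header can be evaluated against the media types a server is able to produce. It is not httpheader's implementation (acceptable_content_type() is the module's own entry point). The names parse_accept(), quality() and choose_content_type() are hypothetical, the code is written for Python 3, and it deliberately ignores media-type parameters other than q as well as quoted strings. It follows the RFC 2616 rules that a more specific media range (text/html) overrides a less specific one (text/*, then */*), and that q=0 means "not acceptable".

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs.

    Simplified sketch: media-type parameters other than q are dropped
    and quoted strings are not handled.
    """
    pairs = []
    for item in header.split(','):
        fields = [f.strip() for f in item.split(';')]
        media_range = fields[0].lower()
        if not media_range:
            continue  # tolerate empty list elements such as "a, , b"
        q = 1.0  # default qvalue when none is given
        for param in fields[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a garbled qvalue means "not acceptable"
        pairs.append((media_range, q))
    return pairs


def quality(offered, pairs):
    """Return the q of the most specific media range matching `offered`."""
    otype, _, osub = offered.lower().partition('/')
    best_specificity, best_q = -1, 0.0
    for media_range, q in pairs:
        rtype, _, rsub = media_range.partition('/')
        if rtype == '*' and rsub == '*':
            specificity = 0
        elif rtype == otype and rsub == '*':
            specificity = 1
        elif rtype == otype and rsub == osub:
            specificity = 2
        else:
            continue  # this range does not cover the offered type
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_content_type(offered_types, accept_header):
    """Pick the offered type with the highest q, or None if none is acceptable."""
    pairs = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for offered in offered_types:  # earlier entries win ties (server preference)
        q = quality(offered, pairs)
        if q > best_q:
            best_type, best_q = offered, q
    return best_type
```

For example, with the header "text/*, text/html;q=0.4", choose_content_type(['text/plain', 'text/html'], ...) picks 'text/plain', because the more specific text/html;q=0.4 overrides text/* for HTML.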
====================
Requirements
====================
This module is pure Python and does not have any additional
dependencies. It requires Python 2.2 or any later 2.x version. It is
not directly supported under Python 3.x.
====================
Summary of functions and classes
====================
There are a few classes defined by this module:
* class content_type -- media types such as 'text/plain'
* class language_tag -- language tags such as 'en-US'
* class range_set -- a collection of (byte) range specifiers
* class range_spec -- a single (byte) range specifier
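The language_tag class deals with RFC 3066 tags. The matching rule that Accept-Language relies on (RFC 2616, section 14.4) is simple enough to sketch: a language range matches a tag if it equals the tag, or is a prefix of it followed by '-'. The qvalue that applies is the one from the longest matching range, '*' matches anything not matched otherwise, and a tag matched by nothing gets q=0. The helpers below are hypothetical illustrations, not the language_tag API.

```python
def language_matches(language_range, tag):
    """RFC 2616 prefix rule: 'en' matches 'en' and 'en-US', but not 'eng'."""
    r, t = language_range.lower(), tag.lower()  # tags are case-insensitive
    return r == '*' or t == r or t.startswith(r + '-')


def language_quality(tag, ranges):
    """q for `tag` given (language_range, q) pairs; the longest match wins."""
    best_length, best_q = -1, 0.0
    for language_range, q in ranges:
        if not language_matches(language_range, tag):
            continue
        # '*' counts as the shortest possible range, so any real match beats it
        length = 0 if language_range == '*' else len(language_range)
        if length > best_length:
            best_length, best_q = length, q
    return best_q
```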
The primary functions in this module may be categorized as follows:
* Content negotiation functions...
    * acceptable_content_type()
    * acceptable_language()
    * acceptable_charset()
    * acceptable_encoding()
* Mid-level header parsing functions...
    * parse_accept_header()
    * parse_accept_language_header()
    * parse_range_header()
* Date and time...
    * http_datetime()
    * parse_http_datetime()
* Utility functions...
    * quote_string()
    * remove_comments()
    * canonical_charset()
* Low level string parsing functions...
    * parse_comma_list()
    * parse_comment()
    * parse_qvalue_accept_list()
    * parse_media_type()
    * parse_number()
    * parse_parameter_list()
    * parse_quoted_string()
    * parse_range_set()
    * parse_range_spec()
    * parse_token()
    * parse_token_or_quoted_string()
And there are some specialized exception classes:
* RangeUnsatisfiableError
* RangeUnmergableError
* ParseError
See also:
* RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999.
  <http://www.ietf.org/rfc/rfc2616.txt>
  Errata at <http://purl.org/NET/http-errata>
* RFC 2046, "Multipurpose Internet Mail Extensions (MIME) Part Two:
  Media Types", November 1996.
  <http://www.ietf.org/rfc/rfc2046.txt>
* RFC 3066, "Tags for the Identification of Languages", January 2001.
  <http://www.ietf.org/rfc/rfc3066.txt>
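http_datetime() and parse_http_datetime() convert to and from the HTTP date formats. For comparison, Python's own email.utils module already covers most of this ground. The sketch below (Python 3; to_http_date() and from_http_date() are hypothetical names, not part of httpheader) emits the preferred RFC 1123 form and accepts the three formats that RFC 2616, section 3.3.1, tells recipients to accept: RFC 1123, RFC 850 and asctime. All HTTP-dates are in GMT, so a date without a zone is treated as GMT rather than local time.

```python
import calendar
import email.utils


def to_http_date(timestamp):
    """Format a POSIX timestamp as an RFC 1123 HTTP-date in GMT."""
    return email.utils.formatdate(timestamp, usegmt=True)


def from_http_date(value):
    """Parse any of the three HTTP-date formats into a POSIX timestamp."""
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        raise ValueError('not an HTTP-date: %r' % (value,))
    # The last item of the 10-tuple is the zone offset in seconds. Depending
    # on the Python version, a missing zone (as in asctime dates) comes back
    # as None or 0; HTTP-dates are always GMT, so both are treated as 0.
    offset = parsed[9] or 0
    return calendar.timegm(parsed[:9]) - offset
```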
====================
Using HTTP range requests
====================
This is a simple example of how to use this module to correctly handle
HTTP range requests (on a GET or HEAD operation), i.e., those that
specify the Range: header.
This example is presented in a way that is not bound to any particular
web framework, whether that be mod_python, WSGI, etc. So some of the
actions are intentionally generic. This example is also intentionally
simplified by assuming that you know ahead of time how big the whole
"file" is. If you don't know this, such as if the file is generated
on-the-fly, then your actual use will be a bit more complex (mainly
because range requests may reference file offsets relative to the end
of the file rather than the beginning).
This is what you must do first, by whatever method is available to you
in your framework.
----------
# Preliminary handling of the HTTP request
ranges_hdr = http_request_headers.get( 'Range' )

# This example only supports GET and HEAD requests
if http_method not in ('GET', 'HEAD'):
    http_response_header[ 'Allow' ] = 'GET, HEAD'
    set_http_return_code( 405, 'Method not allowed' )
    return # stop processing

# Determine the "file" being requested. How you do this depends on your
# application; get_file_contents() is a hypothetical placeholder.
file_contents = get_file_contents()  # the actual contents of the file
file_size = len( file_contents )     # size of the whole file being retrieved
file_content_type = 'application/octet-stream' # or whatever it is
----------
Now try to parse the Range header and figure out what portion(s) of
the file have been requested.
----------
# Process the Range header
import httpheader

ranges = None # None means return the whole file
if ranges_hdr:
    try:
        ranges = httpheader.parse_range_header( ranges_hdr )
    except httpheader.ParseError:
        # The Range header is malformed; per the RFC you should ignore
        # this error and just serve the whole file instead.
        ranges = None

if ranges:
    try:
        # Simplify/optimize the range(s) if possible
        ranges.fix_to_size( file_size )
        ranges.coalesce() # You may wish to skip this
    except httpheader.RangeUnsatisfiableError:
        # Requested range cannot be satisfied, return an error
        http_response_header['Content-Range'] = 'bytes */%d' % file_size
        set_http_return_code( 416, 'Range not satisfiable' )
        return # stop processing
    if ranges.is_single_range() and \
       ranges.range_specs[0].first == 0 and \
       ranges.range_specs[0].last == file_size - 1:
        # The range(s) cover the whole file; serve it normally
        ranges = None
----------
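The exact behavior of range_set.coalesce() is defined by httpheader itself. As an assumption, the sketch below shows the usual meaning of coalescing byte ranges: sort them, then merge any that overlap or are adjacent. It works on plain inclusive (first, last) tuples rather than range_spec objects, and coalesce_ranges() and covers_whole_file() are hypothetical names. Note that merging also reorders the ranges, so the response parts may no longer appear in the order the client listed them.

```python
def coalesce_ranges(ranges):
    """Merge overlapping or adjacent inclusive (first, last) byte ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def covers_whole_file(ranges, file_size):
    """True when the ranges amount to the entire file (serve it with a 200)."""
    return coalesce_ranges(ranges) == [(0, file_size - 1)]
```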
At this point, if ranges is None, then you're returning the whole file
just like you would in a normal request. Be sure to send the
Accept-Ranges header so that clients know they may use range
requests in the future.
----------
if not ranges:
    # Returning the whole file
    http_response_header['Accept-Ranges'] = 'bytes'
    http_response_header['Content-Length'] = str( file_size )
    http_response_header['Content-Type'] = file_content_type
    set_http_return_code( 200, 'OK' )
    if http_method != 'HEAD':
        http_response_write( file_contents )
----------
Otherwise you are returning one or more portions of the file. First
we set up the various HTTP response headers.
----------
else: # ranges is not None
    if ranges.is_single_range():
        # Just one part of the file, no need for a multipart
        http_response_header['Content-Type'] = file_content_type
        http_response_header['Content-Range'] = \
            '%s %d-%d/%d' % ( 'bytes',
                              ranges.range_specs[0].first,
                              ranges.range_specs[0].last,
                              file_size ) # total size of the whole file
    else:
        # Multiple parts. Pick a random MIME boundary string
        import random, string
        boundary = '--------' + ''.join([ random.choice(string.ascii_letters) for i in range(32) ])
        http_response_header['Content-Type'] = \
            'multipart/byteranges; boundary=%s' % boundary
    http_response_header['Accept-Ranges'] = ranges.units # 'bytes'
    set_http_return_code( 206, 'Partial Content' )
----------
Now we output the actual content.
----------
    if http_method == 'GET':
        if ranges.is_single_range():
            # Single range output
            first = ranges.range_specs[0].first
            last = ranges.range_specs[0].last + 1 # Notice the +1
            http_response_write( file_contents[first:last] )
        else: # not ranges.is_single_range()
            # Multipart! Each delimiter line is '--' followed by the boundary
            for R in ranges.range_specs:
                http_response_write( '--%s\r\n' % boundary )
                http_response_write( 'Content-Type: %s\r\n' % file_content_type )
                http_response_write( 'Content-Range: %s %d-%d/%d\r\n'
                                     % (ranges.units, R.first, R.last, file_size) )
                http_response_write( '\r\n' )
                http_response_write( file_contents[ R.first : R.last+1 ] )
                http_response_write( '\r\n' )
            http_response_write( '--%s--\r\n' % boundary )
    elif http_method == 'HEAD':
        # Do the same as above to compute the correct Content-Length,
        # but don't output the body.
        pass
----------
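The HEAD branch needs the same Content-Length that a GET would produce. One way to get it, sketched here as an assumption rather than as anything httpheader provides, is to compute the multipart/byteranges body length from the part headers and range sizes alone, without touching the file contents. The helper names are hypothetical, ranges are plain inclusive (first, last) tuples, and the part layout mirrors the GET code, including the "--" in front of the boundary on each delimiter line.

```python
def part_header(boundary, content_type, first, last, total):
    """Delimiter line plus headers that precede one body part."""
    return ('--%s\r\n'
            'Content-Type: %s\r\n'
            'Content-Range: bytes %d-%d/%d\r\n'
            '\r\n' % (boundary, content_type, first, last, total))


def byteranges_body(contents, ranges, content_type, boundary):
    """Build the complete multipart/byteranges body as bytes (for GET)."""
    total = len(contents)
    out = []
    for first, last in ranges:
        out.append(part_header(boundary, content_type, first, last, total).encode('ascii'))
        out.append(contents[first:last + 1])  # +1 because 'last' is inclusive
        out.append(b'\r\n')
    out.append(('--%s--\r\n' % boundary).encode('ascii'))  # close delimiter
    return b''.join(out)


def byteranges_length(total, ranges, content_type, boundary):
    """Content-Length of the same body, computed without the contents (for HEAD)."""
    length = len('--%s--\r\n' % boundary)
    for first, last in ranges:
        length += len(part_header(boundary, content_type, first, last, total))
        length += (last - first + 1) + 2  # part data plus its trailing CRLF
    return length
```

Computing the length separately keeps HEAD cheap; building the body and taking len() of it would give the same number.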
Don't forget: In a real-world use, be sure to return the ETag and
other cache-friendly headers such as Last-Modified or Digest. Just be
sure that those headers are with respect to the entire file, not just
the portions returned in the range request.
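As an illustration of "with respect to the entire file", here is a minimal sketch that assumes you have the whole file contents and its modification time (a POSIX timestamp) at hand. It derives a strong ETag from a SHA-1 hash of the full contents and formats Last-Modified as an RFC 1123 date. cache_validators() is a hypothetical helper, and hashing is only one of several reasonable ways to build an ETag.

```python
import hashlib
import email.utils


def cache_validators(file_contents, mtime):
    """Headers describing the entire file, even when only a range is sent."""
    # A strong ETag is an opaque quoted string; a content hash is one choice.
    etag = '"%s"' % hashlib.sha1(file_contents).hexdigest()
    return {
        'ETag': etag,
        'Last-Modified': email.utils.formatdate(mtime, usegmt=True),
    }
```

Call it with the complete file_contents, never with the sliced range, so that a 200 and a 206 for the same file carry identical validators. Hashing the whole file on every request can be costly; servers often use a cheaper scheme, such as combining size and modification time.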
== End ==