-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathproblem.html
542 lines (474 loc) · 16.5 KB
/
problem.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Order Book Programming Problem</title>
<style type="text/css">
P.message {
font-family: monospace;
margin: 1em 2em;
padding: 0.5em;
border: 1px solid gray;
}
CODE, .data {
font-family: monospace;
white-space: nowrap;
}
.data {
padding: 0.5em;
}
VAR {
font-family: monospace;
font-style: oblique;
}
TD {
vertical-align: text-top;
}
</style>
</head>
<body>
<h1>Order Book Programming Problem</h1>
<h2>Background</h2>
<p> Suppose your great aunt Gertrude dies and leaves you $3000 which you
decide you want to invest in the Acme Internet Widget Company (stock
symbol:AIWC). You are willing to pay up to $30 per share of AIWC. So you
log in to your online trading account and enter a limit order: "BUY 100
AIWC @ $30". It's a limit order because that's most you're willing to
pay. You'd be willing to pay less than $30, but not more than $30. Your
order will sit around in the market until you get your 100 shares. A
limit order to buy is called a "bid". </p>
<p> But you're not the only prospective buyer for AIWC stock. Others
want to buy it too. One is bidding $31/share for 200 shares, while
another is bidding $29/share for 300 shares. When Warren Buffett wants
to sell 225 shares, he's obviously going to take the highest price
he can get for each share. So he hits the $31 bid first, selling 200
shares. Then he sells his remaining 25 shares to you at $30/share. Your
bid size reduced by 25, leaving 75 shares still to be bought. </p>
<p> Suppose you eventually get the full 100 shares at some price. Next
year, you decide to buy a new computer and you need $4500 for it, and
luckily the value of AIWC has appreciated by 50%. So you want to sell
your 100 shares of AIWC stock for at least $45/share. So you enter this
limit order: "SELL 100 AIWC @ $45". A limit order to sell is called an
"ask". </p>
<p> But you're not the only prospective seller of AIWC stock. There's
also an ask for $44/share and an ask for $46/share. If Alan Greenspan
wants to buy AIWC, he's obviously going to pay as little as possible. So
he'll take the $44 offer first, and only buy from you at $45 if he can't
buy as much as he wants at $44. </p>
<p> The set of all standing bids and asks is called a "limit order
book", or just a "book". You can buy a data feed from the stock market,
and they will send you messages in real time telling you about changes
to the book. Each message either adds an order to the book, or reduces
the size of an order in the book (possibly removing the order entirely).
You can record these messages in a log file, and later you can go back
and analyze your log file. </p>
<h2>Problem</h2>
<p> Your task is to write a program, <i>Pricer</i>, that analyzes
such a log file. <i>Pricer</i> takes one command-line argument:
<var>target-size</var>. <i>Pricer</i> then reads a market data log
on standard input. As the book is modified, <i>Pricer</i> prints (on
standard output) the total expense you would incur if you bought
<var>target-size</var> shares (by taking as many asks as necessary,
lowest first), and the total income you would receive if you sold
<var>target-size</var> shares (by hitting as many bids as necessary,
highest first). Each time the income or expense changes, it prints the
changed value.</p>
<h3>Input Format</h3>
<p> The market data log contains one message per line (terminated by a
single linefeed character, '<code>\n</code>'), and each
message is a series of fields separated by spaces. </p>
<p>
An "Add Order to Book" message looks like this:
</p>
<p class='message'><var>timestamp</var> A <var>order-id</var> <var>side</var> <var>price</var> <var>size</var></p>
<table>
<tr>
<th>Field</th>
<th>Meaning</th>
</tr>
<tr>
<td><var>timestamp</var></td>
<td>The time when this message was generated by the market,
as milliseconds since midnight.</td>
</tr>
<tr>
<td><code>A</code></td>
<td>A literal string identifying this as an "Add Order to Book"
message.</td>
</tr>
<tr>
<td><var>order-id</var></td>
<td>A unique string that subsequent "Reduce Order" messages
will use to modify this order.</td>
</tr>
<tr>
<td><var>side</var></td>
<td>A '<code>B</code>' if this is a buy order (a bid), and a
'<code>S</code>' if this is a sell order (an ask).
</td>
</tr>
<tr>
<td><var>price</var></td>
<td>The limit price of this order.</td>
</tr>
<tr>
<td><var>size</var></td>
<td>The size in shares of this order, when it was initially sent
to the market by some stock trader.</td>
</tr>
</table>
<p>
A "Reduce Order" message looks like this:
</p>
<p class='message'><var>timestamp</var> R <var>order-id</var> <var>size</var></p>
<table>
<tr>
<th>Field</th>
<th>Meaning</th>
</tr>
<tr>
<td><var>timestamp</var></td>
<td>The time when this message was generated by the market,
as milliseconds since midnight.</td>
</tr>
<tr>
<td><code>R</code></td>
<td>A literal string identifying this as an "Reduce Order"
message.</td>
</tr>
<tr>
<td><var>order-id</var></td>
<td>The unique string that identifies the order to be reduced.</td>
</tr>
<tr>
<td><var>size</var></td>
<td>The amount by which to reduce the size of the order. This
is <em>not</em> the new size of the order. If <var>size</var>
is equal to or greater than the existing size of the order, the
order is removed from the book.</td>
</tr>
</table>
<p> The log file messages are sorted by timestamp by the time
<i>Pricer</i> receives them. </p>
<p> If <i>Pricer</i> encounters an error in an input message, it must print
a warning to standard error and proceed to the next message. </p>
<h3>Output Format</h3>
<p><i>Pricer</i>'s output consists of one message per line in this
format: </p>
<p class='message'><var>timestamp</var> <var>action</var> <var>total</var></p>
<table>
<tr>
<th>Field</th>
<th>Meaning</th>
</tr>
<tr>
<td><var>timestamp</var></td>
<td>The timestamp from the input message that caused this output
message to be generated.</td>
</tr>
<tr>
<td><var>action</var></td>
<td>A string: '<code>B</code>' if this message contains the new
expense to buy <var>target-size</var> shares, and
'<code>S</code>' if this message contains the new income for
selling <var>target-size</var> shares.
</tr>
<tr>
<td><var>total</var></td>
<td><p>The total expense (if <var>action</var> is '<code>B</code>')
to buy <var>target-size</var> shares, or the total income
(if <var>action</var> is '<code>S</code>') for selling
<var>target-size</var> shares.</p>
<p>If the book does not contain <var>target-size</var> shares in
the appropriate type of order (asks for expense; bids for
income), the <var>total</var> field contains the string
'<code>NA</code>'. </p></td>
</td>
</tr>
</table>
<h3>Example Input and Output</h3>
<p> Here is an example run of <i>Pricer</i> with a <var>target-size</var>
of 200. Input messages are in the left column. Notes and output messages
are in the right column. </p>
<table>
<tr>
<th>Standard Input</th>
<th>Standard Output/Notes</th>
</tr>
<tr>
<td class='data'>28800538 A b S 44.26 100</td>
<td>No output yet because neither the bids nor the asks in the
book have a total of 200 shares yet.</td>
</tr>
<tr>
<td class='data'>28800562 A c B 44.10 100</td>
<td>Still not enough shares on either side of the book.</td>
</tr>
<tr>
<td class='data'>28800744 R b 100</td>
<td>This reduces order '<code>b</code>' to zero shares, which
removes it from the book, so now the book contains no asks. But
there's still no change to the total income or expense on 200
shares.</td>
</tr>
<tr>
<td class='data'>28800758 A d B 44.18 157</td>
<td>The bid sizes now total 257, which is more than the target size
of 200. To sell 200 shares, you would first hit the bid at
44.18 for 157 shares, spending $6936.26. Then you would hit the
bid at 44.10 for the remaining 43 shares, spending another
$1896.30. Your total income would be $8832.56, so <i>Pricer</i>
emits this message:
</td>
</tr>
<tr>
<td></td><td class='data'>28800758 S 8832.56</td>
</tr>
<tr>
<td class='data'>28800773 A e S 44.38 100</td>
<td>The book now contains a single ask of size 100, which is
still not enough to change the target size expense from
'<code>NA</code>'.</td>
</tr>
<tr>
<td class='data'>28800796 R d 157</td>
<td>This removes bid '<code>d</code>' from the book, leaving
just one bid with a size of 100 on the book, so the income from
selling changes to '<code>NA</code>':</td>
</tr>
<tr>
<td></td><td class='data'>28800796 S NA</td>
</tr>
<tr>
<td class='data'>28800812 A f B 44.18 157</td>
<td>This new bid brings the total bid size back over 200, so the
selling income is no longer '<code>NA</code>':</td>
</tr>
<tr>
<td></td><td class='data'>28800812 S 8832.56</td>
</tr>
<tr>
<td class='data'>28800974 A g S 44.27 100</td>
<td>This ask brings the total ask size up to 200, exactly the
target size. The total expense for buying 200 shares would be
100 * $44.27 + 100 * $44.38:</td>
</tr>
<tr>
<td></td><td class='data'>28800974 B 8865.00</td>
</tr>
<tr>
<td class='data'>28800975 R e 100</td>
<td>Removing ask '<code>e</code>' from the book leaves less than
200 shares on the ask side, so the buying expense changes back
to '<code>NA</code>':
</td>
</tr>
<tr>
<td></td><td class='data'>28800975 B NA</td>
</tr>
<tr>
<td class='data'>28812071 R f 100</td>
<td>Reducing bid '<code>f</code>' by 100 shares leaves only 157
shares on the bid side, so the selling income changes to
'<code>NA</code>':</td>
</tr>
<tr>
<td></td><td class='data'>28812071 S NA</td>
</tr>
<tr>
<td class='data'>28813129 A h B 43.68 50</td>
<td>This new bid makes it possible to sell 200 shares: 57 at
$44.18, 100 at $44.10, and the last 43 at $43.68.
</td>
</tr>
<tr>
<td></td><td class='data'>28813129 S 8806.50</td>
</tr>
<tr>
<td class='data'>28813300 R f 57</td>
<td>This removes bid '<code>f</code>' from the book, so it is
no longer possible to sell 200 shares:
</td>
</tr>
<tr>
<td></td><td class='data'>28813300 S NA</td>
</tr>
<tr>
<td class='data'>28813830 A i S 44.18 100</td>
<td>This ask makes it possible to buy 200 shares again: 100 at
$44.18 and 100 at $44.27.
</td>
</tr>
<tr>
<td></td><td class='data'>28813830 B 8845.00</td>
</tr>
<tr>
<td class='data'>28814087 A j S 44.18 1000</td>
<td>This ask has the same price as an existing ask, and these
two asks are tied for the best asking price. This means you
could now buy all 200 shares at $44.18 instead of buying half of
them at $44.27, so the buying expense decreases:
</td>
</tr>
<tr>
<td></td><td class='data'>28814087 B 8836.00</td
</tr>
<tr>
<td class='data'>28814834 R c 100</td>
<td>This leaves only 50 shares on the bid side (all in order
'<code>h</code>'), so it is still not possible to sell 200
shares. The selling income is therefore unchanged from
'<code>NA</code>' and <i>Pricer</i> prints no output message.
</td>
</tr>
<tr>
<td class='data'>28814864 A k B 44.09 100</td>
<td>Only 150 shares on the bid side, so no output needed.
</td>
</tr>
<tr>
<td class='data'>28815774 R k 100</td>
<td>Back to 50 shares on the bid side; still no output needed.
</td>
</tr>
<tr>
<td class='data'>28815804 A l B 44.07 175</td>
<td>There are now more than 200 shares on the bid side.
You could sell 175 shares at $44.07 each, and the remaining 25
shares at $43.68 each:
</td>
</tr>
<tr>
<td></td><td class='data'>28815804 S 8804.25</td>
</tr>
<tr>
<td class='data'>28815937 R j 1000</td>
<td>After ask '<code>j</code>' is removed from the book, you can
still buy 200 shares: 100 at $44.18 each, and 100 at the worse
price of $44.27.
</td>
</tr>
<tr>
<td></td><td class='data'>28815937 B 8845.00</td>
</tr>
<tr>
<td class='data'>28816245 A m S 44.22 100</td>
<td>Since $44.22 is a better price than $44.27, the buying
expense decreases:
</td>
</tr>
<tr>
<td></td><td class='data'>28816245 B 8840.00</td>
</tr>
</table>
<p>
Note that the book initially contains no orders, and that the buying
expense and selling income are both considered to start at
'<code>NA</code>'. Since <i>Pricer</i> only produces output when the
income or expense changes, it does not print anything until the total
size of all bids or the total size of all asks meets or exceeds
<var>target-size</var>.
</p>
<h2>What We're Looking For</h2>
<p> Please write the <i>Pricer</i> program in one of C, C++, Java, or Scala, and send us the source code.
You do not need to send us any compiler output. We encourage you to take advantage of the language's
standard libraries and latest versions. Please do not use other code that you would
have to download separately from your chosen language's normal distribution package. </p>
<p> We're looking for evidence that you can produce code that others
would be able to understand, fix, and extend. At the same time, we do
have real-time requirements for production code, so we frown on
gratuitous inefficiency. Here are some qualities we look for:
</p>
<ul>
<li> Correctness. Obviously, the fewer bugs you write, the fewer
you'll have to fix.
<li> Clarity. If only you can understand your code, then only you
can maintain it. Good code speaks for itself - a good programmer
should be able to understand your implementation details without
extensive comments.
<li> Conciseness. The less code you write, the less code your
coworkers need to puzzle out.
<li> Coefficiency. OK, just efficiency, but wouldn't it be cool if
all of these qualities started with a 'C'? Anyway, using less time and
space is generally better than otherwise.
</ul>
<p>
Of course, there are often trade-offs between each of these properties
(except correctness, which is pretty much an absolute).
</p>
<p> Also, we are a Linux shop and we like Unix-style tools: programs
that process the output of other programs. So make sure your
implementation of <i>Pricer</i> is suitable for use in a shell pipeline.
Follow the I/O specifications: don't mix prompts in with your output or
demand that the input come from a disk file.
</p>
<p> In addition to supplying us with your source code, please
answer these questions: </p>
<ul>
<li> How did you choose your implementation language?
<li> What is the time complexity for processing an Add Order
message?
<li> What is the time complexity for processing a Reduce Order
message?
<li> If your implementation were put into production and found to be
too slow, what ideas would you try out to improve its performance? (Other than reimplementing it in a different language such as C or C++.)
</ul>
<h2>Test Data</h2>
<p> You can download a large amount of test input data here: <a
href='./pricer.in.gz' >pricer.in.gz</a>.
This file is compressed using gzip. You should uncompress it before
feeding it to your program. This is real market data, collected from a
live system, so you might want to analyze it to decide what algorithms
are most appropriate to use in <i>Pricer</i>. </p>
<p>You can also download the corresponding output of our reference implementation of
<i>Pricer</i> with various target sizes:
<a href='./pricer.out.1.gz'
>target size 1</a>, <a
href='./pricer.out.200.gz'
>target size 200</a>, <a
href='./pricer.out.10000.gz'
>target size 10000</a>. These files are also compressed with
gzip.
</p>
<p> In case you want to test your implementation on a smaller sample of data, here is a snippet in easy-to-cut-and-paste format: </p>
<pre>
28800538 A b S 44.26 100
28800562 A c B 44.10 100
28800744 R b 100
28800758 A d B 44.18 157
28800773 A e S 44.38 100
28800796 R d 157
28800812 A f B 44.18 157
28800974 A g S 44.27 100
28800975 R e 100
28812071 R f 100
28813129 A h B 43.68 50
28813300 R f 57
28813830 A i S 44.18 100
28814087 A j S 44.18 1000
28814834 R c 100
28814864 A k B 44.09 100
28815774 R k 100
28815804 A l B 44.07 175
28815937 R j 1000
28816245 A m S 44.22 100
</pre>
<p> And here is the corresponding output: </p>
<pre>
28800758 S 8832.56
28800796 S NA
28800812 S 8832.56
28800974 B 8865.00
28800975 B NA
28812071 S NA
28813129 S 8806.50
28813300 S NA
28813830 B 8845.00
28814087 B 8836.00
28815804 S 8804.25
28815937 B 8845.00
28816245 B 8840.00
</pre>
</body>
</html>