Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): use deque for insert-order message queue #1316

Merged
merged 6 commits into from
Jan 21, 2025

Conversation

alkasm
Copy link
Contributor

@alkasm alkasm commented Jan 20, 2025

Changelog

Significantly increases the performance of iterating through an MCAP file in insertion order.

Docs

None. (internal only)

Description

I used iter_messages(log_time_order=False) since my application doesn't care about message order, expecting it to perform better than log_time_order=True (the default). However I noticed that my application took 17 minutes to iterate through 3.5 million MCAP messages. It took 32 seconds with log_time_order=True.

The previous message queue implementation used a list as a FIFO queue, which has poor performance as pop(0) will copy the list every time an item is evicted. After changing the implementation to use a deque, log_time_order=False took 15 seconds.

Here's code for a minimal performance test. Note that the mcap file has ~3.5 million messages.

import time
from collections import deque
from mcap.reader import make_reader

with open("example.mcap", "rb") as f:
    reader = make_reader(f)
    start = time.time()
    deque(reader.iter_messages(log_time_order=False), maxlen=0)
    end = time.time()
    print(end - start)

Note that deque(iterable, maxlen=0) will consume the iterable in C, so basically as fast as possible (see the itertools recipe for consume()).

I ran this three times after the fix, but just once before because I didn't have the patience 😄

BeforeAfter
1125.6354639530182 11.093337059020996
10.840037822723389
10.490556001663208

Over 100x faster! If you're curious, log_time_order=True was ~17 seconds (both before and after the changeset).

I also updated the code to have two separate impls rather than a bifurcated implementation.

I added a unit test to assert that log_time_order=False is faster than True (purposely adding messages in reverse log time order so that the heap should perform worse). As with most perf unit tests, it is potentially flakey with smaller inputs so I put 10k messages in it, but feel free to discuss if there's a different way you want to run through that.

self._reverse = reverse

def push(self, item: QueueItem):
orderable = _make_orderable(item, self._reverse)
Copy link
Collaborator

@james-rms james-rms Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why make an orderable if the collection does not use the comparison operator?

Copy link
Contributor Author

@alkasm alkasm Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I guess log_time_order=False with reverse=True doesn't really make sense? I'll do a ValueError in the factory function then if both are provided. LMK if you think there should be a different approach.

@james-rms james-rms merged commit babc294 into foxglove:main Jan 21, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants