Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added unsafe code to remove ranks.len() (spares 8 byte) #12

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Tockra
Copy link
Contributor

@Tockra Tockra commented Oct 24, 2019

So time to discuss. I added code, which spares 8 byte for each hash function.
The self.ranks doesn't need his len() information because it is initialized with self.ranks==self.bitvecs.len() and you just need to safe it on one place. Its only needed while deallocate the space (Drop).

I measured the Space (0 to 2047 keys) and found no memory leak. I think there shouldn't be one if nobody hurts following invariants:

  • the computed ranks array slice by compute_ranks is not allowed to have a len() != self.bitvecs.len() .
  • the passed usize I in get_rank isn't allowed to overflow a isize. Otherwise the program panics

@pmarks
Copy link
Contributor

pmarks commented Oct 24, 2019

@Tockra Hmm, I'd really like to avoid adding unsafe code. It's just really hard to be sure you've done it right & it will continue to be correct as you evolve the code.

I'm still not sure I understand your use case. Can you tell me roughly how many instance of Mphf you need to create, and what type and how many entries you are loading into each one?

In all our application we had only 1 or 2 Mphf instances, which contain a handful of BitVec and rank vectors, so these change save ~100s of bytes total.

@Tockra
Copy link
Contributor Author

Tockra commented Oct 25, 2019

I'm building a immutable/static variant of https://pdfs.semanticscholar.org/cd2f/fe40c25eaedadbd687ce679c3c988dbac142.pdf .
For huge Inputs (~2^32 u40 values) this structure has ~27000000 hash functions with u16 keys.
At the moment I'm comparing mphf hash functions, with a static lookup table, brown hashing and with simple binary search.
Mphf has the best memory usage but it's also the slowest variant...

B2T my code above is unsafe but it should be also be correct. The self.ranks arrays stores the length but it doesn't use it at any place... So it doesn't need to store the length...

@galo2099
Copy link
Contributor

Wouldn't you get a much bigger improvement by also removing the size from all the interior boxes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants