Release 0.9.0
This release contains a major overhaul of Anonlink’s API and introduces support for multi-party linkage.
The changes are all additive, so the previous API continues to work. That API has now been deprecated and will be removed in a future release. The deprecation timeline is:
- v0.9.0: old API deprecated
- v0.10.0: use of old API raises a warning
- v0.11.0: remove old API
Major changes
- Introduce abstract similarity functions. The Sørensen–Dice coefficient is now just one possible similarity function.
- Implement Hamming similarity as a similarity function.
- Permit linkage of records other than CLKs (BYO similarity function).
- Similarity functions now return multiple contiguous arrays instead of a list of tuples.
- Candidate pairs from similarity functions are now always sorted.
- Introduce a standard type for storing candidate pairs. This is now used consistently throughout the API.
- Provide a function for multiparty candidate generation. It takes multiple datasets and compares them against each other using a similarity function.
- Extend the greedy solver to multiparty problems.
- The greedy solver also takes the new candidate pairs type.
- Implement serialisation and deserialisation of candidate pairs.
- Multiple files with serialised candidate pairs can be merged without loading everything into memory at once.
- Introduce type annotations in the new API.
Minor changes
- Automatically test on Python 3.7.
- Remove support for Python 3.5 and below.
- Update Clkhash dependency to 0.11.
- Minor documentation and style in
anonlink.concurrency
. - Provide a convenience function for generating valid candidate pairs from a chunk.
- Change the format of a chunk and move the type definition to
anonlink.typechecking
.
See the changelog for details.