v0.21.0

@benbrandt released this 16 Jan 07:55
9da8748

Breaking Changes

  • Special tokens are now also encoded by both the Hugging Face and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and ensures that any tokens a model adds at the beginning or end of a sequence are counted toward the chunk size. This matters most for embedding models that add a special token to the beginning of the sequence: previously these tokens were not counted, so the generated chunks did not actually fit within the context window.
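
To illustrate why counting special tokens changes the resulting chunks, here is a minimal, self-contained sketch. It does not use this crate's API: `count_tokens` is a hypothetical stand-in tokenizer that splits on whitespace and assumes the model adds exactly one token at the start and one at the end of every sequence, and `chunk_words` is a hypothetical greedy packer.

```rust
/// Hypothetical stand-in for a real tokenizer: one token per
/// whitespace-separated word, plus (optionally) one begin-of-sequence
/// and one end-of-sequence special token.
fn count_tokens(text: &str, add_special_tokens: bool) -> usize {
    let content = text.split_whitespace().count();
    if add_special_tokens {
        content + 2
    } else {
        content
    }
}

/// Hypothetical greedy packer: appends words to the current chunk until
/// adding one more would push the measured token count past `capacity`.
fn chunk_words(text: &str, capacity: usize, add_special_tokens: bool) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{current} {word}")
        };
        if !current.is_empty() && count_tokens(&candidate, add_special_tokens) > capacity {
            // The candidate is too large: close the current chunk and
            // start a new one with this word.
            chunks.push(std::mem::take(&mut current));
            current = word.to_string();
        } else {
            current = candidate;
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a capacity of 4 and the text `"a b c d e f"`, sizing without special tokens yields chunks of four words, which the model would then see as six tokens each. Sizing with special tokens yields chunks of two words, which stay within the capacity once the two extra tokens are added.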

What's New

Rust

  • The minimum supported Rust version (MSRV) is now 1.80, which allows the dependency on once_cell to be removed.
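
For context, Rust 1.80 stabilized `std::sync::LazyLock`, which covers the common lazily initialized static use case that `once_cell::sync::Lazy` served. The sketch below shows that pattern using only the standard library. It is an assumption that this is the kind of replacement involved; the `SPECIAL_TOKENS` table, its IDs, and `special_token_id` are hypothetical and not taken from the crate.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

/// Hypothetical lookup table, built on first access and shared afterwards.
/// Before Rust 1.80 this would typically be written with
/// `once_cell::sync::Lazy`.
static SPECIAL_TOKENS: LazyLock<HashMap<&'static str, u32>> = LazyLock::new(|| {
    let mut map = HashMap::new();
    // Illustrative IDs only; real tokenizers define their own.
    map.insert("[CLS]", 0);
    map.insert("[SEP]", 1);
    map
});

/// Returns the ID of a special token, or `None` if it is not in the table.
fn special_token_id(token: &str) -> Option<u32> {
    SPECIAL_TOKENS.get(token).copied()
}
```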

Full Changelog: v0.20.1...v0.21.0