v0.21.0
Breaking Changes
- Special tokens are now also encoded by both the Hugging Face and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and ensures that any tokens a model adds at the beginning or end of a sequence are counted toward the chunk size. This matters most for embedding models that add a special token to the beginning of the sequence: previously, the generated chunks could exceed the context window because those tokens were not counted.
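To illustrate why this affects chunk sizes, here is a minimal sketch using only the standard library. The whitespace "tokenizer", the `count_tokens` and `fits` helpers, and the assumption of exactly two special tokens per sequence (one at the start, one at the end) are all hypothetical. They stand in for a real tokenizer and are not this crate's API.

```rust
/// Toy whitespace "tokenizer" for illustration only. It is NOT the
/// Hugging Face or Tiktoken API; real tokenizers split text differently.
fn count_tokens(text: &str, add_special_tokens: bool) -> usize {
    let content = text.split_whitespace().count();
    // Assumption: the model wraps every sequence in one start token and
    // one end token (similar in spirit to [CLS] ... [SEP]).
    let special = if add_special_tokens { 2 } else { 0 };
    content + special
}

/// Returns true when the chunk fits within `max_tokens`.
fn fits(chunk: &str, max_tokens: usize, add_special_tokens: bool) -> bool {
    count_tokens(chunk, add_special_tokens) <= max_tokens
}
```

With a limit of 4 tokens, `fits("the quick brown fox", 4, false)` holds, but the same chunk no longer fits once the two special tokens are counted. Chunks sized under the old behavior may therefore come out slightly smaller after upgrading.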
What's New
Rust
- MSRV is now 1.80, which allows the dependency on once_cell to be removed.
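For context, Rust 1.80 stabilized `std::sync::LazyLock`, which covers the common lazily initialized static use case of once_cell. The release note does not say how the crate uses it, so the sketch below is only an assumption about the kind of replacement involved. The `SEPARATORS` static and `separator_count` helper are hypothetical names.

```rust
use std::sync::LazyLock;

// Hypothetical lazily initialized static. The closure runs once, on first
// access, which is what `once_cell::sync::Lazy` was typically used for.
static SEPARATORS: LazyLock<Vec<&'static str>> =
    LazyLock::new(|| vec!["\n\n", "\n", " "]);

/// Number of separators in the hypothetical static above.
fn separator_count() -> usize {
    SEPARATORS.len()
}
```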
Full Changelog: v0.20.1...v0.21.0