Skip to content

v0.13.2 - CodeSplitter

Compare
Choose a tag to compare
@benbrandt benbrandt released this 02 Jun 20:36
· 370 commits to main since this release

What's Changed

New CodeSplitter for splitting code in any languages that tree-sitter grammars are available for. It should provide decent chunks, but please provide feedback if you notice any strange behavior.

Rust Usage

cargo add text-splitter --features code
cargo add tree-sitter-<language>
use text_splitter::CodeSplitter;
// Default implementation uses character count for chunk size.
// Can also use all of the same tokenizer implementations as `TextSplitter`.
let splitter = CodeSplitter::new(tree_sitter_rust::language(), 1000).expect("Invalid tree-sitter language");

let chunks = splitter.chunks("your code file");

Python Usage

from semantic_text_splitter import CodeSplitter
import tree_sitter_python

# Default implementation uses character count for chunk size.
# Can also use all of the same tokenizer implementations as `TextSplitter`.
splitter = CodeSplitter(tree_sitter_python.language(), capacity=1000)

chunks = splitter.chunks("your code file");

Full Changelog: v0.13.1...v0.13.2