v0.13.2 - CodeSplitter
What's Changed
New `CodeSplitter` for splitting code in any language for which a tree-sitter grammar is available. It should produce decent chunks, but please provide feedback if you notice any strange behavior.
Rust Usage
cargo add text-splitter --features code
cargo add tree-sitter-<language>
use text_splitter::CodeSplitter;
// Default implementation uses character count for chunk size.
// Can also use all of the same tokenizer implementations as `TextSplitter`.
let splitter = CodeSplitter::new(tree_sitter_rust::language(), 1000).expect("Invalid tree-sitter language");
let chunks = splitter.chunks("your code file");
Python Usage
from semantic_text_splitter import CodeSplitter
import tree_sitter_python
# Default implementation uses character count for chunk size.
# Can also use all of the same tokenizer implementations as `TextSplitter`.
splitter = CodeSplitter(tree_sitter_python.language(), capacity=1000)
chunks = splitter.chunks("your code file")
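The usage snippets only show the entry points. The sketch below is a rough mental model of what "chunk size" means here. It is a simplified illustration and not the library's algorithm. It packs syntax-level units greedily into chunks whose size, measured by a pluggable sizer (character count by default), stays within the capacity. It makes three simplifying assumptions: Python's standard-library `ast` module stands in for a tree-sitter parse, only top-level statements are treated as units, and oversized units are not split further. The helper names `top_level_units` and `pack_chunks` are hypothetical.

```python
import ast
from typing import Callable, List


def top_level_units(source: str) -> List[str]:
    """Hypothetical helper: cut Python source at top-level statement boundaries.

    Uses the stdlib `ast` module as a stand-in for a tree-sitter parse.
    Each unit runs from one top-level statement (including its decorators)
    to the start of the next, so "".join(units) reproduces the source.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    starts = set()
    for node in tree.body:
        decorators = getattr(node, "decorator_list", [])
        # lineno is 1-based; decorators sit above the def/class line.
        first_line = min([node.lineno] + [d.lineno for d in decorators])
        starts.add(first_line - 1)
    if not starts:
        return [source] if source else []
    boundaries = sorted(starts)
    boundaries[0] = 0  # attach leading comments/blank lines to the first unit
    boundaries.append(len(lines))
    return ["".join(lines[a:b]) for a, b in zip(boundaries, boundaries[1:])]


def pack_chunks(
    units: List[str], capacity: int, sizer: Callable[[str], int] = len
) -> List[str]:
    """Hypothetical helper: greedily merge consecutive units while size <= capacity.

    `sizer` defaults to character count (`len`); pass a token counter to
    measure in tokens instead. A single unit larger than `capacity` is
    emitted on its own, because this toy does not descend into child nodes.
    """
    chunks: List[str] = []
    current = ""
    for unit in units:
        candidate = current + unit
        if current and sizer(candidate) > capacity:
            chunks.append(current)
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os


def read(path):
    with open(path) as f:
        return f.read()


class Config:
    def __init__(self, path):
        self.text = read(path)
'''

units = top_level_units(SAMPLE)
# Character-count sizing: the import and the function fit together under 100
# characters, so the class starts a new chunk.
chunks = pack_chunks(units, capacity=100)
```

You can also pass a different `sizer`. For example, `pack_chunks(units, capacity=10, sizer=lambda s: len(s.split()))` measures chunks in whitespace-separated words. That word count is only a placeholder and not a real tokenizer. It mirrors the idea of measuring chunks in tokens rather than characters.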
Full Changelog: v0.13.1...v0.13.2