Releases: WorksApplications/SudachiTra

v0.1.9

15 Dec 13:53
3f4a6c3

Highlights

  • Support transformers 4.34 and newer (#66)

v0.1.8

17 Mar 02:43
009bf1a

Highlights

  • Add new word_form_type: normalized_nouns (#48, #50)
    • Normalizes morphemes that have no conjugation form.

v0.1.7

27 Dec 08:46
6b8be88

Highlights

  • Update sudachipy version in order to use PosMatcher
    • Requires sudachipy>=0.6.2
  • Add preprocessing code (#32, #34)
    • Normalizers and filters for the pretraining corpus
  • NormalizedConjugation (#31, #35)
    • New word_form_type that normalizes a morpheme while preserving the conjugation of the word
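
The sudachipy>=0.6.2 requirement noted in this release can be checked at runtime before relying on PosMatcher. The following is only a minimal sketch: the helper names (parse_version, meets_minimum, sudachipy_is_recent_enough) are hypothetical and not part of SudachiTra or SudachiPy, and pre-release suffixes are ignored for simplicity.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '0.6.2' into (0, 6, 2); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def sudachipy_is_recent_enough(minimum: str = "0.6.2") -> bool:
    """Hypothetical guard: False if sudachipy is missing or older than `minimum`."""
    try:
        installed = metadata.version("sudachipy")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

In practice, pinning the requirement in the package metadata does the same job at install time; a runtime check like this is only useful for a clearer error message.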

v0.1.6

17 Nov 06:33
837d81a

update

  • update sudachipy version #30
    • require SudachiPy implemented in Rust (sudachipy>=0.6.0).
  • add InputStringNormalizer #27
    • add NFKC and Lowercase normalization to tokenizers.
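
To illustrate what NFKC and lowercase normalization do to an input string, here is a minimal sketch using only the Python standard library. It is not the InputStringNormalizer implementation from this release; the function name normalize_input and its parameters are assumptions made for the example.

```python
import unicodedata


def normalize_input(text: str, nfkc: bool = True, lowercase: bool = True) -> str:
    """Hypothetical helper: apply NFKC normalization, then lowercasing."""
    if nfkc:
        # NFKC folds compatibility characters, e.g. full-width Latin letters
        # and digits to ASCII, and half-width katakana to full-width katakana.
        text = unicodedata.normalize("NFKC", text)
    if lowercase:
        text = text.lower()
    return text


# Full-width "ＡＢＣ１２３" is folded to ASCII and lowercased.
print(normalize_input("\uff21\uff22\uff23\uff11\uff12\uff13"))  # abc123
```

Applying the same normalization at pretraining and at inference time keeps the vocabulary consistent, which is presumably why it is exposed on the tokenizers.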

obsolete feature

  • remove SlowTokenizer #29

v0.1.5

23 Aug 05:49
ddeea0b
  • improve default configurations #21
    • add word forms and tests
      • surface_half_ascii
      • dictionary_half_ascii
      • dictionary_and_surface_half_ascii

v0.1.4

15 Aug 12:01
10ee97a
  • Fix a slow tokenizer

v0.1.3

15 Aug 06:13
45d95d9
  • Add a slow tokenizer for development

v0.1.2

14 Jul 06:35
0465f3b

Fix a bug related to #12

v0.1.1

12 Jul 03:33
ced5a47
  • Bump bunkai from 1.3.0 to 1.4.0 (#10)
  • Fix a bug related to #11

v0.1.0

25 Jun 01:49
63dc4a6

First release.

chiTra is a Japanese tokenizer for Transformers.
chiTra stands for Sudachi for Transformers.
https://github.com/WorksApplications/SudachiTra