Training scripts for Grascii tokenizers, which are typically used with machine learning models.
This tokenizer operates on normalized Grascii and is intended for use with a RoBERTa model. It is trained on the gregg-preanniversary-words dataset.
The X and XS strokes are encoded as S and SS, respectively, because they are visually very similar.
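The X/XS substitution can be sketched as a small preprocessing step. This is only an illustration, not the code used by the training scripts: the function names `merge_x_strokes` and `merge_x_in_grascii` are hypothetical, and the sketch assumes the normalized Grascii string separates strokes with hyphens. Only the X to S and XS to SS mapping comes from the description above.

```python
# Hypothetical helper names; only the X -> S and XS -> SS mapping is taken
# from the description above.
X_TO_S = {"X": "S", "XS": "SS"}


def merge_x_strokes(strokes):
    """Replace X and XS stroke tokens with S and SS; leave other tokens unchanged."""
    return [X_TO_S.get(stroke, stroke) for stroke in strokes]


def merge_x_in_grascii(grascii, sep="-"):
    """Apply the substitution to a Grascii string.

    Assumes strokes are separated by `sep` (a hyphen by default).
    """
    return sep.join(merge_x_strokes(grascii.split(sep)))
```

Working on whole stroke tokens rather than on single characters keeps the substitution from touching any other token that might happen to contain the letter X.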