A work in progress...
kn1ght's tokenizer is optimized for the Portable Game Notation (PGN) format used to record chess games.
Note: kn1ght's tokenizer does not currently account for PGN metadata (Event, Site, Date, etc.), PGN comments ({...}), notes about clock times ({[%clk ...]}), or other miscellaneous PGN data. It focuses only on the actual moves played in the game.
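
Since the tokenizer only looks at moves, a PGN file may need to be reduced to bare move text before tokenizing. The following is a minimal sketch of such a preprocessing step, not kn1ght's actual pipeline: the function name `strip_to_movetext` is hypothetical, and keeping move numbers and the result marker in the output is an assumption.

```python
import re


def strip_to_movetext(pgn: str) -> str:
    """Reduce a single PGN game to its move text (hypothetical helper)."""
    # Drop brace comments first; this also covers clock notes
    # such as {[%clk 0:05:00]}.
    text = re.sub(r"\{[^}]*\}", "", pgn)
    # Drop tag-pair (metadata) lines such as [Event "..."] or [Site "..."].
    text = re.sub(r"^\s*\[[^\]]*\]\s*$", "", text, flags=re.MULTILINE)
    # Drop Black continuation numbers ("1...") that PGN writers emit
    # after a comment interrupts a move pair.
    text = re.sub(r"\d+\.\.\.", "", text)
    # Collapse the remaining whitespace into single spaces.
    return " ".join(text.split())


sample = """[Event "Casual Game"]
[Site "?"]
[Date "2020.01.01"]

1. e4 {[%clk 0:05:00]} 1... e5 {[%clk 0:04:58]} 2. Nf3 {a developing move} 2... Nc6 1-0
"""

cleaned = strip_to_movetext(sample)
print(cleaned)  # 1. e4 e5 2. Nf3 Nc6 1-0
```

This sketch does not handle variations in parentheses, semicolon comments, or numeric annotation glyphs; a full PGN parser would be a safer choice for messy input.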
The tokenizer was trained on a small dataset of 3.5M chess games from ChessDB, cleaned up by Kaggle user milesh1.