Python implementation of a rule-based algorithm for removing garbage strings from OCR text.
Based on:
- Taghva et al. (2001) “Automatic Removal of Garbage Strings in OCR Text: An Implementation” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.8901
- Yang Cai (2008) “OCR Output Enhancement” https://ladyissy.github.io/OCR/
- R implementation by benmarwick.
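
To give a feel for what "rule-based" means here, the following is a minimal sketch and not this package's actual code. The function names (`looks_like_garbage`, `clean`), the three rules, and the thresholds are assumptions chosen for illustration, loosely in the spirit of the heuristics described in the references above; the real rule set may differ.

```python
import re

# Illustrative thresholds only (assumptions, not taken from this package).
MAX_LEN = 40                        # very long tokens are treated as garbage
REPEAT = re.compile(r"(.)\1{3,}")   # the same character four or more times in a row


def looks_like_garbage(token: str) -> bool:
    """Return True if the token trips any of the illustrative rules."""
    # Rule 1: token is implausibly long for a real word.
    if len(token) > MAX_LEN:
        return True
    # Rule 2: fewer alphanumeric characters than other characters.
    alnum = sum(ch.isalnum() for ch in token)
    if alnum < len(token) - alnum:
        return True
    # Rule 3: a run of four or more identical characters.
    if REPEAT.search(token):
        return True
    return False


def clean(text: str) -> str:
    """Drop whitespace-separated tokens that look like garbage."""
    return " ".join(t for t in text.split() if not looks_like_garbage(t))


if __name__ == "__main__":
    print(clean("The qu1ck ~~@#%^ brown aaaaaa fox"))
```

Each rule is a cheap, independent test on a single token, so rules can be added, removed, or retuned without touching the others.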