This package provides Python utility classes and static methods that wrap third-party software commonly used in text processing: Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.
-
Install with pip:
pip install -e git+https://github.com/petar-popovic-bg/Jerteh.git#egg=Jerteh
Update an existing installation:
pip install -e git+https://github.com/petar-popovic-bg/Jerteh.git#egg=Jerteh --upgrade
-
Edit the treetaggerwrapper.py file inside your virtual environment so that the wrapper supports the Serbian Latin and Serbian Cyrillic scripts. Add the two Serbian entries to the language list and the two matching dummy sentences:
"""
        ('slovak', 'sk'),
        ('swahili', 'sw'),
        ('serbian-lat', 'sr-lat'),
        ('serbian-cyr', 'sr-cyr')]:
    ls = g_langsupport[lang] = copy.deepcopy(g_langsupport['__base__'])
...
g_langsupport['sk']['dummysentence'] = 'To je koniec . .'
g_langsupport['sw']['dummysentence'] = 'Hii ni mwisho . .'
g_langsupport['sr-lat']['dummysentence'] = 'Ovo je kraj . .'
g_langsupport['sr-cyr']['dummysentence'] = 'Ово је крај . .'
"""
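To confirm that the edit took effect, one option is to inspect the `g_langsupport` dictionary after importing the wrapper. The following is a minimal sketch, not part of this package: the helper name `missing_serbian_codes` is hypothetical, and it assumes `g_langsupport` is a dictionary keyed by language code, as the excerpt above suggests.

```python
def missing_serbian_codes(langsupport):
    """Return the Serbian codes that have no 'dummysentence' entry.

    `langsupport` is expected to look like treetaggerwrapper.g_langsupport:
    a dict mapping a language code to a dict of settings (an assumption
    based on the excerpt above).
    """
    expected = ("sr-lat", "sr-cyr")
    return [code for code in expected
            if "dummysentence" not in langsupport.get(code, {})]


if __name__ == "__main__":
    try:
        import treetaggerwrapper
    except ImportError:
        print("treetaggerwrapper is not installed in this environment")
    else:
        missing = missing_serbian_codes(treetaggerwrapper.g_langsupport)
        print("missing codes:", missing or "none")
```

An empty result means both Serbian codes are registered. Any code that is listed still needs to be added to treetaggerwrapper.py.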
-
Edit configure.py so that it points to your local installations of TreeTagger and Unitex.
The TreeTagger and Unitex classes only work if TreeTagger and Unitex are installed on your machine.
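Before using the classes, it can help to check that the configured locations actually exist. The sketch below is an illustration only: the contents of configure.py are not shown here, so the helper `missing_paths` and the example paths are hypothetical placeholders that should be replaced with the values from your own configure.py.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names whose configured location does not exist on disk."""
    return [name for name, location in paths.items()
            if not Path(location).expanduser().exists()]


if __name__ == "__main__":
    # Hypothetical placeholder paths; substitute the values you set
    # in configure.py.
    configured = {
        "TreeTagger": "~/treetagger",
        "Unitex": "~/Unitex-GramLab",
    }
    for name in missing_paths(configured):
        print(f"{name}: configured path not found")
```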