This repo serves my experimental models behind a REST API, deployed on Heroku. The server is a Python Flask app running behind gunicorn.
/nlp/stemmer/q=**words**
: returns the stems of words using a Seq2Seq neural model.
- Params:
- words (str): one or more words. Special characters such as quotes and question marks are discarded.
- Returns:
- stemmed words (str). The response is plain text, not wrapped in JSON or a similar format for now.
- Model details:
- Characters are one-hot encoded.
- Encoder: 2 layers of bidirectional GRUs with 32 hidden units
- Decoder: 2 layers of GRUs with 64 hidden units, each using the state of the corresponding encoder layer
- 78k params
- The model was trained in my personal Colab notebook (https://colab.research.google.com/drive/1SjuaADaHocVkNfgeY0iYsKj_c1sAm7KK)
- Training data details:
- I once found a Turkish corpus from Zargan containing words, their roots, frequencies, etc. It is no longer online. I used 50k word pairs from it for training and testing.
- Example Usages (already deployed):
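The input handling described above (discarding special characters, then one-hot encoding each character) could look roughly like the sketch below. It is an illustration and not the code used in this repo: the alphabet (the 29 lowercase Turkish letters) and the helper names `clean` and `one_hot` are assumptions, and the real vocabulary may include extra tokens such as padding or end-of-word markers.

```python
from typing import List

# Assumption: the vocabulary is the 29 lowercase Turkish letters.
# The actual character set used by the model is not documented here.
ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def clean(text: str) -> str:
    """Keep alphabet characters and spaces; drop quotes, question marks, etc."""
    return "".join(ch for ch in text if ch in CHAR_TO_INDEX or ch == " ")


def one_hot(word: str) -> List[List[int]]:
    """Encode each character as a vector with a single 1 at its alphabet index."""
    vectors = []
    for ch in word:
        row = [0] * len(ALPHABET)
        row[CHAR_TO_INDEX[ch]] = 1
        vectors.append(row)
    return vectors
```

For example, `clean("kitaplar 'evde' mi?")` keeps only the letters and spaces, and `one_hot("ev")` yields two vectors of length 29. Lowercasing is left out on purpose, since Turkish casing rules (I/ı, İ/i) need locale-aware handling.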
curl https://cli-assets.heroku.com/install.sh | sh
# https://devcenter.heroku.com/articles/heroku-cli#standalone-installation
heroku login --interactive
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
heroku create
git push heroku main
Local testing:
heroku local
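Once `heroku local` is running, the stemmer endpoint can be queried from Python. The snippet below is a minimal sketch using only the standard library. It assumes the server listens on port 5000 (the `heroku local` default), that the route is exactly the `/nlp/stemmer/q=` path listed above, and that the plain-text response is UTF-8 encoded. The names `stemmer_url` and `stem` are hypothetical helpers, not part of this repo.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: `heroku local` serves the app on its default port, 5000.
BASE_URL = "http://localhost:5000"


def stemmer_url(words: str, base_url: str = BASE_URL) -> str:
    """Build the request URL, percent-encoding spaces and non-ASCII letters."""
    # Assumption: the route is used exactly as written in this README.
    return f"{base_url}/nlp/stemmer/q={quote(words)}"


def stem(words: str, base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Call the stemmer and return the plain-text response body."""
    with urlopen(stemmer_url(words, base_url), timeout=timeout) as response:
        # Assumption: the response body is UTF-8 text.
        return response.read().decode("utf-8")
```

With the local server up, `print(stem("kitaplar evde"))` should print the stemmed words. Passing a different `base_url` points the same helper at a deployed instance.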