This NER tagger uses a hidden Markov model (HMM) and the Viterbi algorithm to tag Chinese sentences for named entities, such as the names of people, times, places and businesses. The model achieves 96% accuracy and an 87% F1-score on the dev.txt data set.
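The HMM-plus-Viterbi approach can be illustrated with a minimal sketch. It assumes character-level input, a BIO tag scheme (B-PER, I-PER, O, ...), counts-based estimation with add-one smoothing, and a fallback probability for unseen characters. None of these details are confirmed by this README. `train_hmm` and `viterbi` are hypothetical names, and the project's models/hmm.py may estimate and store its probabilities (in probabilities.txt) differently.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Estimate HMM log-probabilities from sentences given as lists of
    (character, tag) pairs, using add-one (Laplace) smoothing.
    Assumption: character-level tokens with BIO-style tags."""
    start = defaultdict(int)                       # tag -> count at sentence start
    trans = defaultdict(lambda: defaultdict(int))  # prev tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> char -> count
    tags, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = None
        for char, tag in sentence:
            tags.add(tag)
            vocab.add(char)
            emit[tag][char] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag

    tags = sorted(tags)
    n_tags, n_vocab = len(tags), len(vocab)
    n_sentences = len(tagged_sentences)

    log_start = {t: math.log((start[t] + 1) / (n_sentences + n_tags)) for t in tags}
    log_trans = {}
    for p in tags:
        total = sum(trans[p].values())
        log_trans[p] = {t: math.log((trans[p][t] + 1) / (total + n_tags)) for t in tags}
    log_emit, log_unk = {}, {}
    for t in tags:
        # "+ 1" reserves probability mass for characters never seen in training
        total = sum(emit[t].values()) + n_vocab + 1
        log_emit[t] = {c: math.log((n + 1) / total) for c, n in emit[t].items()}
        log_unk[t] = math.log(1 / total)
    return tags, log_start, log_trans, log_emit, log_unk


def viterbi(chars, model):
    """Return the most probable tag sequence for a sequence of characters."""
    tags, log_start, log_trans, log_emit, log_unk = model
    if not chars:
        return []

    def emission(t, c):
        return log_emit[t].get(c, log_unk[t])

    # best[i][t]: highest log-probability of any tag path for chars[:i+1] ending in t
    best = [{t: log_start[t] + emission(t, chars[0]) for t in tags}]
    back = [{}]  # back[i][t]: the previous tag on that best path
    for i in range(1, len(chars)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans[p][t])
            best[i][t] = best[i - 1][prev] + log_trans[prev][t] + emission(t, chars[i])
            back[i][t] = prev
    # Follow the back-pointers from the best final tag to the first character
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(chars) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


train = [
    [("张", "B-PER"), ("三", "I-PER"), ("在", "O"), ("北", "B-LOC"), ("京", "I-LOC")],
    [("李", "B-PER"), ("四", "I-PER"), ("去", "O"), ("北", "B-LOC"), ("京", "I-LOC")],
]
model = train_hmm(train)
print(viterbi("张三去北京", model))  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long sentence.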
- Python version: 3.5.1
After cloning the repository:
- Setting up the environment:
cd ner-tagging
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install the project dependencies:
pip install -r requirements.txt
- Start the program:
- Ensure that you are inside /ner-tagging and that your virtual environment is running.
- Enter python models/hmm.py. This will generate a file called probabilities.txt.
- After generating the probabilities.txt file, test the model on the dev.txt set by running python tests/test_dev.py. This will output the accuracy and F1-scores for the validation data set.
- To generate predictions for the test.content.txt data, run python tests/test_test.py while in /ner-tagging. This will create predictions.txt.
- Deactivate your virtual environment by entering deactivate.
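The tests/test_dev.py step reports accuracy and F1-scores. One common way to compute such metrics is sketched here, assuming each sentence has gold and predicted BIO tag sequences of equal length and that F1 is micro-averaged over exact-match entity spans. The project's script may define F1 differently (for example, per tag or per token), and `token_accuracy`, `entity_spans` and `entity_f1` are hypothetical names.

```python
def token_accuracy(gold, pred):
    """Fraction of characters whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


def entity_spans(tags):
    """Extract (start, end, type) spans from a BIO tag sequence; end is exclusive.
    Assumption: an I- tag that does not continue a span starts a new one."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if start is not None and not (tag.startswith("I-") and tag[2:] == kind):
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans


def entity_f1(gold, pred):
    """Micro-averaged F1 over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for gs, ps in zip(gold, pred):
        g, p = entity_spans(gs), entity_spans(ps)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
print(token_accuracy(gold, pred))  # 0.8
print(entity_f1(gold, pred))       # 0.5
```

The example shows why the two numbers can differ: one wrong character costs only 20% of token accuracy here, but it breaks a whole entity span, halving the entity-level F1.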