An AI model for resolving duplicates in the Czech National Oncology Register. The bachelor's thesis about this project is available here.
- Python $\ge 3.10$
Install the package either by:
- Downloading the repository as a ZIP file by clicking the badge at the beginning of this README file.
- Cloning the repository.
Run the file install.ps1 either by right-clicking it and selecting Run with PowerShell, or by running the following command in PowerShell:
.\install.ps1
Create a virtual environment (optional but recommended):
python -m venv venv
Activate the virtual environment:
source venv/bin/activate
Install the requirements:
pip install .
Edit the paths in the scripts/constants.py file, e.g. the paths to the data or the model.
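The real variable names are whatever the repository defines in scripts/constants.py. As a rough illustration only, a paths module of this kind often looks like the sketch below. Every name in it (`PROJECT_ROOT`, `DATA_DIR`, `RAW_DATA_PATH`, `MODEL_PATH`, `missing_paths`) is hypothetical and not taken from the project.

```python
from pathlib import Path

# Hypothetical names for illustration only; the real ones live in
# scripts/constants.py and may differ.
PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"
RAW_DATA_PATH = DATA_DIR / "records.csv"
MODEL_PATH = PROJECT_ROOT / "models" / "model.pkl"


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [Path(p) for p in paths if not Path(p).exists()]
```

Calling a helper like `missing_paths([RAW_DATA_PATH, MODEL_PATH])` before running a command is one way to catch a mistyped path early.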
In a terminal, run the command:
nor-cleaner [-h] {prepare,train,predict,evaluate} ...
- Prepare the data for training.
nor-cleaner prepare
- Train the model.
nor-cleaner train
- Predict whether to preserve or drop a record.
nor-cleaner predict
- (Optional) Evaluate the model using cross-validation on the training data.
nor-cleaner evaluate
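The usage line `nor-cleaner [-h] {prepare,train,predict,evaluate} ...` has the shape that Python's `argparse` prints for a parser with required subcommands. The following is a minimal sketch of how such an entry point could be wired. It is not the project's actual code: `build_parser` and `main` are placeholder names, and the dispatch to the real steps is left out.

```python
import argparse

# Subcommands and descriptions as listed in this README.
SUBCOMMANDS = [
    ("prepare", "Prepare the data for training."),
    ("train", "Train the model."),
    ("predict", "Predict whether to preserve or drop a record."),
    ("evaluate", "Evaluate the model using cross-validation."),
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nor-cleaner")
    # required=True makes argparse report an error when no subcommand is given.
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS:
        subparsers.add_parser(name, help=help_text)
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    # A real implementation would call the matching step here;
    # this sketch only returns the chosen subcommand name.
    return args.command
```

With this sketch, `main(["train"])` returns the string `"train"`, and an unknown subcommand makes `argparse` exit with a usage error.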
NOTE: Use the constants file scripts/constants.py to set the paths.
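To run the main steps in one go, the subcommands can be chained from a short script. This is a sketch under two assumptions: `nor-cleaner` is on the `PATH` (for example, inside the activated virtual environment), and each subcommand needs no extra arguments, as shown in this README. The names `STEPS` and `run_pipeline` are illustrative.

```python
import subprocess

# Order taken from this README; "evaluate" is optional and left out here.
STEPS = ["prepare", "train", "predict"]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each nor-cleaner subcommand in order, stopping at the first failure."""
    for step in steps:
        # check=True makes subprocess.run raise CalledProcessError
        # when the command exits with a non-zero status.
        runner(["nor-cleaner", step], check=True)


if __name__ == "__main__":
    run_pipeline()
```

The `runner` parameter exists only so the function can be exercised without actually invoking the CLI.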