Official code for the paper:
Keyphrase Generation: Lessons from a Reproducibility Study
Edwin Thomas and Sowmya Vajjala
LREC-COLING 2024
The models used in the reproducibility study are heavily adapted from their respective original implementations.
For significance testing, the following repository was used: https://github.com/rtmdrr/testSignificanceNLP
Perform tokenization following the Deep Keyphrase Generation paper by Meng et al. (KPG-OpenNMT-py).
Refer to UniKP/OpenNMT-kpg-release/notebook/json_process.ipynb for the tokenization step.
For final pre-processing of the tokenized JSON files, run the script at UniKP/UniKeyphrase/preprocess/start_make.sh:
./start_make.sh
Navigate to the UniKP/UniKeyphrase/scripts folder and run the following scripts:
# for PyTorch DDP-based training
./start_train_ddp.sh
./start_test.sh
./metrics.sh
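The steps above depend on each other (testing needs a trained model, metrics need test output), so it can help to stop at the first failure. Below is a minimal sketch of one way to do that; `run_in_order` is a hypothetical helper and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run the named scripts
# inside a directory, in the given order, stopping at the first failure.
run_in_order() {
  local dir="$1"
  shift
  local script
  for script in "$@"; do
    echo "Running ${dir}/${script}"
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && "./${script}" ) || {
      echo "Failed: ${dir}/${script}" >&2
      return 1
    }
  done
}

# Example call, assuming the current directory is the repository root:
# run_in_order UniKP/UniKeyphrase/scripts start_train_ddp.sh start_test.sh metrics.sh
```

The same helper could be pointed at any other scripts folder in the repository by changing the directory and script names.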
Navigate to the KPDrop/scripts folder and run the following scripts:
./train.sh
./test.sh
Navigate to SignificanceTesting/testSignificanceNLP and run automation.sh, with the arguments for file A and file B changed to the metric files generated in the previous benchmark steps. Please refer to data_precessing.ipynb for an example.
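Before running the significance test, it may be worth checking that the two metric files are present and comparable. The sketch below assumes each file holds one score per line, so that paired files should have equal line counts. That format is an assumption and is not stated in this README. `check_metric_files` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository).
# Assumption: each metric file holds one score per line.
check_metric_files() {
  local file_a="$1" file_b="$2"
  local f
  for f in "$file_a" "$file_b"; do
    # -s is true only if the file exists and is non-empty.
    if [ ! -s "$f" ]; then
      echo "Missing or empty metric file: $f" >&2
      return 1
    fi
  done
  local n_a n_b
  # Arithmetic expansion strips any padding that wc adds on some systems.
  n_a=$(( $(wc -l < "$file_a") ))
  n_b=$(( $(wc -l < "$file_b") ))
  if [ "$n_a" -ne "$n_b" ]; then
    echo "Line counts differ: $n_a vs $n_b" >&2
    return 1
  fi
  echo "OK: both files have $n_a lines"
}

# Example call with placeholder file names:
# check_metric_files metrics_model_a.txt metrics_model_b.txt
```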