This repository contains code for bidirectional sequence generation (BiSon).
Results have been published in the following paper (please cite it if you use this repository):
Carolin Lawrence, Bhushan Kotnis, and Mathias Niepert. 2019. Attending to Future Tokens For Bidirectional Sequence Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Hong Kong, China.
Section | Description
---|---
Installation | How to install the package
Overview | Overview of the package
Implementing a new dataset | How to add a new dataset
General Notes | Additional useful information
## Installation

The following external libraries are required: `numpy`, `torch>=0.4.1`, `tqdm`, `boto3`, `requests`, `regex`.
This repository is compatible with the following three projects, all of which are required to reproduce the results reported in the paper:

- Huggingface's BERT models: BiSon should be compatible with any version of `pytorch-pretrained-bert`; the results in the paper were produced using a fork of commit hash `a5b3a89545bfce466dd977a9c6a7b15554b193b1`.
- BLEU script: to get BLEU evaluation scores, the script needs to be downloaded, placed at `bison/evals/multi-bleu.perl`, and given execution rights (e.g. `chmod u+x bison/evals/multi-bleu.perl`).
- ShARC evaluation script: the official ShARC evaluation script needs to be downloaded and placed at `bison/evals/evaluator_sharc.py`. It can be found on the CodaLab worksheet of the ShARC dataset (https://worksheets.codalab.org/worksheets/0xcd87fe339fa2493aac9396a3a27bbae8/); search for "evaluator.py".
## Overview

- Example files for calling BiSon, for either training or prediction, on two datasets (ShARC and Daily Dialog) can be found in `example_files`. Be sure to adjust the variable `REPO_DIR` to point to the path of your repository.
- BiSon-specific implementations:
  - `arguments.py`: specifies all possible arguments for BiSon, grouped into two classes:
    - `GeneralArguments`: general settings.
    - `BisonArguments`: BiSon-specific settings.
  - `bison_handler.py`: calls all necessary functions for BiSon training and prediction.
  - `masking.py`: handles the masking procedure. Get a masker by calling `get_masker` and passing `BisonArguments`. Currently one masker is implemented:
    - `GenerationMasking`: places masks in Part B, where Part A is the conditioning input and Part B consists of only placeholder tokens (`[MASK]`) at prediction time. Masks can be placed either using a Bernoulli distribution (`--masking_strategy bernoulli`) with a specified mean (`--distribution_mean`), or using a Gaussian distribution (`--masking_strategy gaussian`) with a specified mean (`--distribution_mean`) and standard deviation (`--distribution_stdev`); see the first sketch after this list.
  - `model_helper.py`: sets up some general BiSon settings.
  - `predict.py`: handles BiSon prediction.
  - `train.py`: handles BiSon training.
  - `util.py`: some utility functions, e.g. for reading and writing files.
- Several implemented datasets. Get a data handler by calling `get_data_handler` from `datasets_factory.py` and passing `BisonArguments`.
  - The general class that all other datasets should inherit from:
    - `datasets_bitext.py`: implements all necessary functions a data handler should have. It assumes tab-separated files as input, where everything before the tab becomes Part A and everything after the tab becomes Part B. At prediction time, BiSon aims to predict Part B (see the second sketch after this list for the file format).
  - Dialogue datasets:
    - `datasets_sharc.py`: implements the ShARC dataset.
    - `datasets_daily.py`: implements the Daily Dialog dataset.
- Main Python file:
  - `run_bison.py`: the main entry point for any BiSon training and prediction.
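
The following is a minimal, illustrative sketch of the two masking strategies for Part B described above. It is not the code from `masking.py`; the exact sampling details (per-token versus per-sequence probabilities, clipping) should be checked against `GenerationMasking`.

```python
import random

MASK = "[MASK]"

def mask_part_b(part_b_tokens, strategy="bernoulli", mean=0.5, stdev=0.1):
    """Replace tokens of Part B with [MASK] placeholders (illustrative only)."""
    if strategy == "bernoulli":
        # Each Part B token is masked independently with probability `mean`.
        p = mean
    elif strategy == "gaussian":
        # One masking probability per sequence, drawn from N(mean, stdev)
        # and clipped to the valid range [0, 1].
        p = min(1.0, max(0.0, random.gauss(mean, stdev)))
    else:
        raise ValueError(f"unknown masking strategy: {strategy}")
    return [MASK if random.random() < p else tok for tok in part_b_tokens]

# At prediction time, Part B is all placeholders, i.e. every token is [MASK].
```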
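A second sketch shows the tab-separated input format that `datasets_bitext.py` assumes; the reader function below illustrates the format and is not the handler's actual code.

```python
def read_bitext(path):
    """Yield (part_a, part_b) pairs from a tab-separated file.

    Everything before the first tab is Part A (the conditioning input);
    everything after it is Part B (the text BiSon learns to generate).
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            part_a, part_b = line.rstrip("\n").split("\t", 1)
            yield part_a, part_b
```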
## Implementing a new dataset

To implement a new dataset, ensure that it inherits from `BitextHandler`. See the documentation of each function and determine whether your dataset needs to overwrite that functionality or not; a sketch follows below.
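
A minimal sketch of what such a subclass might look like. The method name `read_examples` is a hypothetical stand-in; match the actual function names and signatures documented in `datasets_bitext.py`.

```python
from datasets_bitext import BitextHandler  # adjust to the actual module path

class MyDatasetHandler(BitextHandler):
    """Data handler for a new dataset (illustrative only)."""

    def read_examples(self, path):  # hypothetical name: mirror the real API
        # Overwrite only the functions whose default BitextHandler behaviour
        # does not fit your dataset, e.g. custom parsing into Part A / Part B.
        ...
```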
## General Notes

- The learning rate depends on the number of epochs; see the `warmup_linear` function in `optimizer.py`: at the last update step, the learning rate is 0. If a run finishes with its highest score in the last epoch, simply increasing the epoch counter does not necessarily help, because it completely changes the learning-rate schedule (see the first sketch below).
- Training cannot simply be restarted from a saved model, because Adam's optimizer parameters are not saved (a possible workaround is sketched second below).
- When using the parameter `--gradient_accumulation_steps`, the value of `--train_batch_size` should be the truly desired batch size. For example, if we want a batch size of 16 but only 6 examples fit into GPU RAM, use `--train_batch_size 16 --gradient_accumulation_steps 3` (see the third sketch below).
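
For the learning-rate note: the schedule below reproduces `warmup_linear` as it appears in the `pytorch-pretrained-bert` optimization code of that era, which BiSon builds on. Here `x` is the fraction of training completed, `global_step / t_total`.

```python
def warmup_linear(x, warmup=0.002):
    """Linear warm-up followed by linear decay; returns an lr multiplier."""
    if x < warmup:
        return x / warmup  # ramp up from 0 to 1 during the warm-up phase
    return 1.0 - x         # decay linearly, reaching 0.0 at the final step

# lr(t) = base_lr * warmup_linear(t / t_total): changing the number of epochs
# changes t_total and therefore the learning rate at every single step.
```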
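For the restart note, a hedged sketch of a possible workaround. This is an assumption about how one could extend the code, not an existing feature of the repository: checkpoint the optimizer state alongside the model so that Adam's moment estimates survive a restart.

```python
import torch

def save_checkpoint(model, optimizer, path):
    # Not part of BiSon: saves Adam's moments together with the weights.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(model, optimizer, path):
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])  # restores Adam's state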
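For the batch-size note, a sketch of how gradient accumulation behaves in a `pytorch-pretrained-bert`-style training loop (an illustration, not `train.py` itself). Assuming that loop's convention, `train_batch_size` is divided by `gradient_accumulation_steps`, so each forward pass sees `16 // 3 = 5` examples (within the 6 that fit into GPU RAM), and the optimizer steps once every 3 micro-batches.

```python
def train_epoch(model, optimizer, micro_batches, gradient_accumulation_steps):
    """One epoch with gradient accumulation (illustrative sketch)."""
    for step, batch in enumerate(micro_batches):
        loss = model(batch)                              # forward on a micro-batch
        (loss / gradient_accumulation_steps).backward()  # scaled; gradients add up
        if (step + 1) % gradient_accumulation_steps == 0:
            optimizer.step()       # one update per accumulated (full) batch
            optimizer.zero_grad()
```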