data
contains the data used by our code. The zip files hold the training and testing data, each as a collection of csv files. The npy files are NumPy array files that may contain pickled Python objects; they can be loaded as follows:
import numpy as np
mapping = np.load("data/some_file.npy", allow_pickle=True)
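When a file saved this way holds a single Python object such as a dict, NumPy returns it wrapped in a 0-dimensional object array. The sketch below shows one way to unwrap it. It assumes the mapping is a dict, and the file name and contents are hypothetical stand-ins; the project's actual npy files may be laid out differently.

```python
import os
import tempfile

import numpy as np

# Hypothetical mapping used only for illustration; the real files may differ.
example_mapping = {"happy": 0, "sad": 1}

# Write to a temporary location so the project's data folder is untouched.
path = os.path.join(tempfile.mkdtemp(), "example_mapping.npy")
np.save(path, example_mapping, allow_pickle=True)  # stored as a 0-d object array

loaded = np.load(path, allow_pickle=True)

# A pickled dict comes back inside a 0-d array; .item() unwraps it.
mapping = loaded.item() if loaded.ndim == 0 else loaded
print(type(mapping).__name__, mapping)
```

Only load npy files with allow_pickle=True when they come from a trusted source, since unpickling can execute arbitrary code.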
figures
contains the graphs generated by our code that are shown in the report.
jobs
contains baseline job scripts for running on Great Lakes.
models
is where our trained models are saved locally.
results
contains the json files that record the state of the trained models at the end of execution, for later analysis.
src
contains the source code for our scripts.
src/notebooks
contains the notebooks used for data generation and DeepMoji experiments.
All scripts must be run from the root of the project.
python src/bert.py <dataset name> <debug> <mapping filename>
Dataset name: dev | train | test
Debug: true | false
Mapping filename: Filename of the mapping file to use, e.g. mapping.npy or foo_bar.p. This file must exist in the data folder.
Example:
python src/bert.py train false mapping.npy
python src/dist.py
python src/gen_cluster_mapping.py
python src/infer.py <model name> <mapping filename> <dataset name*> <number of samples*> <number of scores*>
Arguments marked with * are optional.
Model name: Name of the folder containing the model. The model must be located in the models folder
Mapping filename: Filename of the mapping file. The mapping file must be located in the data folder
Dataset name: Name of the dataset file to use. Defaults to "test"
Number of samples: Number of samples to print out. Defaults to 1
Number of scores: number of predictions to display. Defaults to 5
Examples:
python src/infer.py train-trained-bert mapping.npy
python src/infer.py some-model-name cluster_mapping.npy dev 3 10
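The two examples above differ only in whether the optional trailing arguments are supplied. As a rough sketch of how such defaults can be expressed with argparse, the hypothetical parser below mirrors the documented arguments. The function and argument names are assumptions for illustration and this is not the actual code in src/infer.py.

```python
import argparse


def parse_infer_args(argv):
    """Hypothetical parser mirroring the documented infer.py arguments."""
    parser = argparse.ArgumentParser(prog="src/infer.py")
    parser.add_argument("model_name")        # folder name inside models
    parser.add_argument("mapping_filename")  # file name inside data
    # nargs="?" makes a positional argument optional with a default.
    parser.add_argument("dataset", nargs="?", default="test")
    parser.add_argument("num_samples", nargs="?", type=int, default=1)
    parser.add_argument("num_scores", nargs="?", type=int, default=5)
    return parser.parse_args(argv)


# Same argument lists as the two examples above.
short = parse_infer_args(["train-trained-bert", "mapping.npy"])
full = parse_infer_args(["some-model-name", "cluster_mapping.npy", "dev", "3", "10"])

print(short.dataset, short.num_samples, short.num_scores)
print(full.dataset, full.num_samples, full.num_scores)
```

Because the optional arguments are positional, a later one can only be given if all earlier ones are given too; for example, setting the number of scores requires also passing the dataset name and number of samples.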
python src/process_results.py <path to result file> <name of results>
Path to result file: Full path to json file containing final training state
Name of results: name for the output files and titles
Example:
python src/process_results.py results/some_file.json "Some results"
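Before processing a result file it can help to check what it contains. The sketch below loads a json file and summarizes its top-level entries. The schema of the real result files is not documented here, so the keys and values in the example are hypothetical, and a temporary file stands in for a file in the results folder.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a file in results/; the real schema may differ.
example_state = {"epoch": 3, "train_loss": [0.9, 0.6, 0.4]}

path = os.path.join(tempfile.mkdtemp(), "some_file.json")
with open(path, "w") as f:
    json.dump(example_state, f)

with open(path) as f:
    state = json.load(f)

# Map each top-level key to the type of its value for a quick overview.
summary = {key: type(value).__name__ for key, value in state.items()}
print(summary)
```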