clean up mapping/evaluation
kellymarchisio committed Mar 3, 2023
1 parent aae4008 commit 78c9935
Showing 4 changed files with 92 additions and 54 deletions.
19 changes: 13 additions & 6 deletions README.md
@@ -16,7 +16,7 @@ Requirements
- sklearn
- scipy
- numpy
- indic-nlp-library
- indic-nlp-library
- torchtext
--------

@@ -36,14 +36,21 @@ Usage
-------
To reproduce Table 1 in the paper (Baselines), run:
- `sh baseline.sh $system $lang $seed`
* For instance, run `sh baseline.sh w2v uk 0` for official word2vec trained on Ukrainian.
* For instance, run `sh baseline.sh w2v uk` for official word2vec trained on Ukrainian.
* system choices: {isovec, w2v}
* lang choices: {uk, bn, ta, en}
- Here is an example experiment for running Isovec:
- After you train English and Ukrainian baseline w2v spaces, for instance, you
can map them and evaluate the dictionary precision with:
`sh map-and-eval.sh baseline w2v uk en dev`
* Results will be in `exps/baseline/w2v/uk-en/dev/*out`

- Here is an example experiment for running Isovec in reference to a fixed
English embedding space:
* Goal: Train a Ukrainian embedding space with RSIM-U, in reference to a fixed English space.
* Step 1: Train the fixed English space with `sh baseline.sh isovec uk 0`
* Step 2: Train the Ukrainian space with: `sh run-isovec.sh rsim-u uk en 0`
* Step 1: Train the fixed English space with `sh baseline.sh isovec en`
* Step 2: Train the Ukrainian space with: `sh run-isovec.sh rsim-u uk en`
* Step 3: Map & Evaluate the spaces with: `sh map-and-eval.sh isovec rsim-u uk en dev`
- Choices of Isovec training algorithm are `l2, proc-l2, proc-l2-init, rsim,
rsim-init, rsim-u, evs-u` for L2, Proc-L2, Proc-L2+Init, RSIM, RSIM+Init, RSIM-U, and
EVS-U as detailed in Section 4.3 and 4.4 of the paper.
EVS-U as detailed in Section 4.3 and 4.4 of the paper.
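
The three Isovec steps above can be chained in a small wrapper. This is only a sketch and is not part of the repository: the names `isovec_pipeline`, `run`, and `DRY_RUN` are hypothetical, and it assumes the three scripts are called from the repository root with the argument order shown in the README. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repository) chaining the README steps.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

isovec_pipeline() {
    algo=$1; lng=$2; ref=$3; eval_set=$4
    # Step 1: train the fixed reference space.
    run sh baseline.sh isovec "$ref" || return 1
    # Step 2: train the new space against the reference.
    run sh run-isovec.sh "$algo" "$lng" "$ref" || return 1
    # Step 3: map and evaluate on the chosen dictionary split.
    run sh map-and-eval.sh isovec "$algo" "$lng" "$ref" "$eval_set"
}

isovec_pipeline rsim-u uk en dev
```

Setting `DRY_RUN=0` in the environment would execute the three scripts in order and stop at the first failure.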

68 changes: 68 additions & 0 deletions map-and-eval.sh
@@ -0,0 +1,68 @@
#!/bin/bash -v
. ./local-settings.sh

MODE=$1
STAGE=$2
SRC=$3
TRG=$4
EVAL=$5
SEEDS=$DIR/data/dicts/$SRC-$TRG/train/$SRC-$TRG.0-5000.txt

# Choose dev/test set.
if [[ $EVAL == dev ]]; then
TEST=data/dicts/$SRC-$TRG/dev/$SRC-$TRG.6501-8000.txt
elif [[ $EVAL == test ]]; then
TEST=data/dicts/$SRC-$TRG/test/$SRC-$TRG.5000-6500.txt
else
echo 'Please specify "test" to eval on test set, or "dev" to eval on dev set.'
exit 1
fi

# Set up mapping directory and point to correct embeddings.
if [[ $MODE == baseline ]]; then
if ! ([[ $STAGE == w2v ]] || [[ $STAGE == isovec ]]); then
echo Stage must be w2v or isovec for baseline evaluation.
exit 1
fi
BASEDIR=$DIR/exps/baseline/$STAGE # isovec or w2v
SRC_EMBS=$BASEDIR/$SRC/embs.out
TRG_EMBS=$BASEDIR/$TRG/embs.out
MAPPED_OUTDIR=$BASEDIR/$SRC-$TRG/$EVAL
elif [[ $MODE == isovec ]]; then
if ! ([[ $STAGE == l2 ]] || \
[[ $STAGE == proc-l2 ]] || [[ $STAGE == proc-l2-init ]] || \
[[ $STAGE == rsim ]] || [[ $STAGE == rsim-init ]] || \
[[ $STAGE == rsim-u ]] || [[ $STAGE == evs-u ]]); then
echo Please specify a correct stage for isovec.
exit 1
fi
BASEDIR=$DIR/exps/isovec/$STAGE
SRC_EMBS=$BASEDIR/$SRC-$TRG/embs.out
TRG_EMBS=$DIR/exps/baseline/isovec/$TRG/embs.out # Reference embeddings
MAPPED_OUTDIR=$BASEDIR/$SRC-$TRG/$EVAL
else
echo 'Please specify "isovec" or "baseline" for mode.'
exit 1
fi

echo Source Embeddings $SRC_EMBS
echo Reference Embeddings $TRG_EMBS
echo Output directory: $MAPPED_OUTDIR

# Perform the VecMap mapping and eval.
mkdir -p $MAPPED_OUTDIR
set -x
for mode in sup semisup unsup
do
echo Mapping embeddings with ref embeddings in $mode mode...
time sh map.sh -s $SRC_EMBS -t $TRG_EMBS \
-u $MAPPED_OUTDIR/embs.out.to$TRG.mapped.$mode \
-v $MAPPED_OUTDIR/$TRG.mapped.$mode -m $mode -d $SEEDS \
> $MAPPED_OUTDIR/$mode.map-eval.out
echo Evaluating mapped embeddings...
time sh eval.sh -s $MAPPED_OUTDIR/embs.out.to$TRG.mapped.$mode \
-t $MAPPED_OUTDIR/$TRG.mapped.$mode -d $TEST \
>> $MAPPED_OUTDIR/$mode.map-eval.out
done
echo Done.
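
The mode and stage checks in `map-and-eval.sh` rely on `[[ ... ]]`, a bash construct, so calling the script through `sh` only works where `sh` is bash. As a sketch of a portable alternative, the same checks can be written with `case`. The helper names `valid_stage` and `mapped_outdir` are hypothetical and do not exist in the repository. The directory layout is copied from the script above.

```shell
#!/bin/sh
# Sketch only: mirrors the mode/stage validation of map-and-eval.sh using
# POSIX `case`. valid_stage and mapped_outdir are hypothetical helper names.

valid_stage() {
    mode=$1; stage=$2
    case "$mode" in
        baseline)
            case "$stage" in
                w2v|isovec) return 0 ;;
            esac ;;
        isovec)
            case "$stage" in
                l2|proc-l2|proc-l2-init|rsim|rsim-init|rsim-u|evs-u) return 0 ;;
            esac ;;
    esac
    return 1
}

# Prints the directory that receives the <mode>.map-eval.out files.
mapped_outdir() {
    dir=$1; mode=$2; stage=$3; src=$4; trg=$5; eval_set=$6
    valid_stage "$mode" "$stage" || return 1
    echo "$dir/exps/$mode/$stage/$src-$trg/$eval_set"
}

mapped_outdir . baseline w2v uk en dev   # prints ./exps/baseline/w2v/uk-en/dev
```

An invalid combination such as `valid_stage baseline rsim-u` returns a nonzero status, which a caller can use to stop before any mapping is attempted.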

36 changes: 0 additions & 36 deletions map-baseline.sh

This file was deleted.

23 changes: 11 additions & 12 deletions run-isovec.sh
@@ -5,16 +5,15 @@
#
# Combining Skipgram & Isomorphism Losses
# by Kelly Marchisio.
#
#
###############################################################################

stage=$1
STAGE=$1
LNG=$2
REF_LNG=$3
trial_num=$4

EXP_NAME=isovec
OUTDIR=exps/$EXP_NAME/$stage/$trial_num/$LNG-$REF_LNG
OUTDIR=exps/$EXP_NAME/$STAGE/$LNG-$REF_LNG
SEEDS=data/dicts/$LNG-$REF_LNG/train/$LNG-$REF_LNG.0-5000.txt
TEST=data/dicts/$LNG-$REF_LNG/dev/$LNG-$REF_LNG.6501-8000.txt
MAPPED_OUTDIR=$OUTDIR/mapped
@@ -30,7 +29,7 @@ WARMUP=0.25
WARMUP_TYPE=percent
STARTING_ALPHA=0.001
INFILE=data/news.2020.$LNG.tok.1M
REF_EMBS=$DIR/exps/baseline/isovec0/$REF_LNG/embs.out
REF_EMBS=$DIR/exps/baseline/isovec/$REF_LNG/embs.out
RAND_SEED=0 # To match with en space.
LOSS=wass
MODE=supervised
@@ -43,34 +42,34 @@ INIT_EMBS_W_REFS=0
GH_N=10000
MAX_SEEDS=-1 # All.

if [ $stage == l2 ]; then
if [ $STAGE == l2 ]; then
MIXED_LOSS_START_BATCH=0
BETA=0.1
elif [ $stage == proc-l2 ]; then
elif [ $STAGE == proc-l2 ]; then
LOSS=procwass
MIXED_LOSS_START_BATCH=0
BETA=0.333
elif [ $stage == proc-l2-init ]; then
elif [ $STAGE == proc-l2-init ]; then
LOSS=procwass
MIXED_LOSS_START_BATCH=0
BETA=0.2
INIT_EMBS_W_REFS=1
elif [ $stage == rsim ]; then
elif [ $STAGE == rsim ]; then
LOSS=rs
MIXED_LOSS_START_BATCH=0
BETA=0.01
elif [ $stage == rsim-init ]; then
elif [ $STAGE == rsim-init ]; then
LOSS=rs
MIXED_LOSS_START_BATCH=0
BETA=0.001
INIT_EMBS_W_REFS=1
elif [ $stage == rsim-u ]; then
elif [ $STAGE == rsim-u ]; then
LOSS=rs
MIXED_LOSS_START_BATCH=0
BETA=0.1
MODE=unsupervised
GH_N=2000
elif [ $stage == evs-u ]; then
elif [ $STAGE == evs-u ]; then
LOSS=evs
MIXED_LOSS_START_BATCH=0
BETA=0.333
Expand Down
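
The per-stage overrides visible in this hunk can be summarized as a lookup, which makes it easy to check a configuration without starting a training run. This is a sketch under stated assumptions: `stage_settings` is a hypothetical helper, the defaults are the ones shown earlier in the diff, `MIXED_LOSS_START_BATCH` is left out because it is 0 in every visible branch, and `evs-u` is omitted because its branch is truncated here.

```shell
#!/bin/sh
# Sketch only: per-stage settings as shown in the visible part of run-isovec.sh.
# stage_settings is a hypothetical helper; evs-u is omitted (truncated in the diff).

stage_settings() {
    # Defaults set before the stage-specific branch.
    LOSS=wass; MODE=supervised; INIT_EMBS_W_REFS=0; GH_N=10000
    case "$1" in
        l2)           BETA=0.1 ;;
        proc-l2)      LOSS=procwass; BETA=0.333 ;;
        proc-l2-init) LOSS=procwass; BETA=0.2; INIT_EMBS_W_REFS=1 ;;
        rsim)         LOSS=rs; BETA=0.01 ;;
        rsim-init)    LOSS=rs; BETA=0.001; INIT_EMBS_W_REFS=1 ;;
        rsim-u)       LOSS=rs; BETA=0.1; MODE=unsupervised; GH_N=2000 ;;
        *)            echo "unknown or unlisted stage: $1" >&2; return 1 ;;
    esac
    echo "LOSS=$LOSS BETA=$BETA MODE=$MODE INIT_EMBS_W_REFS=$INIT_EMBS_W_REFS GH_N=$GH_N"
}

stage_settings rsim-u
```

For `rsim-u` this prints `LOSS=rs BETA=0.1 MODE=unsupervised INIT_EMBS_W_REFS=0 GH_N=2000`.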
