Below is a list of important concepts in neural networks for NLP. In the annotations/ directory in this repository, we have examples of papers annotated with these concepts that you can peruse.
Annotation Criteria: For a particular paper, a concept should be annotated if it is important for understanding either the proposed method or the evaluation. For example, if a proposed self-attention model is compared to a baseline that uses an LSTM, and the difference between the two methods is important for understanding the experimental results, then the LSTM concept should also be annotated. Concepts do not need to be annotated if they are only mentioned in passing or in the related work section.
Implication: Some tags are listed as "XXX (implies YYY)", which means that you need to understand concept YYY in order to understand concept XXX. If XXX is annotated for a paper, YYY is implied and does not need to be annotated separately; for example, a paper tagged with optim-adam does not also need the optim-sgd tag.
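Because implications chain (for example, pre-bert implies arch-transformer, which itself implies arch-selfatt), it can help to expand them automatically when cleaning up an annotation. The snippet below is only an illustrative sketch, not part of this repository's tooling; the IMPLIES table hard-codes just a handful of the relations listed in this document, and the helper names are made up for the example.

```python
# Illustrative sketch only: expand "implies" relations transitively and
# drop tags that are already implied by another annotated tag.
IMPLIES = {
    "optim-adam": {"optim-sgd"},
    "optim-noam": {"optim-adam"},
    "arch-lstm": {"arch-rnn"},
    "arch-birnn": {"arch-rnn"},
    "arch-bilstm": {"arch-birnn", "arch-lstm"},
    "pre-bert": {"arch-transformer", "task-cloze", "task-textpair"},
}

def implied_by(tag, table=IMPLIES):
    """Return every tag transitively implied by `tag`."""
    implied, stack = set(), [tag]
    while stack:
        for parent in table.get(stack.pop(), ()):
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied

def minimize(tags, table=IMPLIES):
    """Drop tags that are already implied by another annotated tag."""
    redundant = set()
    for tag in tags:
        redundant |= implied_by(tag, table)
    return sorted(set(tags) - redundant)

# optim-sgd and arch-rnn are implied, so they drop out of the annotation.
print(minimize(["optim-adam", "optim-sgd", "arch-bilstm", "arch-rnn"]))
# -> ['arch-bilstm', 'optim-adam']
```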
Non-neural Papers: This conceptual hierarchy is for tagging papers that are about neural network models for NLP. If a paper is not fundamentally about some application of neural networks to NLP, it should be tagged with not-neural, and no other tags need to be applied.
- Mini-batch SGD: optim-sgd
- Adam: optim-adam (implies optim-sgd)
- Adagrad: optim-adagrad (implies optim-sgd)
- Adadelta: optim-adadelta (implies optim-sgd)
- Adam with Specialized Transformer Learning Rate ("Noam" Schedule): optim-noam (implies optim-adam)
- SGD with Momentum: optim-momentum (implies optim-sgd)
- AMSGrad: optim-amsgrad (implies optim-sgd)
- Projection / Projected Gradient Descent: optim-projection (implies optim-sgd)
- Glorot/Xavier Initialization: init-glorot
- He Initialization: init-he
- Dropout: reg-dropout
- Word Dropout: reg-worddropout (implies reg-dropout)
- Norm (L1/L2) Regularization: reg-norm
- Early Stopping: reg-stopping
- Patience: reg-patience (implies reg-stopping)
- Weight Decay: reg-decay
- Label Smoothing: reg-labelsmooth
- Layer Normalization: norm-layer
- Batch Normalization: norm-batch
- Gradient Clipping: norm-gradient
- Canonical Correlation Analysis (CCA): loss-cca
- Singular Value Decomposition (SVD): loss-svd
- Margin-based Loss Functions: loss-margin
- Contrastive Loss: loss-cons
- Noise Contrastive Estimation (NCE): loss-nce (implies loss-cons)
- Triplet Loss: loss-triplet (implies loss-cons)
- Multi-task Learning (MTL): train-mtl
- Multi-lingual Learning (MLL): train-mll (implies train-mtl)
- Transfer Learning: train-transfer
- Active Learning: train-active
- Data Augmentation: train-augment
- Curriculum Learning: train-curriculum
- Parallel Training: train-parallel
- Hyperbolic Tangent (tanh): activ-tanh
- Rectified Linear Units (ReLU): activ-relu
- Recurrent Neural Network (RNN): arch-rnn
- Bi-directional Recurrent Neural Network (Bi-RNN): arch-birnn (implies arch-rnn)
- Long Short-term Memory (LSTM): arch-lstm (implies arch-rnn)
- Bi-directional Long Short-term Memory (Bi-LSTM): arch-bilstm (implies arch-birnn, arch-lstm)
- Gated Recurrent Units (GRU): arch-gru (implies arch-rnn)
- Bi-directional Gated Recurrent Units (Bi-GRU): arch-bigru (implies arch-birnn, arch-gru)
- Bag-of-words, Bag-of-embeddings, Continuous Bag-of-words (BOW): arch-bow
- Convolutional Neural Networks (CNN): arch-cnn
- Attention: arch-att
- Self Attention: arch-selfatt (implies arch-att)
- Recursive Neural Network (RecNN): arch-recnn
- Tree-structured Long Short-term Memory (TreeLSTM): arch-treelstm (implies arch-recnn)
- Graph Neural Network (GNN): arch-gnn
- Graph Convolutional Neural Network (GCNN): arch-gcnn (implies arch-gnn)
- Residual Connections (ResNet): arch-residual
- Gating Connections, Highway Connections: arch-gating
- Memory: arch-memo
- Copy Mechanism: arch-copy
- Bilinear, Biaffine Models: arch-bilinear
- Coverage Vectors/Penalties: arch-coverage
- Subword Units: arch-subword
- Energy-based, Globally-normalized Models: arch-energy
- Transformer: arch-transformer (implies arch-selfatt, arch-residual, norm-layer, optim-noam)
- Ensembling: comb-ensemble
- Greedy Search: search-greedy
- Beam Search: search-beam
- A* Search: search-astar
- Viterbi Algorithm: search-viterbi
- Ancestral Sampling: search-sampling
- Gumbel Max: search-gumbel (implies search-sampling)
- Text Classification (text -> label): task-textclass
- Text Pair Classification (two texts -> label): task-textpair
- Sequence Labeling (text -> one label per token): task-seqlab
- Extractive Summarization (text -> subset of text): task-extractive (implies task-seqlab)
- Span Labeling (text -> labels on spans): task-spanlab
- Language Modeling (predict probability of text): task-lm
- Conditioned Language Modeling (some input -> text): task-condlm (implies task-lm)
- Sequence-to-sequence Tasks (text -> text, including MT): task-seq2seq (implies task-condlm)
- Cloze-style Prediction, Masked Language Modeling (right and left context -> word): task-cloze
- Context Prediction (as in word2vec) (word -> right and left context): task-context
- Relation Prediction (text -> graph of relations between words, including dependency parsing): task-relation
- Tree Prediction (text -> tree, including syntactic and some semantic parsing): task-tree
- Graph Prediction (text -> graph of relations, not necessarily between words): task-graph
- Lexicon Induction/Embedding Alignment (text/embeddings -> bi- or multi-lingual lexicon): task-lexicon
- Word Alignment (parallel text -> alignment between words): task-alignment
- word2vec: pre-word2vec (implies arch-bow, task-cloze, task-context)
- fastText: pre-fasttext (implies arch-bow, arch-subword, task-cloze, task-context)
- GloVe: pre-glove
- Paragraph Vector (ParaVec): pre-paravec
- Skip-thought: pre-skipthought (implies arch-lstm, task-seq2seq)
- ELMo: pre-elmo (implies arch-bilstm, task-lm)
- BERT: pre-bert (implies arch-transformer, task-cloze, task-textpair)
- Universal Sentence Encoder (USE): pre-use (implies arch-transformer, task-seq2seq)
- Hidden Markov Models (HMM): struct-hmm
- Conditional Random Fields (CRF): struct-crf
- Context-free Grammar (CFG): struct-cfg
- Combinatory Categorial Grammar (CCG): struct-ccg
- Complete Enumeration: nondif-enum
- Straight-through Estimator: nondif-straightthrough
- Gumbel Softmax: nondif-gumbelsoftmax
- Minimum Risk Training: nondif-minrisk
- REINFORCE: nondif-reinforce
- Generative Adversarial Networks (GAN): adv-gan
- Adversarial Feature Learning: adv-feat
- Adversarial Examples: adv-examp
- Adversarial Training: adv-train (implies adv-examp)
- Variational Auto-encoder (VAE): latent-vae
- Topic Model: latent-topic
- Meta-learning Initialization: meta-init
- Meta-learning Optimizers: meta-optim
- Meta-learning Loss functions: meta-loss
- Neural Architecture Search: meta-arch