Commit: attention drawing

Lisa committed Oct 12, 2020
1 parent b9f26ec commit 670dbb5
Showing 3 changed files with 11 additions and 0 deletions.
Binary file added SMDL/Day5/lessons/task1/architecture.png
2 changes: 2 additions & 0 deletions SMDL/Day5/lessons/task1/task-info.yaml

@@ -5,3 +5,5 @@ files:
   visible: true
 - name: attention.py
   visible: true
+- name: architecture.png
+  visible: true
9 changes: 9 additions & 0 deletions SMDL/Day5/lessons/task1/task.md

@@ -35,6 +35,15 @@ Inputs are query tensor of shape [batch_size, Tq, dim], value tensor of shape [batch_size, Tv, dim]
- Therefore, make sure that your final RNN layer returns sequences.
- If using the classic `Attention` layer, the resulting context vector needs to be collapsed back into a single sequence dimension (using `tf.reduce_sum` or `tf.reduce_mean`, depending on your domain) before concatenating it with the target. This ensures that the sequence dimensions match (see the sketch below).
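
A minimal sketch of that collapse, run on random tensors. All shapes here are illustrative assumptions, not the task's actual dimensions:

```python
import tensorflow as tf

# Illustrative shapes only (assumptions, not the task's real dimensions).
batch_size, Tq, Tv, dim = 2, 10, 20, 64
query = tf.random.normal((batch_size, Tq, dim))  # e.g. RNN output with return_sequences=True
value = tf.random.normal((batch_size, Tv, dim))  # e.g. the sequence being attended over

# The classic Attention layer returns a context of shape [batch_size, Tq, dim];
# when only query and value are given, value also serves as the key.
context = tf.keras.layers.Attention()([query, value])

# Collapse the Tq dimension so the context can be concatenated with the target.
context_vector = tf.reduce_mean(context, axis=1)  # or tf.reduce_sum; [batch_size, dim]
print(context_vector.shape)  # (2, 64)
```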

### Application in Encoder-Decoder

![architecture](architecture.png)

- The Attention layer is applied at the Decoder.
- The Encoder is changed to return sequences. The sequences will be used as the value (and key) to the Attention layer.
- The Decoder's hidden state is the query. The initial hidden state is set to the Encoder's output hidden state (same as in a vanilla Seq2Seq Encoder-Decoder).
- The Attention layer produces a context vector: a weighted sum of the encoded sequence, where the weights are computed dynamically from the query.
- The context vector is passed along to the LSTM or GRU (concatenated with the previous target token's embedding). Previously, in a vanilla Seq2Seq, the last encoded output was provided instead. *This way, the context vector can hold richer information that takes into account the **entire** Encoder sequence, rather than just the final value of the Encoder sequence.* A single-step sketch follows below.
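
Putting the bullets above together, here is a minimal single-decoding-step sketch on random data. The layer sizes, vocabulary size, and the `<start>` token id are assumptions for illustration, not the task's actual setup; in a full model this step runs in a loop over target timesteps, reusing the same layers.

```python
import tensorflow as tf

# Assumed sizes, for illustration only.
units, emb_dim, vocab_size, batch_size, Ts = 128, 32, 1000, 2, 15

# Encoder: return_sequences=True yields the value (and key) for attention;
# return_state=True yields the Decoder's initial hidden state.
enc_tokens = tf.random.uniform((batch_size, Ts), maxval=vocab_size, dtype=tf.int32)
enc_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(enc_tokens)
enc_outputs, enc_state = tf.keras.layers.GRU(
    units, return_sequences=True, return_state=True)(enc_emb)

# Decoder step: the current hidden state is the query.
dec_state = enc_state                              # [batch_size, units]
query = tf.expand_dims(dec_state, 1)               # [batch_size, 1, units]
context = tf.keras.layers.Attention()([query, enc_outputs])  # [batch_size, 1, units]

# Concatenate the context vector with the previous target token's embedding
# and feed the result to the Decoder GRU.
prev_token = tf.zeros((batch_size, 1), dtype=tf.int32)  # assumed <start> id
dec_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(prev_token)
gru_in = tf.concat([context, dec_emb], axis=-1)    # [batch_size, 1, units + emb_dim]
dec_out, dec_state = tf.keras.layers.GRU(
    units, return_state=True)(gru_in, initial_state=dec_state)
logits = tf.keras.layers.Dense(vocab_size)(dec_out)  # scores for the next token
```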

Further Enhancements:
* Neural Machine Translation with (Bahdanau) attention: https://www.tensorflow.org/tutorials/text/nmt_with_attention
