Bahdanau Attention

The paper Neural Machine Translation by Jointly Learning to Align and Translate was the first to formally introduce the concept of attention.

One of the illustrations presented in the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation is quite interesting in this context: the encoder compresses the entire source sentence into a single fixed-length context vector C, which the decoder then conditions on at every step. Two natural questions follow:

  • What if the context vector C were computed from all the hidden states of the encoder?

  • What if the context vector C were computed afresh for each decoding step?

Bahdanau Attention

Bahdanau Attention was proposed for the Neural Machine Translation task, using a bidirectional RNN encoder and an RNN decoder.

In each decoding step,

  • the previous hidden state of the decoder (the query) is used to attend to all hidden states of the encoder (the keys), producing an attention score for each encoder hidden state

$$\begin{align} h_j &= \operatorname{Concat}(\overrightarrow{h_j}, \overleftarrow{h_j}) \\ e_{t,j} &= \operatorname{MLP}(s_{t-1}, h_j) \\ \alpha_{t,j} &= \frac{\exp(e_{t,j})}{\sum_{k=1}^{T}\exp(e_{t,k})} \end{align}$$

  • the context vector is then computed as a weighted sum of the hidden states of the encoder (the values), with the attention scores as the weights (a minimal code sketch follows after the equation below)

$$c_{t} = \sum_{j=1}^{T}\alpha_{t,j}h_j$$
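To make the two equation blocks above concrete, here is a minimal NumPy sketch of a single decoding step. The scoring MLP is written in the additive form used in the paper, $e_{t,j} = v^\top \tanh(W_s s_{t-1} + W_h h_j)$; the variable names, toy dimensions, and random parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_attention_step(s_prev, H, W_s, W_h, v):
    """One decoding step of additive (Bahdanau) attention.

    s_prev : (d_dec,)    previous decoder hidden state s_{t-1} (the query)
    H      : (T, d_enc)  encoder hidden states h_1..h_T (keys and values),
                         each the concatenation of forward and backward states
    W_s, W_h, v          parameters of the scoring MLP (illustrative names)
    """
    # e_{t,j} = v^T tanh(W_s s_{t-1} + W_h h_j): one scalar score per source position
    scores = np.tanh(s_prev @ W_s.T + H @ W_h.T) @ v      # shape (T,)
    # alpha_{t,j}: softmax of the scores over the source positions
    alpha = softmax(scores)                                # shape (T,)
    # c_t = sum_j alpha_{t,j} h_j: attention-weighted context vector
    context = alpha @ H                                    # shape (d_enc,)
    return context, alpha

# Toy example: T=5 source positions, d_enc=8, d_dec=6, attention dim d_a=4
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_a = 5, 8, 6, 4
H = rng.normal(size=(T, d_enc))
s_prev = rng.normal(size=(d_dec,))
W_s = rng.normal(size=(d_a, d_dec))
W_h = rng.normal(size=(d_a, d_enc))
v = rng.normal(size=(d_a,))

c_t, alpha = bahdanau_attention_step(s_prev, H, W_s, W_h, v)
print(alpha.round(3), alpha.sum())   # attention weights over the source, summing to 1
```

In a full model this computation is repeated at every decoding step with the updated decoder state, so each target token gets its own set of weights over the source positions.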

The attended context vector, together with the decoder hidden state and the previously emitted token, is then used to predict the next token.
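In the paper's notation (indexed to match the equations above), the context vector enters both the decoder state update and the output distribution, where $f$ is the decoder RNN cell and $g$ is the output layer:

$$\begin{align} s_t &= f(s_{t-1}, y_{t-1}, c_t) \\ p(y_t \mid y_1, \dots, y_{t-1}, x) &= g(y_{t-1}, s_t, c_t) \end{align}$$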

Here is an interesting thread by Andrej Karpathy on how Bahdanau ended up building this attention mechanism.