Decoder module

class decoder.Decoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Decoder block from Attention is All You Need.

Apply two Multi Head Attention blocks followed by a Point-wise Feed Forward block. A residual sum and normalization are applied at each step.

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
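
For illustration, a minimal construction sketch. The import path and the hyperparameter values below are assumptions chosen only for the example, not prescribed defaults:

    from decoder import Decoder  # assumed import path

    # Hypothetical hyperparameters for illustration only.
    d_model = 64   # input vector dimension
    q = 8          # query dimension
    v = 8          # value dimension
    h = 4          # number of attention heads

    decoder_block = Decoder(d_model, q, v, h,
                            attention_size=None,
                            dropout=0.3,
                            chunk_mode='chunk')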

forward(x, memory)

Propagate the input through the Decoder block.

Apply the self-attention block, add the residual and normalize. Apply the encoder-decoder attention block, add the residual and normalize. Apply the feed-forward network, add the residual and normalize.
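
For illustration only, here is a self-contained sketch of that sequence of operations built from standard PyTorch modules. It is not this library's actual implementation; all submodule names and sizes below are assumptions:

    import torch
    import torch.nn as nn

    # Stand-in submodules (assumptions, not the library's internals).
    d_model, n_heads, K, batch_size = 64, 4, 32, 16
    self_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    encoder_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    feed_forward = nn.Sequential(
        nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
    norm1, norm2, norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    x = torch.randn(batch_size, K, d_model)       # decoder input
    memory = torch.randn(batch_size, K, d_model)  # encoder output

    # Self-attention, residual sum, normalization.
    attn_out, _ = self_attention(x, x, x)
    x = norm1(x + attn_out)

    # Encoder-decoder attention, residual sum, normalization.
    attn_out, _ = encoder_attention(x, memory, memory)
    x = norm2(x + attn_out)

    # Point-wise feed forward, residual sum, normalization.
    x = norm3(x + feed_forward(x))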

Parameters
  • x (Tensor) – Input tensor with shape (batch_size, K, d_model).

  • memory (Tensor) – Memory tensor with shape (batch_size, K, d_model) from the encoder output.

Return type

Tensor

Returns

x – Output tensor with shape (batch_size, K, d_model).
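
A minimal forward-pass sketch, reusing the decoder_block and d_model assumed in the construction example above; the batch size and sequence length are arbitrary:

    import torch

    batch_size, K = 16, 32                        # arbitrary example sizes
    x = torch.randn(batch_size, K, d_model)       # decoder input
    memory = torch.randn(batch_size, K, d_model)  # encoder output

    out = decoder_block(x, memory)
    print(out.shape)  # torch.Size([16, 32, 64])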