Encoder module

class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Encoder block from Attention Is All You Need (Vaswani et al., 2017).

Apply a Multi Head Attention block followed by a Point-wise Feed Forward block. A residual sum and normalization are applied after each block, as sketched below.
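
The following is a minimal PyTorch sketch of this residual-and-normalize pattern, for illustration only: the sub-module names (mha, pff), the feed-forward width d_ff, and the use of nn.MultiheadAttention are assumptions, not attributes of this class:

    import torch
    import torch.nn as nn

    class EncoderSketch(nn.Module):
        """Illustrative sketch of the block's structure, not the actual class."""

        def __init__(self, d_model: int, h: int, d_ff: int = 2048, dropout: float = 0.3):
            super().__init__()
            # Stand-ins for the Multi Head Attention and Point-wise Feed Forward blocks
            self.mha = nn.MultiheadAttention(d_model, h, dropout=dropout, batch_first=True)
            self.pff = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Multi Head Attention, then residual sum and normalization
            attn_out, _ = self.mha(x, x, x)
            x = self.norm1(x + self.dropout(attn_out))
            # Point-wise Feed Forward, then residual sum and normalization
            x = self.norm2(x + self.dropout(self.pff(x)))
            return x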

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between the different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
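
A short instantiation example; the import path encoder follows the class signature above, and the concrete hyperparameter values are illustrative:

    from encoder import Encoder  # module path taken from the signature above

    enc = Encoder(d_model=64, q=8, v=8, h=4,
                  attention_size=12, dropout=0.3, chunk_mode='chunk')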

property attention_map: Tensor

Attention map after a forward propagation, the score variable in the original paper.

forward(x)

Propagate the input through the Encoder block.

Apply the Multi Head Attention block, add the residual, and normalize. Apply the Point-wise Feed Forward block, add the residual, and normalize.

Parameters

x (Tensor) – Input tensor with shape (batch_size, K, d_model).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_model).
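
A sketch of a forward pass, assuming the instantiation example above and illustrative shape values:

    import torch

    batch_size, K, d_model = 16, 100, 64
    x = torch.randn(batch_size, K, d_model)

    y = enc(x)                   # output keeps the input shape
    assert y.shape == (batch_size, K, d_model)

    scores = enc.attention_map   # attention map from this forward pass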