Encoder module

class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Encoder block from "Attention is All You Need" (Vaswani et al., 2017).

Apply a Multi Head Attention block followed by a Point-wise Feed Forward block. A residual sum and normalization are applied after each block.
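A minimal sketch of that computation, built from standard PyTorch modules for illustration. The library's own attention block differs (separate q and v dimensions, the attention_size and chunk_mode options below), so this only shows the residual-and-normalize structure; the name EncoderSketch is hypothetical.

    import torch
    import torch.nn as nn

    class EncoderSketch(nn.Module):
        # Illustrative stand-in for encoder.Encoder: standard PyTorch
        # attention instead of the library's chunked/windowed variant.
        def __init__(self, d_model, h, dropout=0.3):
            super().__init__()
            self.mha = nn.MultiheadAttention(d_model, h, batch_first=True)
            self.pff = nn.Sequential(
                nn.Linear(d_model, d_model),
                nn.ReLU(),
                nn.Linear(d_model, d_model),
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            # Multi Head Attention block, then residual sum and normalization
            attn_out, _ = self.mha(x, x, x)
            x = self.norm1(x + self.dropout(attn_out))
            # Point-wise Feed Forward block, then residual sum and normalization
            x = self.norm2(x + self.dropout(self.pff(x)))
            return x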

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention to. The restriction is deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
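A hedged instantiation sketch; the hyperparameter values are illustrative, and the import path is assumed from the class name above.

    from encoder import Encoder  # import path assumed from the signature above

    # Illustrative hyperparameters: 4 heads over 64-dimensional inputs,
    # each position attending to the 12 most recent elements in chunked mode.
    encoder = Encoder(d_model=64, q=8, v=8, h=4,
                      attention_size=12, dropout=0.3, chunk_mode='chunk')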

property attention_map: Tensor

Attention map after a forward propagation, the variable score in the original paper.

Return type

Tensor

forward(x)

Propagate the input through the Encoder block.

Apply the Multi Head Attention block, add the residual, and normalize. Apply the Point-wise Feed Forward block, add the residual, and normalize.

Parameters

x (Tensor) – Input tensor with shape (batch_size, K, d_model).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_model).
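A short end-to-end sketch of a forward pass, with illustrative tensor sizes and the same assumed import path; it also shows how the attention_map property is read after propagation.

    import torch
    from encoder import Encoder  # import path assumed from the signature above

    batch_size, K, d_model = 8, 32, 64  # illustrative sizes
    encoder = Encoder(d_model=d_model, q=8, v=8, h=4)

    x = torch.rand(batch_size, K, d_model)
    y = encoder(x)  # shape is preserved: (batch_size, K, d_model)
    assert y.shape == x.shape

    # Attention map (variable score in the paper), available after the forward pass
    scores = encoder.attention_map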