Encoder module

class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Encoder block from Attention Is All You Need (Vaswani et al., 2017).

Apply a Multi Head Attention block followed by a Point-wise Feed Forward block. A residual sum and normalization are applied after each block, as sketched below.
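
The following is a minimal PyTorch sketch of this residual-and-normalize pattern, for illustration only: the sub-module names (mha, pff), the feed-forward width d_ff, and the use of nn.MultiheadAttention are assumptions, not attributes of this class:

    import torch
    import torch.nn as nn

    class EncoderSketch(nn.Module):
        """Illustrative sketch of the block's structure, not the actual class."""

        def __init__(self, d_model: int, h: int, d_ff: int = 2048, dropout: float = 0.3):
            super().__init__()
            # Stand-ins for the Multi Head Attention and Point-wise Feed Forward blocks
            self.mha = nn.MultiheadAttention(d_model, h, dropout=dropout, batch_first=True)
            self.pff = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Multi Head Attention, then residual sum and normalization
            attn_out, _ = self.mha(x, x, x)
            x = self.norm1(x + self.dropout(attn_out))
            # Point-wise Feed Forward, then residual sum and normalization
            x = self.norm2(x + self.dropout(self.pff(x)))
            return x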

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between the different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
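
A short instantiation example; the import path encoder follows the class signature above, and the concrete hyperparameter values are illustrative:

    from encoder import Encoder  # module path taken from the signature above

    enc = Encoder(d_model=64, q=8, v=8, h=4,
                  attention_size=12, dropout=0.3, chunk_mode='chunk')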

property attention_map: Tensor

Attention map after a forward propagation, the score variable in the original paper.

forward(x)

Propagate the input through the Encoder block.

Apply the Multi Head Attention block, add the residual, and normalize. Apply the Point-wise Feed Forward block, add the residual, and normalize.

Parameters

x (Tensor) – Input tensor with shape (batch_size, K, d_model).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_model).
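
A sketch of a forward pass, assuming the instantiation example above and illustrative shape values:

    import torch

    batch_size, K, d_model = 16, 100, 64
    x = torch.randn(batch_size, K, d_model)

    y = enc(x)                   # output keeps the input shape
    assert y.shape == (batch_size, K, d_model)

    scores = enc.attention_map   # attention map from this forward pass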