Encoder module¶
- class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')¶
Bases: Module
Encoder block from Attention is All You Need.
Applies a Multi Head Attention block followed by a Point-wise Feed Forward block. A residual sum and normalization are applied after each block.
- Parameters
  - d_model (int) – Dimension of the input vector.
  - q (int) – Dimension of all query matrices.
  - v (int) – Dimension of all value matrices.
  - h (int) – Number of heads.
  - attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.
  - dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
  - chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
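The attention_size parameter restricts how far back each query may attend. One plausible reading is a band mask around the diagonal; the helper below is a hypothetical sketch of that idea (the function name and the exact band shape are assumptions, not the library's actual masking code):

```python
import torch


def backward_attention_mask(seq_len: int, attention_size: int) -> torch.Tensor:
    # Hypothetical helper: entry [i, j] is True where query i must NOT
    # attend to key j, i.e. where |i - j| exceeds attention_size.
    idx = torch.arange(seq_len)
    # Broadcasting gives a (seq_len, seq_len) matrix of offsets j - i.
    return (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() > attention_size
```

Such a mask would typically be filled with -inf before the softmax inside the attention block, so masked positions receive zero weight.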
- property attention_map: Tensor¶
Attention map after a forward propagation; the variable score in the original paper.
- forward(x)¶
Propagate the input through the Encoder block.
Apply the Multi Head Attention block, add residual and normalize. Apply the Point-wise Feed Forward block, add residual and normalize.
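The two sub-blocks described above can be sketched as a minimal self-contained module. This uses PyTorch's stock nn.MultiheadAttention in place of the library's chunked variant, and omits the q, v and chunk_mode parameters; it is an illustration of the residual-and-normalize pattern, not the package's implementation:

```python
import torch
import torch.nn as nn


class EncoderSketch(nn.Module):
    """Sketch of the Encoder block: MHA + residual + norm, then PFF + residual + norm."""

    def __init__(self, d_model: int, h: int, dropout: float = 0.3):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, h, dropout=dropout, batch_first=True)
        # Point-wise Feed Forward: applied independently at every time step.
        self.pff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi Head Attention block, add residual and normalize.
        attn_out, _ = self.mha(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Point-wise Feed Forward block, add residual and normalize.
        ff_out = self.pff(x)
        return self.norm2(x + self.dropout(ff_out))
```

Because every sub-block maps back to d_model dimensions, the output tensor keeps the input's (batch, sequence, d_model) shape, which is what lets residual sums work at each step.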