Decoder module
- class decoder.Decoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')
Bases: torch.nn.Module
Decoder block from Attention is All You Need.
Apply two Multi Head Attention blocks followed by a Point-wise Feed Forward block. Residual sum and normalization are applied at each step; see the sketch after the parameter list below.
- Parameters
  - d_model (`int`) – Dimension of the input vector.
  - q (`int`) – Dimension of all query matrices.
  - v (`int`) – Dimension of all value matrices.
  - h (`int`) – Number of heads.
  - attention_size (`Optional[int]`) – Number of backward elements to apply attention to. Deactivated if `None`. Default is `None`.
  - dropout (`float`) – Dropout probability after each MHA or PFF block. Default is `0.3`.
  - chunk_mode (`str`) – Switch between different MultiHeadAttention blocks. One of `'chunk'`, `'window'` or `None`. Default is `'chunk'`.
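
The layout described above (self-attention, encoder-decoder attention, then a point-wise feed-forward block, each followed by a residual sum and normalization) can be illustrated with a minimal sketch built from standard PyTorch modules. This is not this library's implementation: it substitutes `torch.nn.MultiheadAttention` for the custom chunked/windowed attention, so the `q`, `v`, `attention_size` and `chunk_mode` parameters are not modeled, and the `DecoderSketch` name and the `4 * d_model` feed-forward width are assumptions.

```python
import torch
import torch.nn as nn


class DecoderSketch(nn.Module):
    """Minimal illustration of the Decoder block layout (not the library's code)."""

    def __init__(self, d_model: int, h: int, dropout: float = 0.3):
        super().__init__()
        # First MHA block: self-attention over the decoder input.
        self.self_attention = nn.MultiheadAttention(d_model, h, batch_first=True)
        # Second MHA block: encoder-decoder attention over the encoder output.
        self.encoder_attention = nn.MultiheadAttention(d_model, h, batch_first=True)
        # Point-wise feed-forward block; the 4 * d_model width is an assumption.
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Self-attention, residual sum, normalization.
        attn, _ = self.self_attention(x, x, x)
        x = self.norm1(x + self.dropout(attn))
        # Encoder-decoder attention against memory, residual sum, normalization.
        attn, _ = self.encoder_attention(x, memory, memory)
        x = self.norm2(x + self.dropout(attn))
        # Feed-forward network, residual sum, normalization.
        x = self.norm3(x + self.dropout(self.feed_forward(x)))
        return x
```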
- forward(x, memory)
Propagate the input through the Decoder block.
Apply the self-attention block, add the residual and normalize. Apply the encoder-decoder attention block over memory (the encoder output), add the residual and normalize. Apply the feed-forward network, add the residual and normalize.
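
A hedged usage sketch, assuming inputs of shape `(batch_size, K, d_model)` in line with the usual Transformer convention; the sizes and the `chunk_mode=None` choice below are illustrative, not prescribed by this page.

```python
import torch

from decoder import Decoder

# Illustrative sizes; nothing on this page fixes these values.
batch_size, K, d_model = 8, 50, 64

decoder = Decoder(d_model=d_model, q=8, v=8, h=4, chunk_mode=None)

x = torch.rand(batch_size, K, d_model)       # decoder input
memory = torch.rand(batch_size, K, d_model)  # encoder output
out = decoder(x, memory)                     # expected shape: (batch_size, K, d_model)
```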