Transformer module¶
- class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)¶
Bases: Module
Transformer model from Attention is All You Need.
A classic transformer model adapted for sequential data. The embedding has been replaced with a fully connected layer, and the final softmax layer is now a sigmoid.
- Parameters
  - d_input (int) – Model input dimension.
  - d_model (int) – Dimension of the input vector.
  - d_output (int) – Model output dimension.
  - q (int) – Dimension of queries and keys.
  - v (int) – Dimension of values.
  - h (int) – Number of heads.
  - N (int) – Number of encoder and decoder layers to stack.
  - attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.
  - dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
  - chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
  - pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.
  - pe_period (Optional[int]) – If using the 'regular' pe, the period can be set here. Default is None.
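The following is a minimal instantiation sketch. It assumes the class is importable as transformer.Transformer as documented above; all dimension and hyperparameter values are arbitrary, chosen only to illustrate the constructor signature.

```python
from transformer import Transformer  # class documented above

# Illustrative hyperparameters only -- pick values to match your data.
d_input = 11    # number of input features per time step
d_model = 64    # internal model dimension
d_output = 1    # number of output features per time step

net = Transformer(
    d_input, d_model, d_output,
    q=8, v=8,            # dimension of queries/keys and of values
    h=4,                 # number of attention heads
    N=2,                 # stacked encoder and decoder layers
    attention_size=12,   # attend to the 12 previous elements
    dropout=0.2,
    chunk_mode=None,     # plain MultiHeadAttention (no chunk/window variant)
    pe='regular',        # positional encoding type
    pe_period=24,        # period used by the 'regular' encoding
)
```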
- forward(x)¶
Propagate the input through the transformer.
Forward the input through an embedding module, the encoder and decoder stacks, and an output module.
- Parameters
  - x (Tensor) – torch.Tensor of shape (batch_size, K, d_input).
- Return type
  - Tensor
- Returns
  - Output tensor with shape (batch_size, K, d_output).
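Below is a sketch of a forward pass using hypothetical dimensions, assuming the same import path as above. The only shape requirement from the documented signature is that x has shape (batch_size, K, d_input); the output then has shape (batch_size, K, d_output).

```python
import torch
from transformer import Transformer  # class documented above

# Hypothetical dimensions, for illustration only.
net = Transformer(d_input=11, d_model=64, d_output=1, q=8, v=8, h=4, N=2)

batch_size, K = 16, 100                 # K = number of time steps per sample
x = torch.randn(batch_size, K, 11)      # shape (batch_size, K, d_input)

y = net(x)                              # nn.Module call -> Transformer.forward(x)
print(y.shape)                          # expected: torch.Size([16, 100, 1])
```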