Transformer module
- class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)
Bases: Module
Transformer model from Attention is All You Need.
A classic transformer model adapted for sequential data. The embedding has been replaced with a fully connected layer, and the softmax of the last layer has been replaced with a sigmoid.
- Variables
- Parameters
d_input (int) – Model input dimension.
d_model (int) – Dimension of the input vector.
d_output (int) – Model output dimension.
q (int) – Dimension of queries and keys.
v (int) – Dimension of values.
h (int) – Number of heads.
N (int) – Number of encoder and decoder layers to stack.
attention_size (Optional[int]) – Number of backward elements to apply attention to. Deactivated if None. Default is None.
dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.
pe_period (Optional[int]) – Period of the positional encoding when using the 'regular' pe. Default is None.
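To make the pe parameter more concrete, here is a minimal sketch of the sinusoidal positional encoding defined in Attention is All You Need. It assumes that 'original' refers to this formulation; the function name sinusoidal_pe is hypothetical and is not part of this module, and the library's own implementation may differ in detail.

```python
import math
from typing import List


def sinusoidal_pe(length: int, d_model: int) -> List[List[float]]:
    """Sinusoidal positional encoding as defined in the paper.

    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))

    Hypothetical helper for illustration only, not this module's API.
    """
    pe = [[0.0] * d_model for _ in range(length)]
    for pos in range(length):
        for j in range(d_model):
            # Columns 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** ((2 * (j // 2)) / d_model))
            pe[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return pe


# One row per time step, one column per model dimension.
pe = sinusoidal_pe(length=4, d_model=6)
print(len(pe), len(pe[0]))
```

The result has one row per time step and d_model columns, so it can be added element-wise to a sequence whose last dimension is d_model.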
- forward(x)
Propagate input through the transformer.
Forward input through an embedding module, the encoder then decoder stacks, and an output module.
- Parameters
x (Tensor) – torch.Tensor of shape (batch_size, K, d_input).
- Return type
- Returns
Output tensor with shape (batch_size, K, d_output).
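The shape contract of forward can be illustrated with a small stand-in module. This is a sketch under assumptions, not the library's Transformer class: ShapeSketch and its attribute names are hypothetical, and the encoder and decoder stacks are replaced by a shape-preserving placeholder so that only the fully connected embedding and the sigmoid output remain.

```python
import torch
import torch.nn as nn


class ShapeSketch(nn.Module):
    """Hypothetical stand-in, NOT the library's Transformer class."""

    def __init__(self, d_input: int, d_model: int, d_output: int):
        super().__init__()
        # Fully connected layer used in place of a token embedding.
        self.embedding = nn.Linear(d_input, d_model)
        # Placeholder for the N encoder and decoder layers (shape-preserving).
        self.stacks = nn.Identity()
        # Output module: linear projection, followed by a sigmoid in forward.
        self.linear = nn.Linear(d_model, d_output)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embedding(x)  # (batch_size, K, d_model)
        h = self.stacks(h)     # (batch_size, K, d_model)
        return torch.sigmoid(self.linear(h))  # (batch_size, K, d_output)


batch_size, K, d_input, d_model, d_output = 4, 24, 8, 16, 3
model = ShapeSketch(d_input, d_model, d_output)
x = torch.randn(batch_size, K, d_input)
y = model(x)
print(tuple(y.shape))  # (4, 24, 3)
```

Because the last operation is a sigmoid, every output value lies between 0 and 1, and the time dimension K is carried through unchanged.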