Transformer module

class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)

Bases: Module

Transformer model from "Attention Is All You Need".

A classic Transformer adapted to sequential data: the token embedding is replaced by a fully connected layer, and the final softmax is replaced by a sigmoid.

Variables
  • layers_encoding (list of Encoder.Encoder) – stack of Encoder layers.

  • layers_decoding (list of Decoder.Decoder) – stack of Decoder layers.

Parameters
  • d_input (int) – Model input dimension.

  • d_model (int) – Internal model dimension, i.e. the size of the vectors flowing through the encoder and decoder stacks.

  • d_output (int) – Model output dimension.

  • q (int) – Dimension of queries and keys.

  • v (int) – Dimension of values.

  • h (int) – Number of heads.

  • N (int) – Number of encoder and decoder layers to stack.

  • attention_size (Optional[int]) – Number of past (backward) positions each element may attend to. Attention is unrestricted if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.

  • pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.

  • pe_period (Optional[int]) – Period of the positional encoding when pe is 'regular'. Default is None.
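The 'original' positional encoding presumably refers to the sinusoidal scheme from "Attention Is All You Need". As an illustration (not this library's implementation), the table added to the embedded input can be sketched in plain Python as:

```python
import math

def sinusoidal_pe(seq_len, d_model):
    """Sinusoidal positional encoding table of shape (seq_len, d_model).

    PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)        # even indices: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd indices: cosine
    return pe
```

Each row of this table is added element-wise to the corresponding time step of the embedded input, giving the model access to absolute position information.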

forward(x)

Propagate the input through the transformer.

The input passes through an embedding module, then the encoder and decoder stacks, and finally an output module.

Parameters

x (Tensor) – torch.Tensor of shape (batch_size, K, d_input), where K is the sequence length.

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_output).
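The data flow of forward can be sketched shape-by-shape in plain Python. This is a hypothetical toy stand-in, not the library's code: the encoder/decoder stacks are elided, and only the documented contract is shown — an embedding FC maps d_input to d_model, and an output FC plus sigmoid maps d_model to d_output, per time step:

```python
import math
import random

def linear(x_row, weight, bias):
    """Apply y = W x + b to one feature vector (a plain list)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(w_i, x_row)) + b_i
            for w_i, b_i in zip(weight, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def toy_forward(x, d_input, d_model, d_output, seed=0):
    """Shape-level sketch of Transformer.forward.

    x has shape (batch_size, K, d_input); the result has shape
    (batch_size, K, d_output), with values in (0, 1) from the sigmoid.
    """
    rng = random.Random(seed)
    W_in = [[rng.uniform(-1, 1) for _ in range(d_input)] for _ in range(d_model)]
    b_in = [0.0] * d_model
    W_out = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_output)]
    b_out = [0.0] * d_output
    out = []
    for seq in x:                    # batch dimension
        rows = []
        for step in seq:             # time dimension K
            h = linear(step, W_in, b_in)   # embedding FC replaces token lookup
            # ... N encoder and N decoder layers would transform h here ...
            rows.append([sigmoid(z) for z in linear(h, W_out, b_out)])
        out.append(rows)
    return out
```

With the real class, the equivalent call would be `net = Transformer(d_input, d_model, d_output, q, v, h, N)` followed by `y = net(x)` on a tensor of shape (batch_size, K, d_input).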