Transformer module

class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)

Bases: Module

Transformer model from "Attention Is All You Need".

A classic Transformer adapted to sequential data: the token embedding is replaced by a fully connected layer, and the final softmax is replaced by a sigmoid.

Variables
  • layers_encoding (list of Encoder.Encoder) – stack of Encoder layers.

  • layers_decoding (list of Decoder.Decoder) – stack of Decoder layers.

Parameters
  • d_input (int) – Model input dimension.

  • d_model (int) – Internal model dimension, i.e. the size of the vectors flowing through the encoder and decoder stacks.

  • d_output (int) – Model output dimension.

  • q (int) – Dimension of queries and keys.

  • v (int) – Dimension of values.

  • h (int) – Number of heads.

  • N (int) – Number of encoder and decoder layers to stack.

  • attention_size (Optional[int]) – Number of past (backward) positions each element may attend to. Attention is unrestricted if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.

  • pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.

  • pe_period (Optional[int]) – Period of the positional encoding when pe is 'regular'. Default is None.
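The 'original' positional encoding presumably refers to the sinusoidal scheme from "Attention Is All You Need". As an illustration (not this library's implementation), the table added to the embedded input can be sketched in plain Python as:

```python
import math

def sinusoidal_pe(seq_len, d_model):
    """Sinusoidal positional encoding table of shape (seq_len, d_model).

    PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)        # even indices: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd indices: cosine
    return pe
```

Each row of this table is added element-wise to the corresponding time step of the embedded input, giving the model access to absolute position information.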

forward(x)

Propagate the input through the transformer.

The input passes through an embedding module, then the encoder and decoder stacks, and finally an output module.

Parameters

x (Tensor) – torch.Tensor of shape (batch_size, K, d_input), where K is the sequence length.

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_output).
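The data flow of forward can be sketched shape-by-shape in plain Python. This is a hypothetical toy stand-in, not the library's code: the encoder/decoder stacks are elided, and only the documented contract is shown — an embedding FC maps d_input to d_model, and an output FC plus sigmoid maps d_model to d_output, per time step:

```python
import math
import random

def linear(x_row, weight, bias):
    """Apply y = W x + b to one feature vector (a plain list)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(w_i, x_row)) + b_i
            for w_i, b_i in zip(weight, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def toy_forward(x, d_input, d_model, d_output, seed=0):
    """Shape-level sketch of Transformer.forward.

    x has shape (batch_size, K, d_input); the result has shape
    (batch_size, K, d_output), with values in (0, 1) from the sigmoid.
    """
    rng = random.Random(seed)
    W_in = [[rng.uniform(-1, 1) for _ in range(d_input)] for _ in range(d_model)]
    b_in = [0.0] * d_model
    W_out = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_output)]
    b_out = [0.0] * d_output
    out = []
    for seq in x:                    # batch dimension
        rows = []
        for step in seq:             # time dimension K
            h = linear(step, W_in, b_in)   # embedding FC replaces token lookup
            # ... N encoder and N decoder layers would transform h here ...
            rows.append([sigmoid(z) for z in linear(h, W_out, b_out)])
        out.append(rows)
    return out
```

With the real class, the equivalent call would be `net = Transformer(d_input, d_model, d_output, q, v, h, N)` followed by `y = net(x)` on a tensor of shape (batch_size, K, d_input).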