Transformer for metamodels

Transformers for Time Series


Implementation of the Transformer model (originally from Attention is All You Need) applied to Time Series (powered by PyTorch).

Transformer model

Transformers are attention-based neural networks designed to solve NLP tasks. Their key features are:

  • linear complexity in the dimension of the feature vector;

  • parallel computation over a sequence, as opposed to sequential computation;

  • long-term memory, as any step of the input time sequence can be attended to directly.

This repo focuses on their application to time series.

Dataset and application as metamodel

Our use case is modeling a numerical simulator for building consumption prediction. To this end, we created a dataset by sampling random inputs (building characteristics and usage, weather, …) and recording the simulated outputs. We then convert these variables to a time series format and feed them to the transformer.

Adaptations for time series

In order to perform well on time series, a few adjustments had to be made:

  • The embedding layer is replaced by a generic linear layer (see the sketch after this list);

  • The original positional encoding is removed. A “regular” version, better matching the day/night patterns of the input sequence, can be used instead;

  • A window is applied on the attention map to limit backward attention and focus on short-term patterns.
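
To illustrate the first adjustment, here is a minimal sketch of feeding a multivariate time series through a linear “embedding”; the dimensions below are arbitrary and not taken from the repo:

import torch
import torch.nn as nn

# Hypothetical dimensions, for illustration only
batch_size, K, d_input, d_model = 8, 168, 38, 64

embedding = nn.Linear(d_input, d_model)   # generic linear layer replacing the token embedding
x = torch.randn(batch_size, K, d_input)   # one week of hourly multivariate inputs
z = embedding(x)                          # shape (batch_size, K, d_model)
print(z.shape)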

Installation

All required packages can be found in requirements.txt, and are expected to run with Python 3.7. Note that you may have to install PyTorch manually if you are not using pip on a Debian distribution: head over to the PyTorch installation page. Here are a few lines to get started with pip and virtualenv:

$ apt-get install python3.7
$ pip3 install --upgrade --user pip virtualenv
$ virtualenv -p python3.7 .env
$ . .env/bin/activate
(.env) $ pip install -r requirements.txt

Usage

Downloading the dataset

The dataset is not included in this repo and must be downloaded manually. It comprises two files: dataset.npz contains all input and output values, and labels.json is a detailed list of the variables. Please refer to #2 for more information.
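
Once downloaded, a quick sanity check might look like the following sketch (it only lists the archive contents, since the exact array names are not documented here):

import json
import numpy as np

# Inspect both files; the array names stored in dataset.npz are not assumed here
data = np.load('dataset.npz')
print(data.files)                    # names of the arrays contained in the archive

with open('labels.json') as f:
    labels = json.load(f)
print(type(labels), len(labels))     # detailed list of the variables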

Running training script

Using jupyter, run the default training.ipynb notebook. All adjustable parameters can be found in the second cell. Be careful with BATCH_SIZE, as it is used to parallelize head and time chunk computations.

Outside usage

The Transformer class can be used out of the box, see the docs for more info.

from tst import Transformer

net = Transformer(d_input, d_model, d_output, q, v, h, N, TIME_CHUNK, pe)
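
For instance, a forward pass on dummy data could look like the following sketch, using the keyword arguments documented below (the dimension values are arbitrary, not the notebook settings):

import torch
from tst import Transformer

# Arbitrary values, for illustration only
d_input, d_model, d_output = 38, 64, 8
q, v, h, N = 8, 8, 4, 2

net = Transformer(d_input, d_model, d_output, q, v, h, N,
                  attention_size=24, chunk_mode=None, pe=None)

x = torch.randn(16, 168, d_input)   # (batch_size, K, d_input)
y = net(x)                          # (batch_size, K, d_output)
print(y.shape)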

Building the docs

To build the doc:

(.env) $ cd docs && make html

Modules

Transformer module

class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)

Bases: Module

Transformer model from Attention is All You Need.

A classic transformer model adapted for sequential data. The embedding has been replaced with a fully connected layer, and the final softmax layer with a sigmoid.

Variables
  • layers_encoding (list of Encoder.Encoder) – stack of Encoder layers.

  • layers_decoding (list of Decoder.Decoder) – stack of Decoder layers.

Parameters
  • d_input (int) – Model input dimension.

  • d_model (int) – Dimension of the input vector.

  • d_output (int) – Model output dimension.

  • q (int) – Dimension of queries and keys.

  • v (int) – Dimension of values.

  • h (int) – Number of heads.

  • N (int) – Number of encoder and decoder layers to stack.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.

  • pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.

  • pe_period (Optional[int]) – If using the 'regular' pe, then we can define the period. Default is None.

forward(x)

Propagate input through transformer

Forward input through an embedding module, the encoder then decoder stacks, and an output module.

Parameters

x (Tensor) – torch.Tensor of shape (batch_size, K, d_input).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_output).

Encoder module

class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Encoder block from Attention is All You Need.

Apply Multi Head Attention block followed by a Point-wise Feed Forward block. Residual sum and normalization are applied at each step.

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.

property attention_map: Tensor

Attention map after a forward propagation, variable score in the original paper.

Return type

Tensor

forward(x)

Propagate the input through the Encoder block.

Apply the Multi Head Attention block, add residual and normalize. Apply the Point-wise Feed Forward block, add residual and normalize.

Parameters

x (Tensor) – Input tensor with shape (batch_size, K, d_model).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_model).
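
For intuition, here is a minimal sketch of the residual-plus-normalization pattern described above, built on torch.nn.MultiheadAttention rather than on the repo's own blocks (the feed-forward width of 4 * d_model is an arbitrary choice):

import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Illustrative encoder block: MHA then PFF, each wrapped in residual + LayerNorm."""

    def __init__(self, d_model: int, h: int, dropout: float = 0.3):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, h, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attention(x, x, x)        # self attention
        x = self.norm1(x + self.dropout(attn_out))   # residual sum and normalization
        ff_out = self.feed_forward(x)
        return self.norm2(x + self.dropout(ff_out))  # residual sum and normalization

block = EncoderSketch(d_model=64, h=4)
print(block(torch.randn(8, 168, 64)).shape)          # torch.Size([8, 168, 64])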

Decoder module

class decoder.Decoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')

Bases: Module

Decoder block from Attention is All You Need.

Apply two Multi Head Attention blocks followed by a Point-wise Feed Forward block. Residual sum and normalization are applied at each step.

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

  • dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.

  • chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.

forward(x, memory)

Propagate the input through the Decoder block.

Apply the self attention block, add residual and normalize. Apply the encoder-decoder attention block, add residual and normalize. Apply the feed forward network, add residual and normalize.

Parameters
  • x (Tensor) – Input tensor with shape (batch_size, K, d_model).

  • memory (Tensor) – Memory tensor with shape (batch_size, K, d_model) from encoder output.

Return type

Tensor

Returns

x – Output tensor with shape (batch_size, K, d_model).

MultiHeadAttention module

class multiHeadAttention.MultiHeadAttention(d_model, q, v, h, attention_size=None)

Bases: Module

Multi Head Attention block from Attention is All You Need.

Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model).

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

property attention_map: Tensor

Attention map after a forward propagation, variable score in the original paper.

Return type

Tensor

forward(query, key, value, mask=None)

Propagate forward the input through the MHB.

We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).

Parameters
  • query (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.

  • key (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.

  • value (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.

  • mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.

Return type

Tensor

Returns

Self attention tensor with shape (batch_size, K, d_model).
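
The per-head Scaled Dot-Product mentioned above can be written, in isolation, as the following sketch (this is the textbook formulation, not necessarily the repo's exact code):

import torch

def scaled_dot_product(query, key, value, mask=None):
    """Textbook Scaled Dot-Product attention for one head (illustrative only).

    query, key: (batch_size, K, q); value: (batch_size, K, v); mask: optional BoolTensor (K, K).
    """
    scores = torch.bmm(query, key.transpose(1, 2)) / query.shape[-1] ** 0.5   # (batch_size, K, K)
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))
    attention = torch.softmax(scores, dim=-1)   # the attention map ("score" in the paper)
    return torch.bmm(attention, value)          # (batch_size, K, v)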

class multiHeadAttention.MultiHeadAttentionChunk(d_model, q, v, h, attention_size=None, chunk_size=168, **kwargs)

Bases: MultiHeadAttention

Multi Head Attention block with chunk.

Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model). Queries, keys and values are divided into chunks of constant size.

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

  • chunk_size (Optional[int]) – Size of chunks to apply attention on. Last one may be smaller (see torch.Tensor.chunk). Default is 168.

forward(query, key, value, mask=None)

Propagate forward the input through the MHB.

We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).

Parameters
  • query (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.

  • key (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.

  • value (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.

  • mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.

Return type

Tensor

Returns

Self attention tensor with shape (batch_size, K, d_model).

class multiHeadAttention.MultiHeadAttentionWindow(d_model, q, v, h, attention_size=None, window_size=168, padding=42, **kwargs)

Bases: MultiHeadAttention

Multi Head Attention block with moving window.

Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model). Queries, keys and values are divided into chunks using a moving window.

Parameters
  • d_model (int) – Dimension of the input vector.

  • q (int) – Dimension of all query matrices.

  • v (int) – Dimension of all value matrices.

  • h (int) – Number of heads.

  • attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.

  • window_size (Optional[int]) – Size of the window used to extract chunks. Default is 168.

  • padding (Optional[int]) – Padding around each window. Padding will be applied to input sequence. Default is 168 // 4 = 42.

forward(query, key, value, mask=None)

Propagate forward the input through the MHB.

We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).

Parameters
  • query (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.

  • key (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.

  • value (Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.

  • mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.

Return type

Tensor

Returns

Self attention tensor with shape (batch_size, K, d_model).

PositionwiseFeedForward module

class positionwiseFeedForward.PositionwiseFeedForward(d_model, d_ff=2048)

Bases: Module

Position-wise Feed Forward Network block from Attention is All You Need.

Apply two linear transformations to each input, separately but identically. We implement them as 1D convolutions. Input and output have a shape (batch_size, K, d_model).

Parameters
  • d_model (int) – Dimension of input tensor.

  • d_ff (Optional[int]) – Dimension of hidden layer, default is 2048.

forward(x)

Propagate forward the input through the PFF block.

Apply the first linear transformation, then a ReLU activation, and the second linear transformation.

Parameters

x (Tensor) – Input tensor with shape (batch_size, K, d_model).

Return type

Tensor

Returns

Output tensor with shape (batch_size, K, d_model).
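
Since a kernel-size-1 convolution applies the same linear map at every time step, the two transformations can indeed be written as 1D convolutions; here is a minimal sketch (illustrative, not the repo's module):

import torch
import torch.nn as nn

class PFFSketch(nn.Module):
    """Illustrative position-wise feed forward block built with 1D convolutions."""

    def __init__(self, d_model: int, d_ff: int = 2048):
        super().__init__()
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, K, d_model) -> Conv1d expects (batch_size, channels, K)
        out = self.conv2(torch.relu(self.conv1(x.transpose(1, 2))))
        return out.transpose(1, 2)               # back to (batch_size, K, d_model)

print(PFFSketch(d_model=64)(torch.randn(8, 168, 64)).shape)   # torch.Size([8, 168, 64])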

Loss module

class loss.OZELoss(reduction='mean', alpha=0.3)

Bases: Module

Custom loss for TRNSys metamodel.

Compute, for temperature and consumptions, the integral of the squared differences over time. Sum the logs with a coefficient alpha.

\[
\begin{aligned}
\Delta_T &= \sqrt{\int (y_{est}^T - y^T)^2}\\
\Delta_Q &= \sqrt{\int (y_{est}^Q - y^Q)^2}\\
loss &= \log(1 + \Delta_T) + \alpha \cdot \log(1 + \Delta_Q)
\end{aligned}
\]
alpha:

Coefficient for consumption. Default is 0.3.

forward(y_true, y_pred)

Compute the loss between a target value and a prediction.

Parameters
  • y_true (Tensor) – Target value.

  • y_pred (Tensor) – Estimated value.

Return type

Tensor

Returns

Loss as a tensor with gradient attached.
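
As a reading aid, the formula above can be transcribed directly as follows; the temperature and consumption channels are passed as separate tensors here and the time integral is approximated by a sum, so this is only a sketch of the idea, not the repo's OZELoss implementation:

import torch

def oze_loss_sketch(y_true_T, y_pred_T, y_true_Q, y_pred_Q, alpha=0.3):
    """Illustrative transcription of the loss formula (not the library's OZELoss)."""
    delta_T = torch.sqrt(torch.sum((y_pred_T - y_true_T) ** 2, dim=-1))   # integral over time ~ sum
    delta_Q = torch.sqrt(torch.sum((y_pred_Q - y_true_Q) ** 2, dim=-1))
    loss = torch.log(1 + delta_T) + alpha * torch.log(1 + delta_Q)
    return loss.mean()   # 'mean' reduction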

Utils module

utils.generate_local_map_mask(chunk_size, attention_size, mask_future=False, device='cpu')

Compute attention mask as attention_size wide diagonal.

Parameters
  • chunk_size (int) – Time dimension size.

  • attention_size (int) – Number of backward elements to apply attention.

  • device (device) – torch device. Default is 'cpu'.

Return type

BoolTensor

Returns

Mask as a boolean tensor.
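
A plausible construction of such a mask, where True marks positions that are masked out (the actual utils.generate_local_map_mask may differ in details such as the mask_future handling):

import torch

def local_map_mask_sketch(chunk_size, attention_size, mask_future=False):
    """Illustrative attention_size-wide diagonal mask; True means the position is masked."""
    idx = torch.arange(chunk_size)
    distance = idx.unsqueeze(0) - idx.unsqueeze(1)   # distance[i, j] = j - i
    mask = distance.abs() > attention_size           # keep only a diagonal band
    if mask_future:
        mask = mask | (distance > 0)                 # also mask elements located after i
    return mask                                      # BoolTensor of shape (chunk_size, chunk_size)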

utils.generate_original_PE(length, d_model)

Generate positional encoding as described in the original paper.

Parameters
  • length (int) – Time window length, i.e. K.

  • d_model (int) – Dimension of the model vector.

Return type

Tensor

Returns

Tensor of shape (K, d_model).
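
The sinusoidal construction from the original paper can be sketched as follows (a standard formulation, assuming an even d_model; the library's implementation may differ slightly):

import math
import torch

def original_pe_sketch(length, d_model):
    """Illustrative sinusoidal encoding from the original paper (d_model assumed even)."""
    position = torch.arange(length, dtype=torch.float).unsqueeze(1)              # (K, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)     # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)     # odd dimensions
    return pe                                        # (K, d_model)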

utils.generate_regular_PE(length, d_model, period=24)

Generate positional encoding with a given period.

Parameters
  • length (int) – Time window length, i.e. K.

  • d_model (int) – Dimension of the model vector.

  • period (Optional[int]) – Size of the pattern to repeat. Default is 24.

Return type

Tensor

Returns

Tensor of shape (K, d_model).
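
One simple way to obtain such an encoding is to repeat a sine wave of the given period across every dimension; this sketch illustrates the idea and is not necessarily the library's exact construction:

import math
import torch

def regular_pe_sketch(length, d_model, period=24):
    """Illustrative periodic encoding: a sine of the given period, repeated on every dimension."""
    position = torch.arange(length, dtype=torch.float).unsqueeze(1)   # (K, 1)
    pe = torch.sin(position * 2 * math.pi / period)                   # repeats every `period` time steps
    return pe.expand(length, d_model).clone()                         # (K, d_model)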

Visualizations

Training visualization - 2021 March 28

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
DATASET_PATH = 'datasets/dataset_sample_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Load dataset

[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataloader = DataLoader(ozeDataset,
                          batch_size=BATCH_SIZE,
                          shuffle=False,
                          num_workers=NUM_WORKERS,
                          pin_memory=False
                         )

Load network

[4]:
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
net.load_state_dict(torch.load('models/model_2020_03_10__231146.pth'))
_ = net.eval()

Evaluate on the test dataset

[5]:
predictions = np.empty(shape=(len(dataloader.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader, total=len(dataloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 18.72it/s]
Plot results on a sample
[6]:
map_plot_function(ozeDataset, predictions, plot_visual_sample)
Plot encoding attention map
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
Plot dataset and prediction distributions for consumptions
[8]:
map_plot_function(ozeDataset, predictions, plot_values_distribution, time_limit=24, labels=['Q_AC_OFFICE',
                                                                                            'Q_HEAT_OFFICE',
                                                                                            'Q_PEOPLE',
                                                                                            'Q_EQP',
                                                                                            'Q_LIGHT',
                                                                                            'Q_AHU_C',
                                                                                            'Q_AHU_H'])
Plot error distribution for temperature
[9]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, labels=['T_INT_OFFICE'], time_limit=24)
Plot mispredictions thresholds
[10]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1})

Demonstrator

[1]:
import json

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML

from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation

%matplotlib notebook

# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()

Median indoor temperature

[2]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation().to_jshtml())
[2]:

Total consumption

[3]:
animation = HistAnimation.load('animations/elec.anim')
HTML(animation.run_animation().to_jshtml())
[3]:

Cumulative error

Intermittency management

Optimizing the use of Climespace

Optimizing the scheduling of the Air Handling Units

Roadmap, seasonal scheduling

Thermal energy consumption

Demonstrator

[1]:
import json

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML

from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation

%matplotlib notebook

# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()

Context data: emitters

Thermal energy consumption

Optimizing the use of Climespace

Optimizing the scheduling of the Air Handling Units

Indoor temperature

[2]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation().to_jshtml())
[2]:

Private consumption

[3]:
animation = HistAnimation.load('animations/elec.anim')
HTML(animation.run_animation().to_jshtml())
[3]:


Demonstrator

[1]:
import json

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML

from oze.utils.env_setup import load_config
from oze.inputs import load_building
from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation
[2]:
%matplotlib notebook
# matplotlib.rcParams['animation.embed_limit'] = 100

# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()

Indoor temperature

[3]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation(max_frames=10).to_jshtml())
[3]:

Private consumptions

[4]:
animation = HistAnimation.load('animations/private.anim')
HTML(animation.run_animation(max_frames=10).to_jshtml())
[4]:

Trainings

Classic - 2020 June 27

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_57M.npz'
BATCH_SIZE = 8
NUM_WORKERS = 0
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 8 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 12 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 27 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
[4]:
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (23000, 1000, 1000))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.44it/s, loss=0.0043, val_loss=0.00177]
[Epoch   2/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.48it/s, loss=0.00127, val_loss=0.0013]
[Epoch   3/30]: 100%|██████████| 23000/23000 [05:02<00:00, 76.07it/s, loss=0.000871, val_loss=0.000957]
[Epoch   4/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.47it/s, loss=0.000632, val_loss=0.000511]
[Epoch   5/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.65it/s, loss=0.000491, val_loss=0.000418]
[Epoch   6/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000394, val_loss=0.000349]
[Epoch   7/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.27it/s, loss=0.000325, val_loss=0.000378]
[Epoch   8/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.82it/s, loss=0.000285, val_loss=0.000268]
[Epoch   9/30]: 100%|██████████| 23000/23000 [05:02<00:00, 75.96it/s, loss=0.000254, val_loss=0.000223]
[Epoch  10/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.38it/s, loss=0.000222, val_loss=0.00022]
[Epoch  11/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.86it/s, loss=0.000206, val_loss=0.000187]
[Epoch  12/30]: 100%|██████████| 23000/23000 [05:02<00:00, 75.97it/s, loss=0.000191, val_loss=0.000182]
[Epoch  13/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.40it/s, loss=0.000177, val_loss=0.000174]
[Epoch  14/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000169, val_loss=0.000169]
[Epoch  15/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.42it/s, loss=0.00016, val_loss=0.00015]
[Epoch  16/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.40it/s, loss=0.000149, val_loss=0.00014]
[Epoch  17/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.46it/s, loss=0.000145, val_loss=0.000163]
[Epoch  18/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.53it/s, loss=0.000138, val_loss=0.000142]
[Epoch  19/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.54it/s, loss=0.000132, val_loss=0.000162]
[Epoch  20/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.46it/s, loss=0.000127, val_loss=0.000135]
[Epoch  21/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.59it/s, loss=0.000121, val_loss=0.000136]
[Epoch  22/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.79it/s, loss=0.000119, val_loss=0.000127]
[Epoch  23/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.73it/s, loss=0.000112, val_loss=0.000122]
[Epoch  24/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.37it/s, loss=0.000109, val_loss=0.000107]
[Epoch  25/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.67it/s, loss=0.000107, val_loss=0.000147]
[Epoch  26/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.68it/s, loss=0.000103, val_loss=0.000114]
[Epoch  27/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000101, val_loss=0.000108]
[Epoch  28/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.23it/s, loss=9.82e-5, val_loss=0.000108]
[Epoch  29/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.32it/s, loss=9.44e-5, val_loss=0.000102]
[Epoch  30/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.50it/s, loss=9.13e-5, val_loss=0.000107]
model exported to models/model_2020_06_27__062220.pth with loss 0.000102

Validation

[7]:
_ = net.eval()
Evaluate on the test dataset
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:04<00:00, 26.91it/s]
Plot results on a sample
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)
Plot error distributions
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
Plot mispredictions thresholds
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Classic - 2020 April 27

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 0
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 8 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 12 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
[4]:
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.76it/s, loss=0.00524, val_loss=0.00232]
[Epoch   2/30]: 100%|██████████| 38000/38000 [06:18<00:00, 100.50it/s, loss=0.00175, val_loss=0.00144]
[Epoch   3/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.81it/s, loss=0.00115, val_loss=0.00104]
[Epoch   4/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.03it/s, loss=0.000849, val_loss=0.000727]
[Epoch   5/30]: 100%|██████████| 38000/38000 [06:19<00:00, 100.20it/s, loss=0.000676, val_loss=0.000562]
[Epoch   6/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.45it/s, loss=0.000576, val_loss=0.000496]
[Epoch   7/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.78it/s, loss=0.000493, val_loss=0.000451]
[Epoch   8/30]: 100%|██████████| 38000/38000 [06:17<00:00, 100.78it/s, loss=0.000441, val_loss=0.000447]
[Epoch   9/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.74it/s, loss=0.000402, val_loss=0.00042]
[Epoch  10/30]: 100%|██████████| 38000/38000 [06:06<00:00, 103.58it/s, loss=0.000374, val_loss=0.000379]
[Epoch  11/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.46it/s, loss=0.000348, val_loss=0.000334]
[Epoch  12/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.12it/s, loss=0.000326, val_loss=0.000374]
[Epoch  13/30]: 100%|██████████| 38000/38000 [06:07<00:00, 103.35it/s, loss=0.000316, val_loss=0.000357]
[Epoch  14/30]: 100%|██████████| 38000/38000 [06:11<00:00, 102.17it/s, loss=0.000289, val_loss=0.000278]
[Epoch  15/30]: 100%|██████████| 38000/38000 [06:12<00:00, 102.08it/s, loss=0.000283, val_loss=0.000285]
[Epoch  16/30]: 100%|██████████| 38000/38000 [06:05<00:00, 103.89it/s, loss=0.000264, val_loss=0.000276]
[Epoch  17/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.58it/s, loss=0.000254, val_loss=0.000353]
[Epoch  18/30]: 100%|██████████| 38000/38000 [06:12<00:00, 101.92it/s, loss=0.000248, val_loss=0.000291]
[Epoch  19/30]: 100%|██████████| 38000/38000 [06:05<00:00, 104.04it/s, loss=0.000236, val_loss=0.00027]
[Epoch  20/30]: 100%|██████████| 38000/38000 [06:07<00:00, 103.36it/s, loss=0.000228, val_loss=0.00029]
[Epoch  21/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.73it/s, loss=0.000219, val_loss=0.000224]
[Epoch  22/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.51it/s, loss=0.000222, val_loss=0.00023]
[Epoch  23/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.71it/s, loss=0.000214, val_loss=0.000239]
[Epoch  24/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.13it/s, loss=0.000206, val_loss=0.000208]
[Epoch  25/30]: 100%|██████████| 38000/38000 [06:15<00:00, 101.30it/s, loss=0.000202, val_loss=0.00021]
[Epoch  26/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.61it/s, loss=0.000194, val_loss=0.000199]
[Epoch  27/30]: 100%|██████████| 38000/38000 [06:05<00:00, 104.08it/s, loss=0.000192, val_loss=0.000218]
[Epoch  28/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.51it/s, loss=0.000188, val_loss=0.000238]
[Epoch  29/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.79it/s, loss=0.000181, val_loss=0.000182]
[Epoch  30/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.80it/s, loss=0.000176, val_loss=0.000192]
model exported to models/model_2020_04_26__162559.pth with loss 0.000182

Validation

[7]:
_ = net.eval()
Evaluate on the test dataset
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 21.38it/s]
Plot results on a sample
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)
Plot error distributions
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
Plot mispredictions thresholds
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Benchmark ConvGru - 2020 April 14

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import BiGRU, ConvGru
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
N = 2 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load ConvGru benchmark with Adam optimizer and MSE loss function
net = ConvGru(d_input, d_model, d_output, N, dropout=dropout, bidirectional=True).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.00635, val_loss=0.00301]
[Epoch   2/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.56it/s, loss=0.00241, val_loss=0.0019]
[Epoch   3/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.54it/s, loss=0.00177, val_loss=0.0015]
[Epoch   4/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.51it/s, loss=0.00147, val_loss=0.00152]
[Epoch   5/30]: 100%|██████████| 38000/38000 [07:39<00:00, 82.63it/s, loss=0.00126, val_loss=0.00126]
[Epoch   6/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.56it/s, loss=0.00111, val_loss=0.00103]
[Epoch   7/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.000981, val_loss=0.00103]
[Epoch   8/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.57it/s, loss=0.000876, val_loss=0.000755]
[Epoch   9/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.49it/s, loss=0.000778, val_loss=0.000698]
[Epoch  10/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.58it/s, loss=0.000688, val_loss=0.000631]
[Epoch  11/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.55it/s, loss=0.00062, val_loss=0.000549]
[Epoch  12/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.43it/s, loss=0.000561, val_loss=0.000497]
[Epoch  13/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.34it/s, loss=0.000514, val_loss=0.000461]
[Epoch  14/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.50it/s, loss=0.000478, val_loss=0.000513]
[Epoch  15/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.49it/s, loss=0.000447, val_loss=0.000399]
[Epoch  16/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.48it/s, loss=0.000424, val_loss=0.000407]
[Epoch  17/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.30it/s, loss=0.000401, val_loss=0.000382]
[Epoch  18/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.000381, val_loss=0.000346]
[Epoch  19/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.38it/s, loss=0.000365, val_loss=0.00035]
[Epoch  20/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.47it/s, loss=0.000351, val_loss=0.000329]
[Epoch  21/30]: 100%|██████████| 38000/38000 [06:04<00:00, 104.30it/s, loss=0.000335, val_loss=0.000313]
[Epoch  22/30]: 100%|██████████| 38000/38000 [03:08<00:00, 201.75it/s, loss=0.000323, val_loss=0.000329]
[Epoch  23/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.14it/s, loss=0.000313, val_loss=0.000291]
[Epoch  24/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.21it/s, loss=0.0003, val_loss=0.000302]
[Epoch  25/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.71it/s, loss=0.000294, val_loss=0.000298]
[Epoch  26/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.67it/s, loss=0.000284, val_loss=0.000279]
[Epoch  27/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.40it/s, loss=0.000276, val_loss=0.000265]
[Epoch  28/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.67it/s, loss=0.000272, val_loss=0.000265]
[Epoch  29/30]: 100%|██████████| 38000/38000 [03:07<00:00, 203.04it/s, loss=0.000265, val_loss=0.000248]
[Epoch  30/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.93it/s, loss=0.000258, val_loss=0.000281]
model exported to models/model_LSTM_2020_04_14__101819.pth with loss 0.000248


Validation

[7]:
_ = net.eval()
Evaluate on the test dataset
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:01<00:00, 82.73it/s]
Plot results on a sample
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)
Plot error distributions
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
Plot mispredictions thresholds
[ ]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Benchmark BiGRU - 2020 April 01

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import BiGRU
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
N = 4 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load BiGRU benchmark with Adam optimizer and MSE loss function
net = BiGRU(d_input, d_model, d_output, N, dropout=dropout, bidirectional=True).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.00789, val_loss=0.00377]
[Epoch   2/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.90it/s, loss=0.00307, val_loss=0.0025]
[Epoch   3/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.00227, val_loss=0.00198]
[Epoch   4/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.78it/s, loss=0.00183, val_loss=0.00167]
[Epoch   5/30]: 100%|██████████| 38000/38000 [17:03<00:00, 37.12it/s, loss=0.00152, val_loss=0.00132]
[Epoch   6/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.19it/s, loss=0.00126, val_loss=0.00114]
[Epoch   7/30]: 100%|██████████| 38000/38000 [17:06<00:00, 37.00it/s, loss=0.00108, val_loss=0.000976]
[Epoch   8/30]: 100%|██████████| 38000/38000 [17:18<00:00, 36.58it/s, loss=0.000932, val_loss=0.00087]
[Epoch   9/30]: 100%|██████████| 38000/38000 [17:16<00:00, 36.65it/s, loss=0.000825, val_loss=0.000795]
[Epoch  10/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.000739, val_loss=0.000694]
[Epoch  11/30]: 100%|██████████| 38000/38000 [17:12<00:00, 36.80it/s, loss=0.00067, val_loss=0.000609]
[Epoch  12/30]: 100%|██████████| 38000/38000 [17:24<00:00, 36.39it/s, loss=0.000616, val_loss=0.000569]
[Epoch  13/30]: 100%|██████████| 38000/38000 [17:16<00:00, 36.67it/s, loss=0.000572, val_loss=0.000543]
[Epoch  14/30]: 100%|██████████| 38000/38000 [17:10<00:00, 36.89it/s, loss=0.000534, val_loss=0.000515]
[Epoch  15/30]: 100%|██████████| 38000/38000 [17:12<00:00, 36.81it/s, loss=0.000503, val_loss=0.00049]
[Epoch  16/30]: 100%|██████████| 38000/38000 [17:15<00:00, 36.71it/s, loss=0.000474, val_loss=0.000442]
[Epoch  17/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.77it/s, loss=0.000451, val_loss=0.000419]
[Epoch  18/30]: 100%|██████████| 38000/38000 [17:06<00:00, 37.03it/s, loss=0.000428, val_loss=0.00041]
[Epoch  19/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.93it/s, loss=0.000408, val_loss=0.0004]
[Epoch  20/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.90it/s, loss=0.00039, val_loss=0.00042]
[Epoch  21/30]: 100%|██████████| 38000/38000 [17:08<00:00, 36.93it/s, loss=0.000375, val_loss=0.000351]
[Epoch  22/30]: 100%|██████████| 38000/38000 [17:10<00:00, 36.86it/s, loss=0.000361, val_loss=0.000343]
[Epoch  23/30]: 100%|██████████| 38000/38000 [17:15<00:00, 36.71it/s, loss=0.000348, val_loss=0.000341]
[Epoch  24/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.000337, val_loss=0.000338]
[Epoch  25/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.19it/s, loss=0.000329, val_loss=0.000318]
[Epoch  26/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.000317, val_loss=0.000333]
[Epoch  27/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.78it/s, loss=0.000308, val_loss=0.00029]
[Epoch  28/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.22it/s, loss=0.000303, val_loss=0.00028]
[Epoch  29/30]: 100%|██████████| 38000/38000 [17:07<00:00, 37.00it/s, loss=0.000291, val_loss=0.000292]
[Epoch  30/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.000283, val_loss=0.000268]
model exported to models/model_LSTM_2020_04_01__102333.pth with loss 0.000268

Validation

[7]:
_ = net.eval()
Evaluate on the test dataset
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 20.85it/s]
Plot results on a sample
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)
_images/notebooks_trainings_training_2020_04_01__193853_16_0.png
Plot error distributions
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
_images/notebooks_trainings_training_2020_04_01__193853_18_0.png
Plot mispredictions thresholds
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)
_images/notebooks_trainings_training_2020_04_01__193853_20_0.png

Benchmark LSTM - 2020 March 31

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
N = 4 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load LSTM benchmark with Adam optimizer and MSE loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.30it/s, loss=0.0153, val_loss=0.00872]
[Epoch   2/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.49it/s, loss=0.00701, val_loss=0.00584]
[Epoch   3/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00527, val_loss=0.00495]
[Epoch   4/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00461, val_loss=0.00438]
[Epoch   5/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00417, val_loss=0.00407]
[Epoch   6/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00387, val_loss=0.00379]
[Epoch   7/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00363, val_loss=0.00355]
[Epoch   8/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00343, val_loss=0.00344]
[Epoch   9/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.26it/s, loss=0.00326, val_loss=0.00322]
[Epoch  10/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00313, val_loss=0.00312]
[Epoch  11/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.47it/s, loss=0.00302, val_loss=0.00299]
[Epoch  12/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.31it/s, loss=0.00292, val_loss=0.00289]
[Epoch  13/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00283, val_loss=0.00282]
[Epoch  14/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.52it/s, loss=0.00275, val_loss=0.00273]
[Epoch  15/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.27it/s, loss=0.00267, val_loss=0.00268]
[Epoch  16/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.42it/s, loss=0.00259, val_loss=0.00259]
[Epoch  17/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00252, val_loss=0.0025]
[Epoch  18/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00245, val_loss=0.0025]
[Epoch  19/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.39it/s, loss=0.00239, val_loss=0.00239]
[Epoch  20/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.55it/s, loss=0.00233, val_loss=0.00232]
[Epoch  21/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00226, val_loss=0.00232]
[Epoch  22/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.46it/s, loss=0.00222, val_loss=0.00225]
[Epoch  23/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.51it/s, loss=0.00218, val_loss=0.00218]
[Epoch  24/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00215, val_loss=0.00216]
[Epoch  25/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00213, val_loss=0.00212]
[Epoch  26/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.46it/s, loss=0.0021, val_loss=0.00212]
[Epoch  27/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.30it/s, loss=0.00207, val_loss=0.00209]
[Epoch  28/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.40it/s, loss=0.00205, val_loss=0.00208]
[Epoch  29/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.49it/s, loss=0.00203, val_loss=0.00206]
[Epoch  30/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00201, val_loss=0.00201]
model exported to models/model_LSTM_2020_03_31__112637.pth with loss 0.002010
_images/notebooks_trainings_training_2020_03_31__163536_10_2.png
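
The best-scoring weights were written to model_save_path during training; a minimal sketch for restoring them before validation (the cells below evaluate the in-memory weights directly):

# Optional: reload the best checkpoint saved above
net.load_state_dict(torch.load(model_save_path, map_location=device))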

Validation

[7]:
_ = net.eval()
Evaluate on the test dataset
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:03<00:00, 35.96it/s]
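
For a quick numerical summary alongside the plots, the predictions can be compared to the ground truth directly. A sketch assuming OzeDataset exposes its targets through _y and its output labels through labels['X'], as in the plotting cells of the other notebooks:

# Sketch: mean squared error per output variable (normalized units)
y_true = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
mse_per_output = ((predictions - y_true) ** 2).mean(axis=(0, 1))
print(dict(zip(ozeDataset.labels['X'], mse_per_output.round(5))))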
Plot results on a sample
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)
_images/notebooks_trainings_training_2020_03_31__163536_16_0.png
Plot error distributions
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
_images/notebooks_trainings_training_2020_03_31__163536_18_0.png
Plot misprediction thresholds
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)
_images/notebooks_trainings_training_2020_03_31__163536_20_0.png

Classic - 2020 March 12

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cpu

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZELoss loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
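
As a quick sanity check on model size (not in the original run), the number of trainable parameters can be counted:

# Count trainable parameters of the loaded transformer (sketch)
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")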
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [26:53<00:00, 23.55it/s, loss=0.00554, val_loss=0.0033]
[Epoch   2/30]: 100%|██████████| 38000/38000 [26:33<00:00, 23.85it/s, loss=0.00247, val_loss=0.00185]
[Epoch   3/30]: 100%|██████████| 38000/38000 [26:59<00:00, 23.46it/s, loss=0.00169, val_loss=0.00148]
[Epoch   4/30]: 100%|██████████| 38000/38000 [26:54<00:00, 23.54it/s, loss=0.00129, val_loss=0.00117]
[Epoch   5/30]: 100%|██████████| 38000/38000 [26:57<00:00, 23.49it/s, loss=0.00108, val_loss=0.001]
[Epoch   6/30]: 100%|██████████| 38000/38000 [26:59<00:00, 23.47it/s, loss=0.000946, val_loss=0.000952]
[Epoch   7/30]: 100%|██████████| 38000/38000 [26:57<00:00, 23.49it/s, loss=0.000834, val_loss=0.000791]
[Epoch   8/30]: 100%|██████████| 38000/38000 [26:49<00:00, 23.61it/s, loss=0.000753, val_loss=0.000714]
[Epoch   9/30]: 100%|██████████| 38000/38000 [27:00<00:00, 23.45it/s, loss=0.000683, val_loss=0.00065]
[Epoch  10/30]: 100%|██████████| 38000/38000 [26:54<00:00, 23.54it/s, loss=0.000637, val_loss=0.000634]
[Epoch  11/30]: 100%|██████████| 38000/38000 [26:58<00:00, 23.48it/s, loss=0.000591, val_loss=0.000569]
[Epoch  12/30]: 100%|██████████| 38000/38000 [27:00<00:00, 23.45it/s, loss=0.000549, val_loss=0.000596]
[Epoch  13/30]: 100%|██████████| 38000/38000 [27:09<00:00, 23.32it/s, loss=0.000524, val_loss=0.000506]
[Epoch  14/30]: 100%|██████████| 38000/38000 [26:53<00:00, 23.55it/s, loss=0.000496, val_loss=0.00048]
[Epoch  15/30]: 100%|██████████| 38000/38000 [27:06<00:00, 23.37it/s, loss=0.00047, val_loss=0.000466]
[Epoch  16/30]: 100%|██████████| 38000/38000 [27:09<00:00, 23.32it/s, loss=0.000448, val_loss=0.000412]
[Epoch  17/30]: 100%|██████████| 38000/38000 [27:13<00:00, 23.26it/s, loss=0.000436, val_loss=0.000442]
[Epoch  18/30]: 100%|██████████| 38000/38000 [27:04<00:00, 23.40it/s, loss=0.000412, val_loss=0.000424]
[Epoch  19/30]: 100%|██████████| 38000/38000 [27:10<00:00, 23.31it/s, loss=0.000397, val_loss=0.000468]
[Epoch  20/30]: 100%|██████████| 38000/38000 [27:15<00:00, 23.24it/s, loss=0.000381, val_loss=0.000396]
[Epoch  21/30]: 100%|██████████| 38000/38000 [27:16<00:00, 23.22it/s, loss=0.000372, val_loss=0.000375]
[Epoch  22/30]: 100%|██████████| 38000/38000 [27:16<00:00, 23.23it/s, loss=0.000361, val_loss=0.000355]
[Epoch  23/30]: 100%|██████████| 38000/38000 [27:08<00:00, 23.34it/s, loss=0.000346, val_loss=0.000331]
[Epoch  24/30]: 100%|██████████| 38000/38000 [27:12<00:00, 23.27it/s, loss=0.000334, val_loss=0.000352]
[Epoch  25/30]: 100%|██████████| 38000/38000 [27:14<00:00, 23.24it/s, loss=0.000324, val_loss=0.000401]
[Epoch  26/30]: 100%|██████████| 38000/38000 [27:18<00:00, 23.19it/s, loss=0.000324, val_loss=0.000319]
[Epoch  27/30]: 100%|██████████| 38000/38000 [27:19<00:00, 23.18it/s, loss=0.000305, val_loss=0.000319]
[Epoch  28/30]: 100%|██████████| 38000/38000 [27:12<00:00, 23.28it/s, loss=0.000303, val_loss=0.000318]
[Epoch  29/30]: 100%|██████████| 38000/38000 [27:19<00:00, 23.18it/s, loss=0.000295, val_loss=0.000297]
[Epoch  30/30]: 100%|██████████| 38000/38000 [27:15<00:00, 23.23it/s, loss=0.000287, val_loss=0.000286]
model exported to models/model_2020_03_10__231146.pth with loss 0.000286
_images/notebooks_trainings_training_2020_03_12__195104_9_2.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_03_12__195104_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
_images/notebooks_trainings_training_2020_03_12__195104_15_0.png
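
The same inspection generalizes to every encoder layer; a hedged sketch assuming each element of net.layers_encoding exposes attention_map like the first one:

# Plot the first attention map of each encoder layer (sketch)
fig, axes = plt.subplots(1, len(net.layers_encoding), figsize=(6*len(net.layers_encoding), 6))
for idx_layer, (encoder_layer, ax) in enumerate(zip(net.layers_encoding, axes)):
    sns.heatmap(encoder_layer.attention_map[0].cpu(), ax=ax, cbar=False)
    ax.set_title(f"Encoder layer {idx_layer}")
plt.savefig("attention_map_all_layers")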
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:17<00:00,  7.00it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    if label.startswith('Q_'):
        # Convert kJ/h to kW
        y_true /= 3600
        y_pred /= 3600

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'


    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_03_12__195104_18_0.png
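
For reference, the same error summaries can also be produced with the helpers from src.visualization used in the most recent training notebook above:

# Same plots via the shared visualization helpers
from src.visualization import map_plot_function, plot_error_distribution, plot_errors_threshold
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)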

Benchmark - 2020 March 05

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[9]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 3e-5
EPOCHS = 30

# Model parameters
d_model = 128 # Latent dim
N = 8*2 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))
[4]:
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[10]:
# Load LSTM benchmark with Adam optimizer and OZELoss loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[11]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.94it/s, loss=0.0318, val_loss=0.0238]
[Epoch   2/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.30it/s, loss=0.0234, val_loss=0.0234]
[Epoch   3/30]: 100%|██████████| 38000/38000 [07:01<00:00, 90.12it/s, loss=0.0189, val_loss=0.0142]
[Epoch   4/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.69it/s, loss=0.0128, val_loss=0.0122]
[Epoch   5/30]: 100%|██████████| 38000/38000 [06:59<00:00, 90.58it/s, loss=0.012, val_loss=0.0119]
[Epoch   6/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.38it/s, loss=0.0118, val_loss=0.0117]
[Epoch   7/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.27it/s, loss=0.0116, val_loss=0.0117]
[Epoch   8/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.89it/s, loss=0.0115, val_loss=0.0115]
[Epoch   9/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.74it/s, loss=0.0114, val_loss=0.0115]
[Epoch  10/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.32it/s, loss=0.0114, val_loss=0.0114]
[Epoch  11/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.45it/s, loss=0.0112, val_loss=0.0112]
[Epoch  12/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.93it/s, loss=0.0111, val_loss=0.011]
[Epoch  13/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.79it/s, loss=0.0109, val_loss=0.0109]
[Epoch  14/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.44it/s, loss=0.0108, val_loss=0.0108]
[Epoch  15/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.44it/s, loss=0.0107, val_loss=0.0107]
[Epoch  16/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.93it/s, loss=0.0107, val_loss=0.0107]
[Epoch  17/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.80it/s, loss=0.0106, val_loss=0.0106]
[Epoch  18/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.40it/s, loss=0.0106, val_loss=0.0107]
[Epoch  19/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.29it/s, loss=0.0105, val_loss=0.0105]
[Epoch  20/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0104, val_loss=0.0105]
[Epoch  21/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0104, val_loss=0.0105]
[Epoch  22/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.45it/s, loss=0.0103, val_loss=0.0104]
[Epoch  23/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.32it/s, loss=0.0103, val_loss=0.0103]
[Epoch  24/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.86it/s, loss=0.0103, val_loss=0.0104]
[Epoch  25/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.72it/s, loss=0.0102, val_loss=0.0103]
[Epoch  26/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.46it/s, loss=0.0102, val_loss=0.0103]
[Epoch  27/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.41it/s, loss=0.0101, val_loss=0.0103]
[Epoch  28/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0101, val_loss=0.0102]
[Epoch  29/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.90it/s, loss=0.0101, val_loss=0.0101]
[Epoch  30/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.42it/s, loss=0.01, val_loss=0.0101]
model exported to models/model_LSTM_2020_03_04__211137.pth with loss 0.010125

_images/notebooks_trainings_training_2020_03_05__080607_10_3.png

Validation

[ ]:
_ = net.eval()
Plot results on a sample
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_03_05__080607_14_0.png
Evaluate on the test dataset
[ ]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
[ ]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    if label.startswith('Q_'):
        # Convert kJ/h to kW
        y_true /= 3600
        y_pred /= 3600

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'


    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

    ax.legend()

plt.savefig('error_mean_std')

Benchmark - 2020 March 04

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 64 # Latent dim
N = 4*2 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))
[4]:
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[5]:
# Load LSTM benchmark with Adam optimizer and OZELoss loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.83it/s, loss=0.0172, val_loss=0.0117]
[Epoch   2/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.56it/s, loss=0.0113, val_loss=0.0109]
[Epoch   3/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.82it/s, loss=0.0107, val_loss=0.0104]
[Epoch   4/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.88it/s, loss=0.00956, val_loss=0.00917]
[Epoch   5/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.62it/s, loss=0.00899, val_loss=0.0089]
[Epoch   6/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.51it/s, loss=0.00865, val_loss=0.00852]
[Epoch   7/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.10it/s, loss=0.00832, val_loss=0.00827]
[Epoch   8/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.42it/s, loss=0.00814, val_loss=0.00813]
[Epoch   9/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.85it/s, loss=0.00803, val_loss=0.00802]
[Epoch  10/30]: 100%|██████████| 38000/38000 [02:39<00:00, 238.05it/s, loss=0.00794, val_loss=0.00799]
[Epoch  11/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.58it/s, loss=0.00786, val_loss=0.00788]
[Epoch  12/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.32it/s, loss=0.00777, val_loss=0.00773]
[Epoch  13/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.70it/s, loss=0.00767, val_loss=0.00756]
[Epoch  14/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.53it/s, loss=0.00725, val_loss=0.00716]
[Epoch  15/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.63it/s, loss=0.00702, val_loss=0.00692]
[Epoch  16/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.44it/s, loss=0.00691, val_loss=0.00685]
[Epoch  17/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.61it/s, loss=0.00683, val_loss=0.00676]
[Epoch  18/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.63it/s, loss=0.00676, val_loss=0.00676]
[Epoch  19/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.59it/s, loss=0.00667, val_loss=0.0066]
[Epoch  20/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.19it/s, loss=0.00648, val_loss=0.00626]
[Epoch  21/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.11it/s, loss=0.00622, val_loss=0.00612]
[Epoch  22/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.66it/s, loss=0.00611, val_loss=0.00605]
[Epoch  23/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.66it/s, loss=0.00604, val_loss=0.00596]
[Epoch  24/30]: 100%|██████████| 38000/38000 [02:41<00:00, 235.89it/s, loss=0.00598, val_loss=0.00597]
[Epoch  25/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.88it/s, loss=0.00593, val_loss=0.00589]
[Epoch  26/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.91it/s, loss=0.0059, val_loss=0.00578]
[Epoch  27/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.10it/s, loss=0.00586, val_loss=0.00576]
[Epoch  28/30]: 100%|██████████| 38000/38000 [02:41<00:00, 235.81it/s, loss=0.00582, val_loss=0.00574]
[Epoch  29/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.23it/s, loss=0.00579, val_loss=0.0058]
[Epoch  30/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.95it/s, loss=0.00576, val_loss=0.00573]
model exported to models/model_LSTM_2020_03_04__190333.pth with loss 0.005726

_images/notebooks_trainings_training_2020_03_04__202641_10_3.png

Validation

[7]:
_ = net.eval()
Plot results on a sample
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_03_04__202641_14_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 63/63 [00:00<00:00, 127.94it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    if label.startswith('Q_'):
        # Convert kJ/h to kW
        y_true /= 3600
        y_pred /= 3600

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'


    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_03_04__202641_17_0.png

Classic - 2020 February 25

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 50

# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 38 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZELoss loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.49it/s, loss=0.00563, val_loss=0.00277]
[Epoch   2/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.40it/s, loss=0.00223, val_loss=0.00155]
[Epoch   3/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.50it/s, loss=0.00149, val_loss=0.00123]
[Epoch   4/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.00113, val_loss=0.000995]
[Epoch   5/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.50it/s, loss=0.000901, val_loss=0.00084]
[Epoch   6/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000759, val_loss=0.000615]
[Epoch   7/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.00065, val_loss=0.000555]
[Epoch   8/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000573, val_loss=0.000527]
[Epoch   9/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.53it/s, loss=0.000514, val_loss=0.000619]
[Epoch  10/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000473, val_loss=0.000503]
[Epoch  11/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.53it/s, loss=0.000445, val_loss=0.000407]
[Epoch  12/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000402, val_loss=0.000384]
[Epoch  13/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000388, val_loss=0.000408]
[Epoch  14/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000371, val_loss=0.000333]
[Epoch  15/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000344, val_loss=0.000333]
[Epoch  16/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000331, val_loss=0.000407]
[Epoch  17/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000309, val_loss=0.000326]
[Epoch  18/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000304, val_loss=0.000302]
[Epoch  19/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.00029, val_loss=0.000312]
[Epoch  20/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000287, val_loss=0.000266]
[Epoch  21/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000269, val_loss=0.00029]
[Epoch  22/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000265, val_loss=0.000237]
[Epoch  23/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000255, val_loss=0.000237]
[Epoch  24/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000255, val_loss=0.00024]
[Epoch  25/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000244, val_loss=0.000225]
[Epoch  26/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000239, val_loss=0.000231]
[Epoch  27/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000229, val_loss=0.000241]
[Epoch  28/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000226, val_loss=0.000245]
[Epoch  29/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000221, val_loss=0.000221]
[Epoch  30/50]: 100%|██████████| 38000/38000 [14:34<00:00, 43.43it/s, loss=0.000226, val_loss=0.000208]
[Epoch  31/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000209, val_loss=0.000219]
[Epoch  32/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000223, val_loss=0.000222]
[Epoch  33/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000217, val_loss=0.000224]
[Epoch  34/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000202, val_loss=0.000199]
[Epoch  35/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000194, val_loss=0.000191]
[Epoch  36/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000198, val_loss=0.000185]
[Epoch  37/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.51it/s, loss=0.000189, val_loss=0.000211]
[Epoch  38/50]: 100%|██████████| 38000/38000 [14:36<00:00, 43.35it/s, loss=0.000195, val_loss=0.00018]
[Epoch  39/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.51it/s, loss=0.000183, val_loss=0.00029]
[Epoch  40/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000183, val_loss=0.000161]
[Epoch  41/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000181, val_loss=0.000168]
[Epoch  42/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000178, val_loss=0.000179]
[Epoch  43/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000174, val_loss=0.000174]
[Epoch  44/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000181, val_loss=0.000155]
[Epoch  45/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000168, val_loss=0.000191]
[Epoch  46/50]: 100%|██████████| 38000/38000 [14:34<00:00, 43.43it/s, loss=0.000165, val_loss=0.000185]
[Epoch  47/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.00017, val_loss=0.000159]
[Epoch  48/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.00017, val_loss=0.000159]
[Epoch  49/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000165, val_loss=0.000173]
[Epoch  50/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000161, val_loss=0.000166]
model exported to models/model_2020_02_25__102558.pth with loss 0.000155

_images/notebooks_trainings_training_2020_02_25__224128_9_3.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_02_25__224128_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
_images/notebooks_trainings_training_2020_02_25__224128_15_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 63/63 [00:05<00:00, 12.26it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    if label.startswith('Q_'):
        # Convert kJ/h to kW
        y_true /= 3600
        y_pred /= 3600

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'


    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_02_25__224128_18_0.png

Window - 2020 January 31

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 50

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
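
Here chunk_mode = "window" together with attention_size = 24 restricts attention to a local neighbourhood of each time step. A rough band-mask illustration for intuition (an assumption, not the library's internal code, which may only limit backward attention):

# Illustrative local attention mask with window size 24 (not tst's implementation)
import numpy as np
seq_len, window = 672, 24
idx = np.arange(seq_len)
local_mask = np.abs(idx[:, None] - idx[None, :]) <= window  # True where attention is allowed
print(local_mask.shape, local_mask[100].sum())  # (672, 672), 2*window + 1 = 49 allowed positions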

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZELoss loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/50]: 100%|██████████| 12000/12000 [07:40<00:00, 26.04it/s, loss=0.00906, val_loss=0.00509]
[Epoch   2/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.67it/s, loss=0.00405, val_loss=0.00363]
[Epoch   3/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.63it/s, loss=0.00286, val_loss=0.00255]
[Epoch   4/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.63it/s, loss=0.00224, val_loss=0.00206]
[Epoch   5/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.67it/s, loss=0.00182, val_loss=0.00161]
[Epoch   6/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.00157, val_loss=0.00143]
[Epoch   7/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.00138, val_loss=0.00129]
[Epoch   8/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.00122, val_loss=0.00114]
[Epoch   9/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.00108, val_loss=0.00108]
[Epoch  10/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000974, val_loss=0.000869]
[Epoch  11/50]: 100%|██████████| 12000/12000 [07:31<00:00, 26.56it/s, loss=0.000885, val_loss=0.00078]
[Epoch  12/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000818, val_loss=0.000762]
[Epoch  13/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000743, val_loss=0.000992]
[Epoch  14/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000692, val_loss=0.000598]
[Epoch  15/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000645, val_loss=0.000682]
[Epoch  16/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000611, val_loss=0.000609]
[Epoch  17/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.00057, val_loss=0.0005]
[Epoch  18/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000542, val_loss=0.000509]
[Epoch  19/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000501, val_loss=0.000477]
[Epoch  20/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000507, val_loss=0.000479]
[Epoch  21/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.64it/s, loss=0.000465, val_loss=0.000489]
[Epoch  22/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.65it/s, loss=0.000449, val_loss=0.000459]
[Epoch  23/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000427, val_loss=0.00046]
[Epoch  24/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000417, val_loss=0.000403]
[Epoch  25/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000402, val_loss=0.000474]
[Epoch  26/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000387, val_loss=0.00034]
[Epoch  27/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000385, val_loss=0.00041]
[Epoch  28/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000374, val_loss=0.000387]
[Epoch  29/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000351, val_loss=0.000342]
[Epoch  30/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000352, val_loss=0.000397]
[Epoch  31/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000337, val_loss=0.000324]
[Epoch  32/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000337, val_loss=0.00031]
[Epoch  33/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.73it/s, loss=0.000328, val_loss=0.000298]
[Epoch  34/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.66it/s, loss=0.000315, val_loss=0.000318]
[Epoch  35/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000307, val_loss=0.000306]
[Epoch  36/50]: 100%|██████████| 12000/12000 [07:31<00:00, 26.56it/s, loss=0.000307, val_loss=0.0003]
[Epoch  37/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000294, val_loss=0.00032]
[Epoch  38/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000295, val_loss=0.000368]
[Epoch  39/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000282, val_loss=0.000274]
[Epoch  40/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.00028, val_loss=0.000255]
[Epoch  41/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000275, val_loss=0.000262]
[Epoch  42/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000264, val_loss=0.000247]
[Epoch  43/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.67it/s, loss=0.00027, val_loss=0.000292]
[Epoch  44/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000261, val_loss=0.00025]
[Epoch  45/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000253, val_loss=0.000283]
[Epoch  46/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000259, val_loss=0.000245]
[Epoch  47/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.00025, val_loss=0.000245]
[Epoch  48/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000248, val_loss=0.00025]
[Epoch  49/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000243, val_loss=0.000258]
[Epoch  50/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000238, val_loss=0.000219]
model exported to models/model_2020_01_31__082906.pth with loss 0.000219
_images/notebooks_trainings_training_2020_01_31__144602_9_2.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_01_31__144602_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
_images/notebooks_trainings_training_2020_01_31__144602_15_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.93it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_01_31__144602_18_0.png

Window - 2020 January 10

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
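
The OZELoss(alpha=0.3) used here is the project's composite objective. For reference, here is a minimal sketch of an equivalent computation, assuming it matches the log-MSE formulation written out explicitly in the 2019 December 28 notebook further down this page (temperature on the last output channel, consumptions on the others); refer to tst.loss for the actual implementation.

# Sketch only (assumption, not the tst.loss implementation): an OZELoss-like
# objective following the explicit formulation of the 2019 December 28 notebook.
def oze_loss_sketch(y_true, y_pred, alpha=0.3):
    mse = nn.MSELoss()
    delta_Q = mse(y_pred[..., :-1], y_true[..., :-1])  # consumption outputs
    delta_T = mse(y_pred[..., -1], y_true[..., -1])    # temperature output
    return torch.log(1 + delta_T) + alpha * torch.log(1 + delta_Q)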
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.48it/s, loss=0.00826, val_loss=0.00478]
[Epoch   2/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00403, val_loss=0.0032]
[Epoch   3/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.36it/s, loss=0.00273, val_loss=0.00225]
[Epoch   4/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00217, val_loss=0.00182]
[Epoch   5/30]: 100%|██████████| 12000/12000 [06:49<00:00, 29.30it/s, loss=0.0018, val_loss=0.00155]
[Epoch   6/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.44it/s, loss=0.00152, val_loss=0.00134]
[Epoch   7/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00132, val_loss=0.00114]
[Epoch   8/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00118, val_loss=0.00106]
[Epoch   9/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.40it/s, loss=0.00103, val_loss=0.000951]
[Epoch  10/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.57it/s, loss=0.000919, val_loss=0.00132]
[Epoch  11/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.000829, val_loss=0.000809]
[Epoch  12/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.50it/s, loss=0.000756, val_loss=0.000734]
[Epoch  13/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.57it/s, loss=0.000701, val_loss=0.000649]
[Epoch  14/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.40it/s, loss=0.000651, val_loss=0.000719]
[Epoch  15/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.000608, val_loss=0.000567]
[Epoch  16/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.53it/s, loss=0.000569, val_loss=0.000607]
[Epoch  17/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.48it/s, loss=0.000538, val_loss=0.000533]
[Epoch  18/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000519, val_loss=0.000519]
[Epoch  19/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.59it/s, loss=0.000497, val_loss=0.000472]
[Epoch  20/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000468, val_loss=0.000667]
[Epoch  21/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.56it/s, loss=0.000458, val_loss=0.000544]
[Epoch  22/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.58it/s, loss=0.000427, val_loss=0.00039]
[Epoch  23/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.56it/s, loss=0.00042, val_loss=0.000406]
[Epoch  24/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.48it/s, loss=0.000401, val_loss=0.000395]
[Epoch  25/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.54it/s, loss=0.000392, val_loss=0.000384]
[Epoch  26/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.61it/s, loss=0.000377, val_loss=0.000438]
[Epoch  27/30]: 100%|██████████| 12000/12000 [06:44<00:00, 29.64it/s, loss=0.00036, val_loss=0.000381]
[Epoch  28/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.62it/s, loss=0.000358, val_loss=0.000331]
[Epoch  29/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000352, val_loss=0.000318]
[Epoch  30/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.45it/s, loss=0.000335, val_loss=0.000324]
model exported to models/model_2020_01_10__082029.pth with loss 0.000318
_images/notebooks_trainings_training_2020_01_10__114522_9_2.png

Validation

[6]:
_ = net.eval()
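
Note that net currently holds the last-epoch weights, while the best state_dict was exported to model_save_path during training. To evaluate that best checkpoint instead, a minimal optional step (not part of the original notebook):

# Optionally reload the best checkpoint saved during training before evaluating
net.load_state_dict(torch.load(model_save_path))
_ = net.eval()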
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_01_10__114522_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
_images/notebooks_trainings_training_2020_01_10__114522_15_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.13it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_01_10__114522_18_0.png
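
Beyond the error plot, a quick numerical summary can be computed from the same arrays. A minimal sketch (not a cell from the original notebook), reusing predictions, y_true_full and the rescaling helper defined above:

# Per-output mean absolute error over the test set, in rescaled units
dataset = dataloader_test.dataset.dataset
for idx_label, label in enumerate(dataset.labels['X']):
    y_true = dataset.rescale(y_true_full[..., idx_label], idx_label)
    y_pred = dataset.rescale(predictions[..., idx_label], idx_label)
    print(f"{label}: MAE = {np.abs(y_true - y_pred).mean():.3f}")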

Classic - 2020 January 07

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.loss import OZELoss

from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.00923, val_loss=0.00494]
[Epoch   2/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.00479, val_loss=0.00407]
[Epoch   3/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.97it/s, loss=0.00405, val_loss=0.00366]
[Epoch   4/30]: 100%|██████████| 12000/12000 [10:30<00:00, 19.04it/s, loss=0.00344, val_loss=0.00312]
[Epoch   5/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.20it/s, loss=0.003, val_loss=0.00267]
[Epoch   6/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.00259, val_loss=0.00262]
[Epoch   7/30]: 100%|██████████| 12000/12000 [10:24<00:00, 19.21it/s, loss=0.00198, val_loss=0.00168]
[Epoch   8/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.13it/s, loss=0.00156, val_loss=0.00149]
[Epoch   9/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.14it/s, loss=0.00136, val_loss=0.00124]
[Epoch  10/30]: 100%|██████████| 12000/12000 [10:29<00:00, 19.08it/s, loss=0.00123, val_loss=0.00117]
[Epoch  11/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.00115, val_loss=0.00104]
[Epoch  12/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.00109, val_loss=0.000955]
[Epoch  13/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.19it/s, loss=0.00105, val_loss=0.000998]
[Epoch  14/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.001, val_loss=0.001]
[Epoch  15/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.11it/s, loss=0.000965, val_loss=0.000884]
[Epoch  16/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.13it/s, loss=0.000926, val_loss=0.000893]
[Epoch  17/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.09it/s, loss=0.000904, val_loss=0.000981]
[Epoch  18/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.000878, val_loss=0.00088]
[Epoch  19/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.17it/s, loss=0.000858, val_loss=0.000779]
[Epoch  20/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.17it/s, loss=0.000817, val_loss=0.000809]
[Epoch  21/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.000811, val_loss=0.000783]
[Epoch  22/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.14it/s, loss=0.00077, val_loss=0.000741]
[Epoch  23/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.10it/s, loss=0.000747, val_loss=0.000793]
[Epoch  24/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.12it/s, loss=0.000727, val_loss=0.000682]
[Epoch  25/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.000715, val_loss=0.000697]
[Epoch  26/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.12it/s, loss=0.00069, val_loss=0.000666]
[Epoch  27/30]: 100%|██████████| 12000/12000 [10:30<00:00, 19.02it/s, loss=0.000675, val_loss=0.000619]
[Epoch  28/30]: 100%|██████████| 12000/12000 [10:31<00:00, 19.01it/s, loss=0.000651, val_loss=0.000621]
[Epoch  29/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.96it/s, loss=0.00064, val_loss=0.000623]
[Epoch  30/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.98it/s, loss=0.000631, val_loss=0.000597]
model exported to models/model_2020_01_07__115048.pth with loss 0.000597
_images/notebooks_trainings_training_2020_01_07__172923_9_2.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")
_images/notebooks_trainings_training_2020_01_07__172923_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")
_images/notebooks_trainings_training_2020_01_07__172923_15_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:14<00:00,  8.47it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std')
_images/notebooks_trainings_training_2020_01_07__172923_18_0.png

Window - 2020 January 06

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst import Transformer
from tst.dataset import OzeDataset
from tst.loss import OZELoss
from tst.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 40

# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch   1/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.0101, val_loss=0.00518]
[Epoch   2/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.00421, val_loss=0.00311]
[Epoch   3/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.00274, val_loss=0.00212]
[Epoch   4/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.00207, val_loss=0.00181]
[Epoch   5/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.58it/s, loss=0.0017, val_loss=0.00147]
[Epoch   6/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.00144, val_loss=0.00128]
[Epoch   7/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.37it/s, loss=0.0013, val_loss=0.00128]
[Epoch   8/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.31it/s, loss=0.00118, val_loss=0.00113]
[Epoch   9/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.00106, val_loss=0.00106]
[Epoch  10/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000982, val_loss=0.000899]
[Epoch  11/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.000911, val_loss=0.00081]
[Epoch  12/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.31it/s, loss=0.000847, val_loss=0.000739]
[Epoch  13/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000778, val_loss=0.000816]
[Epoch  14/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.43it/s, loss=0.000739, val_loss=0.000652]
[Epoch  15/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.53it/s, loss=0.00069, val_loss=0.000621]
[Epoch  16/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000649, val_loss=0.000565]
[Epoch  17/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000614, val_loss=0.000607]
[Epoch  18/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000575, val_loss=0.000584]
[Epoch  19/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000549, val_loss=0.000569]
[Epoch  20/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.38it/s, loss=0.000524, val_loss=0.000572]
[Epoch  21/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000492, val_loss=0.000458]
[Epoch  22/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.50it/s, loss=0.000485, val_loss=0.000549]
[Epoch  23/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000455, val_loss=0.000647]
[Epoch  24/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.000441, val_loss=0.000572]
[Epoch  25/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.000422, val_loss=0.000376]
[Epoch  26/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000408, val_loss=0.000416]
[Epoch  27/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.53it/s, loss=0.000396, val_loss=0.000454]
[Epoch  28/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.00038, val_loss=0.000424]
[Epoch  29/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000371, val_loss=0.000427]
[Epoch  30/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000369, val_loss=0.000352]
[Epoch  31/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.46it/s, loss=0.000349, val_loss=0.00034]
[Epoch  32/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.000344, val_loss=0.000322]
[Epoch  33/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000342, val_loss=0.000327]
[Epoch  34/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000326, val_loss=0.00031]
[Epoch  35/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000326, val_loss=0.000317]
[Epoch  36/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000319, val_loss=0.000317]
[Epoch  37/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000303, val_loss=0.000341]
[Epoch  38/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.0003, val_loss=0.000297]
[Epoch  39/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000292, val_loss=0.000265]
[Epoch  40/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.47it/s, loss=0.000288, val_loss=0.000264]
model exported to models/model_2020_01_06__144203.pth with loss 0.000264
_images/notebooks_trainings_training_2020_01_06__190627_9_2.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2020_01_06__190627_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2020_01_06__190627_15_0.png
Evaluate on the test dataset
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.63it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2020_01_06__190627_18_0.png

Window - 2020 January 03

This training ran for 20 epochs, followed by 20 additional ones, hence the lower loss at the end; it shows a slight overfit, though.
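
A minimal resume sketch (hypothetical, not a cell from the original notebook): the training cell below exports the full module with torch.save(net, model_save_path), so the additional epochs can be run by reloading it and re-executing the training loop.

# Hypothetical resume step; model_save_path, device and LR refer to the cells below
net = torch.load(model_save_path).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)  # optimizer state is not saved, so it is re-created
# ... then re-run the training cell for another EPOCHS epochs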

[1]:
import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v4.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[11]:
model_save_path = f"models/{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth"
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net, model_save_path)


plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss_val[-1]):5f}")
print(f"model exported to {model_save_path}")
[Epoch   1/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.96it/s, loss=0.00117, val_loss=0.0011]
[Epoch   2/20]: 100%|██████████| 12000/12000 [06:28<00:00, 30.92it/s, loss=0.000983, val_loss=0.000908]
[Epoch   3/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.99it/s, loss=0.000867, val_loss=0.000994]
[Epoch   4/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.96it/s, loss=0.000787, val_loss=0.000831]
[Epoch   5/20]: 100%|██████████| 12000/12000 [06:28<00:00, 30.90it/s, loss=0.000741, val_loss=0.00073]
[Epoch   6/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.94it/s, loss=0.000683, val_loss=0.000808]
[Epoch   7/20]: 100%|██████████| 12000/12000 [06:29<00:00, 30.79it/s, loss=0.000654, val_loss=0.000655]
[Epoch   8/20]: 100%|██████████| 12000/12000 [06:30<00:00, 30.77it/s, loss=0.00061, val_loss=0.000624]
[Epoch   9/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.57it/s, loss=0.000576, val_loss=0.000683]
[Epoch  10/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000551, val_loss=0.000581]
[Epoch  11/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000524, val_loss=0.000569]
[Epoch  12/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000494, val_loss=0.000539]
[Epoch  13/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000474, val_loss=0.000568]
[Epoch  14/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000453, val_loss=0.000567]
[Epoch  15/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.47it/s, loss=0.000439, val_loss=0.000502]
[Epoch  16/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.50it/s, loss=0.00042, val_loss=0.00046]
[Epoch  17/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000404, val_loss=0.00054]
[Epoch  18/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000393, val_loss=0.000458]
[Epoch  19/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.00038, val_loss=0.000438]
[Epoch  20/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.58it/s, loss=0.000372, val_loss=0.000487]
Loss: 0.000487
model exported to models/model_00048.pth
_images/notebooks_trainings_training_2020_01_03__133337_10_2.png

Validation

[12]:
_ = net.eval()
Plot results on a sample
[13]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2020_01_03__133337_14_0.png
Plot encoding attention map
[14]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2020_01_03__133337_16_0.png
Evaluate on the test dataset
[15]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.55it/s]
[16]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2020_01_03__133337_19_0.png

Window - 2019 December 29

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1.5e-4
EPOCHS = 20

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (9000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss[-1]):5f}")

model_path = f"models/model_{str(hist_loss[-1]).split('.')[-1][:5]}.pth"
torch.save(net, model_path)
print(f"model exported to {model_path}")
[Epoch   1/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.0139, val_loss=0.00843]
[Epoch   2/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00662, val_loss=0.00666]
[Epoch   3/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00546, val_loss=0.00491]
[Epoch   4/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00466, val_loss=0.00417]
[Epoch   5/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.004, val_loss=0.00384]
[Epoch   6/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00327, val_loss=0.00319]
[Epoch   7/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.78it/s, loss=0.00279, val_loss=0.00291]
[Epoch   8/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00241, val_loss=0.00226]
[Epoch   9/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00217, val_loss=0.00201]
[Epoch  10/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00201, val_loss=0.00207]
[Epoch  11/20]: 100%|██████████| 9000/9000 [04:53<00:00, 30.62it/s, loss=0.00185, val_loss=0.0021]
[Epoch  12/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00176, val_loss=0.00162]
[Epoch  13/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00166, val_loss=0.00161]
[Epoch  14/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00157, val_loss=0.00163]
[Epoch  15/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.0015, val_loss=0.00149]
[Epoch  16/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00145, val_loss=0.00139]
[Epoch  17/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.80it/s, loss=0.00139, val_loss=0.00135]
[Epoch  18/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00133, val_loss=0.00127]
[Epoch  19/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00129, val_loss=0.00135]
[Epoch  20/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00122, val_loss=0.00124]
Loss: 0.001224
model exported to models/model_00122.pth
_images/notebooks_trainings_training_2019_12_29__143613_9_2.png

Validation

[7]:
_ = net.eval()
Plot results on a sample
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_29__143613_13_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_29__143613_15_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.21it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_29__143613_18_0.png

Window - 2019 December 28

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (9000, 500, 500))

dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
Load network
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})

        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss

plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.48it/s, loss=0.0139, val_loss=0.00883]
[Epoch   2/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.39it/s, loss=0.00705, val_loss=0.00596]
[Epoch   3/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.41it/s, loss=0.00577, val_loss=0.005]
[Epoch   4/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.43it/s, loss=0.00506, val_loss=0.00454]
[Epoch   5/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00454, val_loss=0.00409]
[Epoch   6/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.41it/s, loss=0.00411, val_loss=0.00378]
[Epoch   7/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.39it/s, loss=0.0037, val_loss=0.00326]
[Epoch   8/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.40it/s, loss=0.00325, val_loss=0.00312]
[Epoch   9/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00293, val_loss=0.00254]
[Epoch  10/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.40it/s, loss=0.00257, val_loss=0.00245]
[Epoch  11/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.37it/s, loss=0.00239, val_loss=0.00228]
[Epoch  12/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00224, val_loss=0.00229]
[Epoch  13/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00206, val_loss=0.00191]
[Epoch  14/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.41it/s, loss=0.002, val_loss=0.00203]
[Epoch  15/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00186, val_loss=0.00177]
[Epoch  16/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.46it/s, loss=0.00179, val_loss=0.00167]
[Epoch  17/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.48it/s, loss=0.00169, val_loss=0.00184]
[Epoch  18/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.45it/s, loss=0.00163, val_loss=0.00162]
[Epoch  19/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.43it/s, loss=0.00157, val_loss=0.00153]
[Epoch  20/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00153, val_loss=0.00151]
Loss: 0.001529
_images/notebooks_trainings_training_2019_12_28__174445_9_2.png

Validation

[6]:
_ = net.eval()
Plot results on a sample
[12]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_28__174445_13_0.png
Plot encoding attention map
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_28__174445_15_0.png
Evaluate on the test dataset
[16]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.95it/s]
[17]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()

for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = y_true_full[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_28__174445_18_0.png

Window - 2019 December 28

This is the first training on the CAPTrocadero dataset (9500 training samples / 500 test samples).

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_train.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_CAPTrocadero_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 9500/9500 [05:05<00:00, 31.07it/s, loss=0.012]
[Epoch   2/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.50it/s, loss=0.00476]
[Epoch   3/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.00341]
[Epoch   4/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00235]
[Epoch   5/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.56it/s, loss=0.0019]
[Epoch   6/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.55it/s, loss=0.00164]
[Epoch   7/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00149]
[Epoch   8/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00138]
[Epoch   9/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.57it/s, loss=0.00127]
[Epoch  10/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.47it/s, loss=0.00117]
[Epoch  11/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.46it/s, loss=0.0011]
[Epoch  12/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.46it/s, loss=0.00101]
[Epoch  13/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000917]
[Epoch  14/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.48it/s, loss=0.000852]
[Epoch  15/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.43it/s, loss=0.000806]
[Epoch  16/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.37it/s, loss=0.000765]
[Epoch  17/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000713]
[Epoch  18/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.36it/s, loss=0.000667]
[Epoch  19/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.40it/s, loss=0.000657]
[Epoch  20/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.52it/s, loss=0.000589]
Loss: 0.000589
_images/notebooks_trainings_training_2019_12_28__110648_10_2.png

Validation

Load dataset and network
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
Plot results on a sample
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_28__110648_15_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_28__110648_17_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.06it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = datatestloader.dataset.rescale(y_true, idx_label)
    y_pred = datatestloader.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_28__110648_20_0.png

Window - 2019 December 25

This is the first training run with the third version of the dataset, which contains 9500 training samples and 500 test samples.
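As a quick sanity check of these sizes, one can inspect the array shapes in both archives. This is a minimal sketch assuming the dataset paths used in the cells below; the array names inside the .npz files are not assumed, only printed:

import numpy as np

# Load both archives and print every array name with its shape;
# the first axis should count 9500 samples for train and 500 for test.
train = np.load('datasets/dataset_train.npz')
test = np.load('datasets/dataset_test.npz')
print({key: train[key].shape for key in train.files})
print({key: test[key].shape for key in test.files})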

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_train.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 39 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss: weighted sum of log(1 + MSE) for consumption (Q) and temperature (T)
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 9500/9500 [05:05<00:00, 31.09it/s, loss=0.00938]
[Epoch   2/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00452]
[Epoch   3/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.0029]
[Epoch   4/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00209]
[Epoch   5/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.00172]
[Epoch   6/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.67it/s, loss=0.00151]
[Epoch   7/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.66it/s, loss=0.00133]
[Epoch   8/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00118]
[Epoch   9/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.66it/s, loss=0.00108]
[Epoch  10/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.57it/s, loss=0.000912]
[Epoch  11/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.55it/s, loss=0.000802]
[Epoch  12/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.65it/s, loss=0.00073]
[Epoch  13/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.68it/s, loss=0.00066]
[Epoch  14/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000638]
[Epoch  15/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.58it/s, loss=0.000614]
[Epoch  16/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.63it/s, loss=0.000549]
[Epoch  17/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.000542]
[Epoch  18/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.000486]
[Epoch  19/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.70it/s, loss=0.00046]
[Epoch  20/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.60it/s, loss=0.00044]
Loss: 0.000440
_images/notebooks_trainings_training_2019_12_25__114022_10_2.png

Validation

Load dataset and network
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_25__114022_16_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_25__114022_18_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.29it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = datatestloader.dataset.rescale(y_true, idx_label)
    y_pred = datatestloader.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_25__114022_21_0.png

Window - 2019 December 24

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss: weighted sum of log(1 + MSE) for consumption (Q) and temperature (T)
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 7500/7500 [04:02<00:00, 30.89it/s, loss=0.0104]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00525]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00393]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00283]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00226]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00193]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00178]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.0016]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00152]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.63it/s, loss=0.00143]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00125]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00119]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00109]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.65it/s, loss=0.00103]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.000997]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:58<00:00, 31.49it/s, loss=0.000923]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.000876]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.000875]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.000811]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.000808]
Loss: 0.000808
_images/notebooks_trainings_training_2019_12_24__132610_9_2.png

Validation

Load dataset and network
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_24__132610_15_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_24__132610_17_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.57it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = datatestloader.dataset.rescale(y_true, idx_label)
    y_pred = datatestloader.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_24__132610_20_0.png

Window - 2019 December 23

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
chunk_mode = 'window'

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss: weighted sum of log(1 + MSE) for consumption (Q) and temperature (T)
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 7500/7500 [04:03<00:00, 30.75it/s, loss=0.011]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00566]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.59it/s, loss=0.00474]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00425]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.59it/s, loss=0.00371]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.57it/s, loss=0.00327]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.00288]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00262]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.67it/s, loss=0.00236]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00217]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00199]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.55it/s, loss=0.00185]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.53it/s, loss=0.00173]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.55it/s, loss=0.00161]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.67it/s, loss=0.00152]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:59<00:00, 31.36it/s, loss=0.00147]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.0014]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00135]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00128]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.00123]
Loss: 0.001232
_images/notebooks_trainings_training_2019_12_23__194258_9_2.png

Validation

Load dataset and network
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_23__194258_15_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_23__194258_17_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 21.10it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = datatestloader.dataset.rescale(y_true, idx_label)
    y_pred = datatestloader.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_23__194258_20_0.png

Chunk - 2019 December 23

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-4
EPOCHS = 20
TIME_CHUNK = True

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss: weighted sum of log(1 + MSE) for consumption (Q) and temperature (T)
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0127]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.42it/s, loss=0.00693]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00605]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.03it/s, loss=0.00541]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.94it/s, loss=0.00508]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00466]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.32it/s, loss=0.00428]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:11<00:00, 39.22it/s, loss=0.00394]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.43it/s, loss=0.00372]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.66it/s, loss=0.00344]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.52it/s, loss=0.00331]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0031]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.75it/s, loss=0.00293]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.80it/s, loss=0.00283]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.09it/s, loss=0.00269]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.07it/s, loss=0.0026]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.24it/s, loss=0.00246]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.13it/s, loss=0.00238]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.03it/s, loss=0.00227]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.85it/s, loss=0.0022]
Loss: 0.002201
_images/notebooks_trainings_training_2019_12_23__173446_9_2.png

Validation

Load dataset and network
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_23__173446_15_0.png
Plot encoding attention map
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_23__173446_17_0.png
Evaluate on the test dataset
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.60it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
    # Select output to plot
    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    # Rescale
    y_true = datatestloader.dataset.rescale(y_true, idx_label)
    y_pred = datatestloader.dataset.rescale(y_pred, idx_label)

    # Compute delta, mean and std
    delta = np.abs(y_true - y_pred)

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    # Plot
    # Labels for consumption and temperature
    if label.startswith('Q_'):
        y_label_unit = 'kW'
    else:
        y_label_unit = '°C'

    # Occupancy
    occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
    for idx in range(0, len(occupancy_idxes), 2):
        ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)

    # Std
    ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')

    # Mean
    ax.plot(mean, label='mean')

    # Title and labels
    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)
    ax.legend()

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_23__173446_20_0.png

Chunk - 2019 December 20

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)

temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss: weighted sum of log(1 + MSE) for consumption (Q) and temperature (T)
            y = y.to(device)

            delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
            delta_T = temperature_loss_function(netout[..., -1], y[..., -1])

            loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 7500/7500 [03:14<00:00, 38.49it/s, loss=0.0112]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0059]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.68it/s, loss=0.00506]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.00453]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.60it/s, loss=0.00391]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.00361]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.0033]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.73it/s, loss=0.00316]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.00296]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.0028]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00265]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.66it/s, loss=0.00252]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.70it/s, loss=0.00238]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.75it/s, loss=0.00223]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.77it/s, loss=0.00214]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.64it/s, loss=0.00201]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.63it/s, loss=0.00191]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00186]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00174]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.65it/s, loss=0.00168]
Loss: 0.001683
_images/notebooks_trainings_training_2019_12_20__172758_9_2.png

Validation

Load dataset and network
[11]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
Plot results on a sample
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_20__172758_14_0.png
Plot encoding attention map
[13]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_20__172758_16_0.png
Evaluate on the test dataset
[14]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.89it/s]
[15]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):

    y_true = datatestloader.dataset._y.numpy()[..., idx_label]
    y_pred = predictions[..., idx_label]

    y_true = dataloader.dataset.rescale(y_true, idx_label)
    y_pred = dataloader.dataset.rescale(y_pred, idx_label)

    delta = np.square(y_true - y_pred)

    # For consumption
    if label.startswith('Q_'):
        y_label_unit = 'kWh'
    else:
        y_label_unit = '°C'

    mean = delta.mean(axis=0)
    std = delta.std(axis=0)

    ax.fill_between(np.arange(K), (mean - 3 * std), (mean + 3 * std), alpha=.3)
    ax.plot(mean)

    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_20__172758_19_0.png

Chunk - 2019 December 20

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True

# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
TEST_MODEL_PATH = 'models/model_00251.pth'

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Training

Load dataset
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )
Load network
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(netout, y.to(device))

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch   1/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.91it/s, loss=0.0145]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.63it/s, loss=0.00864]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.94it/s, loss=0.00674]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.96it/s, loss=0.0059]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00518]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.88it/s, loss=0.00459]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.88it/s, loss=0.00422]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.85it/s, loss=0.00393]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00369]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.95it/s, loss=0.00347]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00331]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00318]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.93it/s, loss=0.00302]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.72it/s, loss=0.00293]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00284]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.91it/s, loss=0.00276]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.00267]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.61it/s, loss=0.00262]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00259]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.62it/s, loss=0.00251]
Loss: 0.002514
_images/notebooks_trainings_training_2019_12_20__112013_9_2.png

Validation

Load dataset and network
[3]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=NUM_WORKERS
                           )
net = torch.load(TEST_MODEL_PATH, map_location=device)
Plot results on a sample
[4]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_20__112013_14_0.png
Plot encoding attention map
[5]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_20__112013_16_0.png
Evaluate on the test dataset
[6]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))

idx_prediction = 0
with torch.no_grad():
    for x, y in tqdm(datatestloader, total=len(datatestloader)):
        netout = net(x.to(device)).cpu().numpy()
        predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
        idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.52it/s]
[7]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)

delta = np.square(predictions - datatestloader.dataset._y.numpy())

for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):

    input_data = delta[..., idx_label]

    # For consumption
    if label.startswith('Q_'):
        y_label_unit = 'kWh'
    else:
        y_label_unit = '°C'

    mean = input_data.mean(axis=0)
    std = input_data.std(axis=0)

    ax.fill_between(np.arange(K), (mean - 3 * std), (mean + 3 * std), alpha=.3)
    ax.plot(mean)

    ax.set_title(label)
    ax.set_xlabel('time', fontsize=16)
    ax.set_ylabel(y_label_unit, fontsize=16)

plt.savefig('error_mean_std.jpg')
_images/notebooks_trainings_training_2019_12_20__112013_19_0.png

Chunk - 2019 December 15

This training was performed without the decoder part of the Transformer, dividing training time by a factor of 2; a minimal sketch of such an encoder-only variant is given below.
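For illustration only (this is not the repository's code), an encoder-only regressor can be sketched with standard PyTorch modules. Dropping the decoder removes its self-attention and encoder-decoder attention blocks, which is consistent with the roughly halved training time reported here; the class name and constructor below are hypothetical.

import torch
import torch.nn as nn

class EncoderOnlyRegressor(nn.Module):
    """Hypothetical encoder-only variant: linear embedding, encoder stack, sigmoid output."""
    def __init__(self, d_input, d_model, d_output, h, N):
        super().__init__()
        self.embedding = nn.Linear(d_input, d_model)  # project input features to model dimension
        layer = nn.TransformerEncoderLayer(d_model, nhead=h)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N)
        self.output = nn.Linear(d_model, d_output)

    def forward(self, x):
        # x: (batch, K, d_input); nn.TransformerEncoder expects (K, batch, d_model)
        z = self.encoder(self.embedding(x).transpose(0, 1)).transpose(0, 1)
        return torch.sigmoid(self.output(z))

# Example shapes matching this notebook: d_input=37, d_model=48, d_output=8, h=4, N=4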

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-4
EPOCHS = 20
TIME_CHUNK = True

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Load dataset

[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(netout, y.to(device))

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.04it/s, loss=0.0126]
[Epoch   2/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.04it/s, loss=0.00866]
[Epoch   3/20]: 100%|██████████| 7500/7500 [01:23<00:00, 89.89it/s, loss=0.00733]
[Epoch   4/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.20it/s, loss=0.00669]
[Epoch   5/20]: 100%|██████████| 7500/7500 [01:23<00:00, 90.16it/s, loss=0.00609]
[Epoch   6/20]: 100%|██████████| 7500/7500 [01:23<00:00, 90.12it/s, loss=0.00564]
[Epoch   7/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.97it/s, loss=0.00522]
[Epoch   8/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.62it/s, loss=0.00486]
[Epoch   9/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.81it/s, loss=0.00454]
[Epoch  10/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.81it/s, loss=0.0043]
[Epoch  11/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.53it/s, loss=0.00406]
[Epoch  12/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.67it/s, loss=0.00387]
[Epoch  13/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.37it/s, loss=0.00367]
[Epoch  14/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.58it/s, loss=0.0035]
[Epoch  15/20]: 100%|██████████| 7500/7500 [01:21<00:00, 92.01it/s, loss=0.00335]
[Epoch  16/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.18it/s, loss=0.00322]
[Epoch  17/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.49it/s, loss=0.00312]
[Epoch  18/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.95it/s, loss=0.00303]
[Epoch  19/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.58it/s, loss=0.00294]
[Epoch  20/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.80it/s, loss=0.00284]
Loss: 0.002845
_images/notebooks_trainings_training_2019_12_15__172952_9_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()

plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
    # Select real temperature
    y_true = y[:, idx_label]
    y_pred = netout[0, :, idx_label].numpy()


    plt.subplot(9, 1, idx_label+1)



    # If consumption, rescale axis
    if label.startswith('Q_'):
        plt.ylim(-0.1, 1.1)
    elif label == 'T_INT_OFFICE':
        y_true = dataloader.dataset.rescale(y_true, idx_label)
        y_pred = dataloader.dataset.rescale(y_pred, idx_label)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(label)
    plt.legend()


# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()

plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_15__172952_11_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_15__172952_13_0.png

Chunk - 2019 December 15

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Load dataset

[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(netout, y.to(device))

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.45it/s, loss=0.0155]
[Epoch   2/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.01it/s, loss=0.00893]
[Epoch   3/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.13it/s, loss=0.00693]
[Epoch   4/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.15it/s, loss=0.00596]
[Epoch   5/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00526]
[Epoch   6/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00458]
[Epoch   7/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00416]
[Epoch   8/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00386]
[Epoch   9/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00359]
[Epoch  10/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00338]
[Epoch  11/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.0032]
[Epoch  12/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.41it/s, loss=0.00305]
[Epoch  13/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.46it/s, loss=0.00293]
[Epoch  14/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.42it/s, loss=0.00282]
[Epoch  15/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.48it/s, loss=0.00277]
[Epoch  16/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.34it/s, loss=0.00268]
[Epoch  17/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00264]
[Epoch  18/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00258]
[Epoch  19/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00254]
[Epoch  20/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.43it/s, loss=0.00247]
Loss: 0.002473
_images/notebooks_trainings_training_2019_12_15__164718_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()

plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
    # Select real temperature
    y_true = y[:, idx_label]
    y_pred = netout[0, :, idx_label].numpy()


    plt.subplot(9, 1, idx_label+1)



    # If consumption, rescale axis
    if label.startswith('Q_'):
        plt.ylim(-0.1, 1.1)
    elif label == 'T_INT_OFFICE':
        y_true = dataloader.dataset.rescale(y_true, idx_label)
        y_pred = dataloader.dataset.rescale(y_pred, idx_label)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(label)
    plt.legend()


# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()

plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_15__164718_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0].cpu()

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_15__164718_12_0.png

Benchmark - 2019 December 15

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

from src.dataset import OzeDataset
from src.Benchmark import LSTM
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-3
EPOCHS = 20
TIME_CHUNK = True

# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of layers to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0

Load dataset

[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load LSTM benchmark with Adam optimizer and MSE loss function
net = LSTM(d_input, d_model, d_output, N).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(netout, y.to(device))

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/20]: 100%|██████████| 7500/7500 [00:16<00:00, 452.91it/s, loss=0.0142]
[Epoch   2/20]: 100%|██████████| 7500/7500 [00:16<00:00, 449.53it/s, loss=0.00813]
[Epoch   3/20]: 100%|██████████| 7500/7500 [00:17<00:00, 434.53it/s, loss=0.00724]
[Epoch   4/20]: 100%|██████████| 7500/7500 [00:16<00:00, 448.52it/s, loss=0.00693]
[Epoch   5/20]: 100%|██████████| 7500/7500 [00:16<00:00, 451.65it/s, loss=0.00671]
[Epoch   6/20]: 100%|██████████| 7500/7500 [00:16<00:00, 455.15it/s, loss=0.00653]
[Epoch   7/20]: 100%|██████████| 7500/7500 [00:17<00:00, 425.80it/s, loss=0.0064]
[Epoch   8/20]: 100%|██████████| 7500/7500 [00:17<00:00, 423.33it/s, loss=0.00628]
[Epoch   9/20]: 100%|██████████| 7500/7500 [00:17<00:00, 432.92it/s, loss=0.0062]
[Epoch  10/20]: 100%|██████████| 7500/7500 [00:17<00:00, 438.34it/s, loss=0.00606]
[Epoch  11/20]: 100%|██████████| 7500/7500 [00:17<00:00, 422.91it/s, loss=0.00595]
[Epoch  12/20]: 100%|██████████| 7500/7500 [00:17<00:00, 421.01it/s, loss=0.00583]
[Epoch  13/20]: 100%|██████████| 7500/7500 [00:16<00:00, 447.78it/s, loss=0.0057]
[Epoch  14/20]: 100%|██████████| 7500/7500 [00:17<00:00, 440.90it/s, loss=0.0055]
[Epoch  15/20]: 100%|██████████| 7500/7500 [00:16<00:00, 454.46it/s, loss=0.00538]
[Epoch  16/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.71it/s, loss=0.00524]
[Epoch  17/20]: 100%|██████████| 7500/7500 [00:16<00:00, 457.21it/s, loss=0.00516]
[Epoch  18/20]: 100%|██████████| 7500/7500 [00:16<00:00, 457.11it/s, loss=0.00507]
[Epoch  19/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.00it/s, loss=0.00499]
[Epoch  20/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.07it/s, loss=0.00488]
Loss: 0.004880
_images/notebooks_trainings_training_2019_12_15__152700_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()

plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
    # Select real temperature
    y_true = y[:, idx_label]
    y_pred = netout[0, :, idx_label].numpy()


    plt.subplot(9, 1, idx_label+1)



    # If consumption, rescale axis
    if label.startswith('Q_'):
        plt.ylim(-0.1, 1.1)
    elif label == 'T_INT_OFFICE':
        y_true = dataloader.dataset.rescale(y_true, idx_label)
        y_pred = dataloader.dataset.rescale(y_pred, idx_label)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(label)
    plt.legend()


# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()

plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_15__152700_10_0.png
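The cell above only inspects a single random training example. To obtain an aggregate error, the same torch.no_grad pattern can be run over the whole DataLoader; a minimal sketch reusing net, dataloader, device and loss_function from the cells above (note that this averages over the training set, since no separate test split is defined in this notebook):

# Average the loss over the full dataset (sketch)
net.eval()
total_loss = 0.0
with torch.no_grad():
    for x, y in dataloader:
        netout = net(x.to(device))
        total_loss += loss_function(netout, y.to(device)).item()
print(f"Mean batch loss: {total_loss / len(dataloader):.5f}")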

Chunk - 2019 December 06

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
DATASET_PATH = 'dataset.npz'
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = 'regular' # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [03:33<00:00,  4.68it/s, loss=0.0178]
[Epoch   2/5]: 100%|██████████| 1000/1000 [03:30<00:00,  4.74it/s, loss=0.0125]
[Epoch   3/5]: 100%|██████████| 1000/1000 [03:36<00:00,  4.62it/s, loss=0.0116]
[Epoch   4/5]: 100%|██████████| 1000/1000 [03:50<00:00,  4.34it/s, loss=0.0112]
[Epoch   5/5]: 100%|██████████| 1000/1000 [03:13<00:00,  5.16it/s, loss=0.0108]
Loss: 0.010834
_images/notebooks_trainings_training_2019_12_06__123805_8_2.png

Plot results sample

[8]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    netout = net(torch.Tensor(x[np.newaxis, ...]))

plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
    # Select real temperature
    y_true = y[:, idx_label]
    y_pred = netout[0, :, idx_label].numpy()


    plt.subplot(9, 1, idx_label+1)



    # If consumption, rescale axis
    if label.startswith('Q_'):
        plt.ylim(-0.1, 1.1)
    elif label == 'T_INT_OFFICE':
        y_true = dataloader.dataset.rescale(y_true, idx_label)
        y_pred = dataloader.dataset.rescale(y_pred, idx_label)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(label)
    plt.legend()


# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()

plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_06__123805_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0]

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_06__123805_12_0.png

Chunk - 2019 December 06

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
DATASET_PATH = 'dataset.npz'
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [03:40<00:00,  4.53it/s, loss=0.0165]
[Epoch   2/5]: 100%|██████████| 1000/1000 [03:54<00:00,  4.26it/s, loss=0.012]
[Epoch   3/5]: 100%|██████████| 1000/1000 [03:50<00:00,  4.34it/s, loss=0.0116]
[Epoch   4/5]: 100%|██████████| 1000/1000 [03:46<00:00,  4.42it/s, loss=0.011]
[Epoch   5/5]: 100%|██████████| 1000/1000 [03:46<00:00,  4.41it/s, loss=0.0109]
Loss: 0.010939
_images/notebooks_trainings_training_2019_12_06__114703_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    x = torch.Tensor(x[np.newaxis, ...])
    netout = net(x)

plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
    # Select real temperature
    y_true = y[:, idx_output_var]

    y_pred = netout[0, :, idx_output_var]
    y_pred = y_pred.numpy()

    plt.subplot(8, 1, idx_output_var+1)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_06__114703_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0]

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_06__114703_12_0.png

Chunk - 2019 December 04

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = "regular" # Positional encoding

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [03:45<00:00,  4.44it/s, loss=0.0183]
[Epoch   2/5]: 100%|██████████| 1000/1000 [02:58<00:00,  5.60it/s, loss=0.0115]
[Epoch   3/5]: 100%|██████████| 1000/1000 [03:00<00:00,  5.55it/s, loss=0.0108]
[Epoch   4/5]: 100%|██████████| 1000/1000 [02:58<00:00,  5.59it/s, loss=0.0102]
[Epoch   5/5]: 100%|██████████| 1000/1000 [02:57<00:00,  5.63it/s, loss=0.0102]
Loss: 0.010186
_images/notebooks_trainings_training_2019_12_04__132557_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    x = torch.Tensor(x[np.newaxis, ...])
    netout = net(x)

plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
    # Select real temperature
    y_true = y[:, idx_output_var]

    y_pred = netout[0, :, idx_output_var]
    y_pred = y_pred.numpy()

    plt.subplot(8, 1, idx_output_var+1)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_04__132557_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0]

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_04__132557_12_0.png

Benchmark - 2019 December 03

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.LSTM import LSTMBenchmark
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load LSTM benchmark with Adam optimizer and MSE loss function
net = LSTMBenchmark(input_dim=d_input, hidden_dim=d_model, output_dim=d_output, num_layers=N)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [00:16<00:00, 62.40it/s, loss=0.0218]
[Epoch   2/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.13it/s, loss=0.0145]
[Epoch   3/5]: 100%|██████████| 1000/1000 [00:14<00:00, 68.86it/s, loss=0.0132]
[Epoch   4/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.17it/s, loss=0.0107]
[Epoch   5/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.18it/s, loss=0.0103]
Loss: 0.010313
_images/notebooks_trainings_training_2019_12_03__173205_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    x = torch.Tensor(x[np.newaxis, ...])
    netout = net(x)

plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
    # Select real temperature
    y_true = y[:, idx_output_var]

    y_pred = netout[0, :, idx_output_var]
    y_pred = y_pred.numpy()

    plt.subplot(8, 1, idx_output_var+1)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_03__173205_10_0.png

Chunk - 2019 December 03

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [03:05<00:00,  5.39it/s, loss=0.0205]
[Epoch   2/5]: 100%|██████████| 1000/1000 [03:00<00:00,  5.55it/s, loss=0.012]
[Epoch   3/5]: 100%|██████████| 1000/1000 [02:59<00:00,  5.56it/s, loss=0.0108]
[Epoch   4/5]: 100%|██████████| 1000/1000 [03:00<00:00,  5.55it/s, loss=0.0105]
[Epoch   5/5]: 100%|██████████| 1000/1000 [02:59<00:00,  5.57it/s, loss=0.0102]
Loss: 0.010207
_images/notebooks_trainings_training_2019_12_03__172753_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    x = torch.Tensor(x[np.newaxis, ...])
    netout = net(x)

plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
    # Select real temperature
    y_true = y[:, idx_output_var]

    y_pred = netout[0, :, idx_output_var]
    y_pred = y_pred.numpy()

    plt.subplot(8, 1, idx_output_var+1)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_03__172753_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0]

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_03__172753_12_0.png

Classic - 2019 December 03

[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()

from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = False

K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack

d_input = 37 # From dataset
d_output = 8 # From dataset

Load dataset

[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                       )

Load network

[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

Train

[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch   1/5]: 100%|██████████| 1000/1000 [05:35<00:00,  2.98it/s, loss=0.019]
[Epoch   2/5]: 100%|██████████| 1000/1000 [05:55<00:00,  2.81it/s, loss=0.0126]
[Epoch   3/5]: 100%|██████████| 1000/1000 [05:33<00:00,  3.00it/s, loss=0.0115]
[Epoch   4/5]: 100%|██████████| 1000/1000 [05:23<00:00,  3.09it/s, loss=0.0108]
[Epoch   5/5]: 100%|██████████| 1000/1000 [05:21<00:00,  3.11it/s, loss=0.0103]
Loss: 0.010339
_images/notebooks_trainings_training_2019_12_03__170100_8_2.png

Plot results sample

[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]

# Run predictions
with torch.no_grad():
    x = torch.Tensor(x[np.newaxis, ...])
    netout = net(x)

plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
    # Select real temperature
    y_true = y[:, idx_output_var]

    y_pred = netout[0, :, idx_output_var]
    y_pred = y_pred.numpy()

    plt.subplot(8, 1, idx_output_var+1)

    plt.plot(y_true, label="Truth")
    plt.plot(y_pred, label="Prediction")
    plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")
_images/notebooks_trainings_training_2019_12_03__170100_10_0.png

Display encoding attention map

[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]

# Get the first attention map
attn_map = encoder.attention_map[0]

# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
_images/notebooks_trainings_training_2019_12_03__170100_12_0.png