Transformer for metamodels¶
Transformers for Time Series¶
Badges: Documentation Status · License: GPL v3 · Latest release
Implementation of the Transformer model (originally from Attention is All You Need) applied to time series (powered by PyTorch).
Transformer model¶
Transformers are attention-based neural networks designed to solve NLP tasks. Their key features are:
linear complexity in the dimension of the feature vector;
parallel computation over a whole sequence, as opposed to sequential computation;
long-term memory, as any step of the input sequence can be attended to directly.
This repo focuses on their application to time series.
Dataset and application as metamodel¶
Our use case is modeling a numerical simulator for building consumption prediction. To this end, we created a dataset by sampling random inputs (building characteristics and usage, weather, …) and running the simulator to obtain the corresponding outputs. We then convert these variables into a time series format and feed them to the transformer.
Adaptations for time series¶
In order to perform well on time series, a few adjustments had to be made:
The embedding layer is replaced by a generic linear layer (see the sketch after this list);
The original positional encoding is removed. A “regular” version, better matching the day/night patterns of the input sequence, can be used instead;
A window is applied to the attention map to limit backward attention and focus on short-term patterns.
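As a quick illustration of the first adjustment, here is a minimal sketch (the dimensions are hypothetical, chosen only for the example): instead of an nn.Embedding lookup over token ids, the continuous multivariate input of each time step is projected with a plain linear layer.

import torch
import torch.nn as nn

# Hypothetical dimensions, for illustration only.
d_input, d_model = 38, 64        # feature dimension in, model dimension out
batch_size, K = 8, 168           # batch of week-long hourly sequences

# NLP transformers embed discrete tokens; for time series we simply project
# the continuous feature vector of each time step into the model dimension.
embedding = nn.Linear(d_input, d_model)

x = torch.randn(batch_size, K, d_input)   # (batch_size, K, d_input)
z = embedding(x)                           # (batch_size, K, d_model)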
Installation¶
All required packages can be found in requirements.txt, and are expected to run with Python 3.7. Note that you may have to install PyTorch manually if you are not using pip on a Debian distribution: head over to the PyTorch installation page. Here are a few lines to get started with pip and virtualenv:
$ apt-get install python3.7
$ pip3 install --upgrade --user pip virtualenv
$ virtualenv -p python3.7 .env
$ . .env/bin/activate
(.env) $ pip install -r requirements.txt
Usage¶
Downloading the dataset¶
The dataset is not included in this repo and must be downloaded manually. It is comprised of two files: dataset.npz contains all input and output values, and labels.json is a detailed list of the variables. Please refer to #2 for more information.
Running training script¶
Using Jupyter, run the default training.ipynb notebook. All adjustable parameters can be found in the second cell. Be careful with BATCH_SIZE, as it is used to parallelize head and time-chunk calculations.
Outside usage¶
The Transformer class can be used out of the box; see the docs for more info.
from tst import Transformer
net = Transformer(d_input, d_model, d_output, q, v, h, N, TIME_CHUNK, pe)
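For reference, here is a minimal, self-contained sketch of that usage, with keyword arguments matching the documented signature; the hyperparameter values below are illustrative only, not recommendations.

import torch
from tst import Transformer

# Illustrative hyperparameters; adapt them to your dataset.
d_input, d_model, d_output = 38, 64, 8   # input features, latent dim, output features
q = v = 8                                 # query/key and value dimensions
h, N = 8, 4                               # number of heads, number of stacked layers

net = Transformer(d_input, d_model, d_output, q, v, h, N,
                  attention_size=12, dropout=0.2, chunk_mode=None, pe=None)

x = torch.randn(8, 168, d_input)          # (batch_size, K, d_input)
y = net(x)                                # (batch_size, K, d_output)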
Building the docs¶
To build the doc:
(.env) $ cd docs && make html
Modules¶
Transformer module¶
- class transformer.Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=None, dropout=0.3, chunk_mode='chunk', pe=None, pe_period=None)¶
Bases:
Module
Transformer model from Attention is All You Need.
A classic transformer model adapted for sequential data. The embedding has been replaced with a fully connected layer, and the final softmax is now a sigmoid.
- Parameters
d_input (int) – Model input dimension.
d_model (int) – Dimension of the input vector.
d_output (int) – Model output dimension.
q (int) – Dimension of queries and keys.
v (int) – Dimension of values.
h (int) – Number of heads.
N (int) – Number of encoder and decoder layers to stack.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
pe (Optional[str]) – Type of positional encoding to add. Must be one of 'original', 'regular' or None. Default is None.
pe_period (Optional[int]) – If using the 'regular' pe, then we can define the period. Default is None.
- forward(x)¶
Propagate input through transformer
Forward input through an embedding module, the encoder then decoder stacks, and an output module.
- Parameters
x (torch.Tensor) – torch.Tensor of shape (batch_size, K, d_input).
- Return type
torch.Tensor
- Returns
Output tensor with shape (batch_size, K, d_output).
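As a quick sanity check, here is a sketch of a forward pass with the shapes above; the batch size, sequence length and feature counts are assumptions made for the example, not values prescribed by the library.

import torch

# Assuming `net` was built with d_input=38 and d_output=8, as in the usage example above.
batch_size, K = 4, 168
x = torch.randn(batch_size, K, 38)
with torch.no_grad():
    out = net(x)
assert out.shape == (batch_size, K, 8)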
Encoder module¶
- class encoder.Encoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')¶
Bases:
Module
Encoder block from Attention is All You Need.
Apply Multi Head Attention block followed by a Point-wise Feed Forward block. Residual sum and normalization are applied at each step.
- Parameters
d_model (int) – Dimension of the input vector.
q (int) – Dimension of all query matrix.
v (int) – Dimension of all value matrix.
h (int) – Number of heads.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
- property attention_map: Tensor¶
Attention map after a forward propagation, variable score in the original paper.
- forward(x)¶
Propagate the input through the Encoder block.
Apply the Multi Head Attention block, add residual and normalize. Apply the Point-wise Feed Forward block, add residual and normalize.
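To make the block structure concrete, here is a minimal sketch of that pattern using PyTorch's built-in nn.MultiheadAttention; it only illustrates the residual-and-normalize flow and is not the repo's exact implementation (which uses its own MultiHeadAttention and PositionwiseFeedForward modules).

import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Sketch of the encoder block pattern: MHA, add & norm, PFF, add & norm."""
    def __init__(self, d_model: int, h: int, dropout: float = 0.3):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, h, dropout=dropout, batch_first=True)
        self.pff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi Head Attention block, add residual and normalize.
        attn, _ = self.mha(x, x, x)
        x = self.norm1(x + self.dropout(attn))
        # Point-wise Feed Forward block, add residual and normalize.
        x = self.norm2(x + self.dropout(self.pff(x)))
        return x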
Decoder module¶
- class decoder.Decoder(d_model, q, v, h, attention_size=None, dropout=0.3, chunk_mode='chunk')¶
Bases:
Module
Decoder block from Attention is All You Need.
Apply two Multi Head Attention blocks followed by a Point-wise Feed Forward block. Residual sum and normalization are applied at each step.
- Parameters
d_model (int) – Dimension of the input vector.
q (int) – Dimension of all query matrix.
v (int) – Dimension of all value matrix.
h (int) – Number of heads.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
dropout (float) – Dropout probability after each MHA or PFF block. Default is 0.3.
chunk_mode (str) – Switch between different MultiHeadAttention blocks. One of 'chunk', 'window' or None. Default is 'chunk'.
- forward(x, memory)¶
Propagate the input through the Decoder block.
Apply the self attention block, add residual and normalize. Apply the encoder-decoder attention block, add residual and normalize. Apply the feed forward network, add residual and normalize.
MultiHeadAttention module¶
- class multiHeadAttention.MultiHeadAttention(d_model, q, v, h, attention_size=None)¶
Bases:
Module
Multi Head Attention block from Attention is All You Need.
Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model).
- Parameters
d_model (int) – Dimension of the input vector.
q (int) – Dimension of all query matrix.
v (int) – Dimension of all value matrix.
h (int) – Number of heads.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
- property attention_map: Tensor¶
Attention map after a forward propagation, variable score in the original paper.
- forward(query, key, value, mask=None)¶
Propagate forward the input through the MHB.
We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).
- Parameters
query (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.
key (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.
value (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.
mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.
- Return type
torch.Tensor
- Returns
Self attention tensor with shape (batch_size, K, d_model).
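For intuition, here is a minimal sketch of the Scaled Dot-Product step mentioned above, on a single head; it ignores the per-head projection and concatenation performed by the block, and the boolean mask convention is an assumption made for the example.

import torch

def scaled_dot_product_sketch(q, k, v, mask=None):
    # q, k, v: (batch_size, K, d) tensors.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (batch_size, K, K)
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))
    attention = torch.softmax(scores, dim=-1)        # attention map ("score" in the paper)
    return attention @ v                             # (batch_size, K, d)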
- class multiHeadAttention.MultiHeadAttentionChunk(d_model, q, v, h, attention_size=None, chunk_size=168, **kwargs)¶
Bases:
MultiHeadAttention
Multi Head Attention block with chunk.
Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model). Queries, keys and values are divided into chunks of constant size.
- Parameters
d_model (int) – Dimension of the input vector.
q (int) – Dimension of all query matrix.
v (int) – Dimension of all value matrix.
h (int) – Number of heads.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
chunk_size (Optional[int]) – Size of chunks to apply attention on. Last one may be smaller (see torch.Tensor.chunk). Default is 168.
- forward(query, key, value, mask=None)¶
Propagate forward the input through the MHB.
We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).
- Parameters
query (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.
key (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.
value (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.
mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.
- Return type
torch.Tensor
- Returns
Self attention tensor with shape (batch_size, K, d_model).
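A rough sketch of the chunking idea described above, with illustrative sizes only (and ignoring the per-head split the real block also performs): time steps are grouped into fixed-size chunks so attention is computed within each chunk.

import torch

# Illustrative sizes: a week-long hourly sequence split into day-long chunks.
batch_size, K, d_model, chunk_size = 8, 168, 48, 24
n_chunk = K // chunk_size

x = torch.randn(batch_size, K, d_model)

# Fold the chunk dimension into the batch so attention only mixes time steps
# that belong to the same chunk, then unfold it back afterwards.
chunks = x.reshape(batch_size * n_chunk, chunk_size, d_model)
# ... attention applied on `chunks` here ...
x_out = chunks.reshape(batch_size, K, d_model)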
- class multiHeadAttention.MultiHeadAttentionWindow(d_model, q, v, h, attention_size=None, window_size=168, padding=42, **kwargs)¶
Bases:
MultiHeadAttention
Multi Head Attention block with moving window.
Given 3 inputs of shape (batch_size, K, d_model), used to compute queries, keys and values, we output a self-attention tensor of shape (batch_size, K, d_model). Queries, keys and values are divided into chunks using a moving window.
- Parameters
d_model (int) – Dimension of the input vector.
q (int) – Dimension of all query matrix.
v (int) – Dimension of all value matrix.
h (int) – Number of heads.
attention_size (Optional[int]) – Number of backward elements to apply attention. Deactivated if None. Default is None.
window_size (Optional[int]) – Size of the window used to extract chunks. Default is 168.
padding (Optional[int]) – Padding around each window. Padding will be applied to input sequence. Default is 168 // 4 = 42.
- forward(query, key, value, mask=None)¶
Propagate forward the input through the MHB.
We compute for each head the queries, keys and values matrices, followed by the Scaled Dot-Product. The result is concatenated and returned with shape (batch_size, K, d_model).
- Parameters
query (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute queries.
key (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute keys.
value (torch.Tensor) – Input tensor with shape (batch_size, K, d_model) used to compute values.
mask (Optional[str]) – Mask to apply on scores before computing attention. One of 'subsequent', None. Default is None.
- Return type
torch.Tensor
- Returns
Self attention tensor with shape (batch_size, K, d_model).
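A sketch of the moving-window idea described above, with small illustrative sizes (the block's documented defaults are window_size=168 and padding=42, and it also splits heads): the padded sequence is cut into overlapping windows so each window sees some extra context on both sides. The padding mode below is an assumption for the example.

import torch
import torch.nn.functional as F

# Small illustrative sizes; defaults in the block are window_size=168, padding=42.
batch_size, K, d_model = 8, 168, 48
window_size, padding = 24, 6

x = torch.randn(batch_size, K, d_model)

# Pad the time axis on both sides, then extract overlapping windows.
x_pad = F.pad(x.transpose(1, 2), (padding, padding), mode='replicate').transpose(1, 2)
windows = x_pad.unfold(dimension=1, size=window_size + 2 * padding, step=window_size)
# windows: (batch_size, n_windows, d_model, window_size + 2 * padding)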
PositionwiseFeedForward module¶
- class positionwiseFeedForward.PositionwiseFeedForward(d_model, d_ff=2048)¶
Bases:
Module
Position-wise Feed Forward Network block from Attention is All You Need.
Apply two linear transformations to each input, separately but identically. We implement them as 1D convolutions. Input and output have a shape (batch_size, d_model).
- Parameters
d_model (int) – Dimension of the input vector.
d_ff (Optional[int]) – Dimension of the hidden layer. Default is 2048.
- forward(x)¶
Propagate forward the input through the PFF block.
Apply the first linear transformation, then a ReLU activation, and finally the second linear transformation.
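A minimal sketch of that block, implementing the two position-wise transformations as 1D convolutions with kernel size 1; the model dimension and batch/sequence sizes are illustrative, and 2048 is the documented d_ff default.

import torch
import torch.nn as nn

d_model, d_ff = 48, 2048

pff = nn.Sequential(
    nn.Conv1d(d_model, d_ff, kernel_size=1),   # first position-wise linear transformation
    nn.ReLU(),
    nn.Conv1d(d_ff, d_model, kernel_size=1),   # second position-wise linear transformation
)

x = torch.randn(8, 168, d_model)               # (batch_size, K, d_model)
y = pff(x.transpose(1, 2)).transpose(1, 2)     # Conv1d expects (batch, channels, length)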
Loss module¶
- class loss.OZELoss(reduction='mean', alpha=0.3)¶
Bases:
Module
Custom loss for the TRNSYS metamodel.
Compute, for temperature and consumptions, the integral of the squared differences over time. Sum the logs with a coefficient alpha.

\[\Delta_T = \sqrt{\int (y_{est}^T - y^T)^2}\]

\[\Delta_Q = \sqrt{\int (y_{est}^Q - y^Q)^2}\]

\[loss = \log(1 + \Delta_T) + \alpha \cdot \log(1 + \Delta_Q)\]

Parameters:¶
- alpha: Coefficient for consumption. Default is 0.3.
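As an illustration, here is a sketch that follows the formula above, approximating the integral by the mean over time; which output channels hold the temperature T versus the consumptions Q is an assumption made for the example (check the dataset labels), and this is not necessarily the exact implementation of loss.OZELoss.

import torch
import torch.nn as nn

class OZELossSketch(nn.Module):
    # Sketch only: assumes the last output channel is the temperature T and the
    # remaining channels are the consumptions Q.
    def __init__(self, alpha: float = 0.3):
        super().__init__()
        self.alpha = alpha
        self.mse = nn.MSELoss(reduction='mean')

    def forward(self, y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
        delta_T = torch.sqrt(self.mse(y_pred[..., -1:], y_true[..., -1:]))   # temperature term
        delta_Q = torch.sqrt(self.mse(y_pred[..., :-1], y_true[..., :-1]))   # consumption terms
        return torch.log(1 + delta_T) + self.alpha * torch.log(1 + delta_Q)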
Utils module¶
- utils.generate_local_map_mask(chunk_size, attention_size, mask_future=False, device='cpu')¶
Compute attention mask as attention_size wide diagonal.
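A sketch of what such a diagonal band mask looks like (the boolean convention, True meaning "masked out", is chosen for the example); it illustrates the idea rather than reproducing the exact function.

import torch

def local_map_mask_sketch(chunk_size: int, attention_size: int, mask_future: bool = False):
    i = torch.arange(chunk_size).unsqueeze(1)   # query (current) time step
    j = torch.arange(chunk_size).unsqueeze(0)   # key (attended) time step
    mask = (i - j).abs() > attention_size       # keep only a diagonal band
    if mask_future:
        mask = mask | (j > i)                   # additionally forbid attending to the future
    return mask                                  # (chunk_size, chunk_size) boolean mask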
- utils.generate_original_PE(length, d_model)¶
Generate positional encoding as described in original paper.
- Return type
torch.Tensor
- utils.generate_regular_PE(length, d_model, period=24)¶
Generate positional encoding with a given period.
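A sketch of such a period-based encoding (assumed form: a sinusoid with the given period, repeated across the model dimension; the real function may differ in detail). With hourly data, period=24 aligns the encoding with the day/night cycle mentioned earlier.

import math
import torch

def regular_pe_sketch(length: int, d_model: int, period: int = 24) -> torch.Tensor:
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)   # (length, 1) time index
    pe = torch.sin(2 * math.pi * pos / period)                     # one full cycle per `period` steps
    return pe.repeat(1, d_model)                                   # (length, d_model)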
Visualizations¶
Training visualization - 2021 March 28¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
DATASET_PATH = 'datasets/dataset_sample_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataloader = DataLoader(ozeDataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS,
pin_memory=False
)
Load network¶
[4]:
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
net.load_state_dict(torch.load('models/model_2020_03_10__231146.pth'))
_ = net.eval()
Evaluate on the test dataset¶
[5]:
predictions = np.empty(shape=(len(dataloader.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader, total=len(dataloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 18.72it/s]
Plot results on a sample¶
[6]:
map_plot_function(ozeDataset, predictions, plot_visual_sample)

Plot encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Plot dataset and prediction distributions for consumptions¶
[8]:
map_plot_function(ozeDataset, predictions, plot_values_distribution, time_limit=24, labels=['Q_AC_OFFICE',
'Q_HEAT_OFFICE',
'Q_PEOPLE',
'Q_EQP',
'Q_LIGHT',
'Q_AHU_C',
'Q_AHU_H'])

Plot error distribution for temperature¶
[9]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, labels=['T_INT_OFFICE'], time_limit=24)

Plot mispredictions thresholds¶
[10]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1})

Demonstrator¶
[1]:
import json
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML
from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation
%matplotlib notebook
# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()
Median indoor temperature¶
[2]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation().to_jshtml())
[2]:
Total consumption¶
[3]:
animation = HistAnimation.load('animations/elec.anim')
HTML(animation.run_animation().to_jshtml())
[3]:
Cumulative error¶
Intermittency management¶
Optimising the use of Climespace¶
Optimising the scheduling of the Air Handling Units¶
Roadmap, seasonal scheduling¶
Thermal energy consumption¶
Demonstrator¶
[1]:
import json
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML
from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation
%matplotlib notebook
# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()
Context data: Emitters¶
Thermal energy consumption¶
Optimising the use of Climespace¶
Optimising the scheduling of the Air Handling Units¶
Indoor temperature¶
[2]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation().to_jshtml())
[2]:
Private consumption¶
[3]:
animation = HistAnimation.load('animations/elec.anim')
HTML(animation.run_animation().to_jshtml())
[3]:
Demonstrator¶
[1]:
import json
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
from IPython.display import HTML
from oze.utils.env_setup import load_config
from oze.inputs import load_building
from oze.buildings.capt import CAPT
from oze.visualization import HistAnimation
[2]:
%matplotlib notebook
# matplotlib.rcParams['animation.embed_limit'] = 100
# Switch to sns visualization and deactivate automatic plotting
sns.set()
plt.ioff()
Indoor temperature¶
[3]:
animation = HistAnimation.load('animations/t_int.anim')
HTML(animation.run_animation(max_frames=10).to_jshtml())
[3]:
Private consumptions¶
[4]:
animation = HistAnimation.load('animations/private.anim')
HTML(animation.run_animation(max_frames=10).to_jshtml())
[4]:
Trainings¶
Classic - 2020 June 27¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_57M.npz'
BATCH_SIZE = 8
NUM_WORKERS = 0
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 8 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 12 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 27 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
[4]:
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (23000, 1000, 1000))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.44it/s, loss=0.0043, val_loss=0.00177]
[Epoch 2/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.48it/s, loss=0.00127, val_loss=0.0013]
[Epoch 3/30]: 100%|██████████| 23000/23000 [05:02<00:00, 76.07it/s, loss=0.000871, val_loss=0.000957]
[Epoch 4/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.47it/s, loss=0.000632, val_loss=0.000511]
[Epoch 5/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.65it/s, loss=0.000491, val_loss=0.000418]
[Epoch 6/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000394, val_loss=0.000349]
[Epoch 7/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.27it/s, loss=0.000325, val_loss=0.000378]
[Epoch 8/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.82it/s, loss=0.000285, val_loss=0.000268]
[Epoch 9/30]: 100%|██████████| 23000/23000 [05:02<00:00, 75.96it/s, loss=0.000254, val_loss=0.000223]
[Epoch 10/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.38it/s, loss=0.000222, val_loss=0.00022]
[Epoch 11/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.86it/s, loss=0.000206, val_loss=0.000187]
[Epoch 12/30]: 100%|██████████| 23000/23000 [05:02<00:00, 75.97it/s, loss=0.000191, val_loss=0.000182]
[Epoch 13/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.40it/s, loss=0.000177, val_loss=0.000174]
[Epoch 14/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000169, val_loss=0.000169]
[Epoch 15/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.42it/s, loss=0.00016, val_loss=0.00015]
[Epoch 16/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.40it/s, loss=0.000149, val_loss=0.00014]
[Epoch 17/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.46it/s, loss=0.000145, val_loss=0.000163]
[Epoch 18/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.53it/s, loss=0.000138, val_loss=0.000142]
[Epoch 19/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.54it/s, loss=0.000132, val_loss=0.000162]
[Epoch 20/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.46it/s, loss=0.000127, val_loss=0.000135]
[Epoch 21/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.59it/s, loss=0.000121, val_loss=0.000136]
[Epoch 22/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.79it/s, loss=0.000119, val_loss=0.000127]
[Epoch 23/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.73it/s, loss=0.000112, val_loss=0.000122]
[Epoch 24/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.37it/s, loss=0.000109, val_loss=0.000107]
[Epoch 25/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.67it/s, loss=0.000107, val_loss=0.000147]
[Epoch 26/30]: 100%|██████████| 23000/23000 [05:03<00:00, 75.68it/s, loss=0.000103, val_loss=0.000114]
[Epoch 27/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.60it/s, loss=0.000101, val_loss=0.000108]
[Epoch 28/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.23it/s, loss=9.82e-5, val_loss=0.000108]
[Epoch 29/30]: 100%|██████████| 23000/23000 [05:05<00:00, 75.32it/s, loss=9.44e-5, val_loss=0.000102]
[Epoch 30/30]: 100%|██████████| 23000/23000 [05:04<00:00, 75.50it/s, loss=9.13e-5, val_loss=0.000107]
model exported to models/model_2020_06_27__062220.pth with loss 0.000102

Validation¶
[7]:
_ = net.eval()
Evaluate on the test dataset¶
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:04<00:00, 26.91it/s]
Plot results on a sample¶
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)

Plot error distributions¶
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)

Plot mispredictions thresholds¶
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Classic - 2020 April 27¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 0
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 8 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 12 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
[4]:
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.76it/s, loss=0.00524, val_loss=0.00232]
[Epoch 2/30]: 100%|██████████| 38000/38000 [06:18<00:00, 100.50it/s, loss=0.00175, val_loss=0.00144]
[Epoch 3/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.81it/s, loss=0.00115, val_loss=0.00104]
[Epoch 4/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.03it/s, loss=0.000849, val_loss=0.000727]
[Epoch 5/30]: 100%|██████████| 38000/38000 [06:19<00:00, 100.20it/s, loss=0.000676, val_loss=0.000562]
[Epoch 6/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.45it/s, loss=0.000576, val_loss=0.000496]
[Epoch 7/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.78it/s, loss=0.000493, val_loss=0.000451]
[Epoch 8/30]: 100%|██████████| 38000/38000 [06:17<00:00, 100.78it/s, loss=0.000441, val_loss=0.000447]
[Epoch 9/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.74it/s, loss=0.000402, val_loss=0.00042]
[Epoch 10/30]: 100%|██████████| 38000/38000 [06:06<00:00, 103.58it/s, loss=0.000374, val_loss=0.000379]
[Epoch 11/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.46it/s, loss=0.000348, val_loss=0.000334]
[Epoch 12/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.12it/s, loss=0.000326, val_loss=0.000374]
[Epoch 13/30]: 100%|██████████| 38000/38000 [06:07<00:00, 103.35it/s, loss=0.000316, val_loss=0.000357]
[Epoch 14/30]: 100%|██████████| 38000/38000 [06:11<00:00, 102.17it/s, loss=0.000289, val_loss=0.000278]
[Epoch 15/30]: 100%|██████████| 38000/38000 [06:12<00:00, 102.08it/s, loss=0.000283, val_loss=0.000285]
[Epoch 16/30]: 100%|██████████| 38000/38000 [06:05<00:00, 103.89it/s, loss=0.000264, val_loss=0.000276]
[Epoch 17/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.58it/s, loss=0.000254, val_loss=0.000353]
[Epoch 18/30]: 100%|██████████| 38000/38000 [06:12<00:00, 101.92it/s, loss=0.000248, val_loss=0.000291]
[Epoch 19/30]: 100%|██████████| 38000/38000 [06:05<00:00, 104.04it/s, loss=0.000236, val_loss=0.00027]
[Epoch 20/30]: 100%|██████████| 38000/38000 [06:07<00:00, 103.36it/s, loss=0.000228, val_loss=0.00029]
[Epoch 21/30]: 100%|██████████| 38000/38000 [06:13<00:00, 101.73it/s, loss=0.000219, val_loss=0.000224]
[Epoch 22/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.51it/s, loss=0.000222, val_loss=0.00023]
[Epoch 23/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.71it/s, loss=0.000214, val_loss=0.000239]
[Epoch 24/30]: 100%|██████████| 38000/38000 [06:08<00:00, 103.13it/s, loss=0.000206, val_loss=0.000208]
[Epoch 25/30]: 100%|██████████| 38000/38000 [06:15<00:00, 101.30it/s, loss=0.000202, val_loss=0.00021]
[Epoch 26/30]: 100%|██████████| 38000/38000 [06:10<00:00, 102.61it/s, loss=0.000194, val_loss=0.000199]
[Epoch 27/30]: 100%|██████████| 38000/38000 [06:05<00:00, 104.08it/s, loss=0.000192, val_loss=0.000218]
[Epoch 28/30]: 100%|██████████| 38000/38000 [06:14<00:00, 101.51it/s, loss=0.000188, val_loss=0.000238]
[Epoch 29/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.79it/s, loss=0.000181, val_loss=0.000182]
[Epoch 30/30]: 100%|██████████| 38000/38000 [06:09<00:00, 102.80it/s, loss=0.000176, val_loss=0.000192]
model exported to models/model_2020_04_26__162559.pth with loss 0.000182

Validation¶
[7]:
_ = net.eval()
Evaluate on the test dataset¶
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 21.38it/s]
Plot results on a sample¶
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)

Plot error distributions¶
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)

Plot mispredictions thresholds¶
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Benchmark ConvGru - 2020 April 14¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst.loss import OZELoss
from src.benchmark import BiGRU, ConvGru
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
N = 2 # Number of layers
dropout = 0.2 # Dropout rate
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load ConvGru benchmark network with Adam optimizer and OZE loss function
net = ConvGru(d_input, d_model, d_output, N, dropout=dropout, bidirectional=True).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.00635, val_loss=0.00301]
[Epoch 2/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.56it/s, loss=0.00241, val_loss=0.0019]
[Epoch 3/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.54it/s, loss=0.00177, val_loss=0.0015]
[Epoch 4/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.51it/s, loss=0.00147, val_loss=0.00152]
[Epoch 5/30]: 100%|██████████| 38000/38000 [07:39<00:00, 82.63it/s, loss=0.00126, val_loss=0.00126]
[Epoch 6/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.56it/s, loss=0.00111, val_loss=0.00103]
[Epoch 7/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.000981, val_loss=0.00103]
[Epoch 8/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.57it/s, loss=0.000876, val_loss=0.000755]
[Epoch 9/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.49it/s, loss=0.000778, val_loss=0.000698]
[Epoch 10/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.58it/s, loss=0.000688, val_loss=0.000631]
[Epoch 11/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.55it/s, loss=0.00062, val_loss=0.000549]
[Epoch 12/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.43it/s, loss=0.000561, val_loss=0.000497]
[Epoch 13/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.34it/s, loss=0.000514, val_loss=0.000461]
[Epoch 14/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.50it/s, loss=0.000478, val_loss=0.000513]
[Epoch 15/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.49it/s, loss=0.000447, val_loss=0.000399]
[Epoch 16/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.48it/s, loss=0.000424, val_loss=0.000407]
[Epoch 17/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.30it/s, loss=0.000401, val_loss=0.000382]
[Epoch 18/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.53it/s, loss=0.000381, val_loss=0.000346]
[Epoch 19/30]: 100%|██████████| 38000/38000 [07:41<00:00, 82.38it/s, loss=0.000365, val_loss=0.00035]
[Epoch 20/30]: 100%|██████████| 38000/38000 [07:40<00:00, 82.47it/s, loss=0.000351, val_loss=0.000329]
[Epoch 21/30]: 100%|██████████| 38000/38000 [06:04<00:00, 104.30it/s, loss=0.000335, val_loss=0.000313]
[Epoch 22/30]: 100%|██████████| 38000/38000 [03:08<00:00, 201.75it/s, loss=0.000323, val_loss=0.000329]
[Epoch 23/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.14it/s, loss=0.000313, val_loss=0.000291]
[Epoch 24/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.21it/s, loss=0.0003, val_loss=0.000302]
[Epoch 25/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.71it/s, loss=0.000294, val_loss=0.000298]
[Epoch 26/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.67it/s, loss=0.000284, val_loss=0.000279]
[Epoch 27/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.40it/s, loss=0.000276, val_loss=0.000265]
[Epoch 28/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.67it/s, loss=0.000272, val_loss=0.000265]
[Epoch 29/30]: 100%|██████████| 38000/38000 [03:07<00:00, 203.04it/s, loss=0.000265, val_loss=0.000248]
[Epoch 30/30]: 100%|██████████| 38000/38000 [03:07<00:00, 202.93it/s, loss=0.000258, val_loss=0.000281]
model exported to models/model_LSTM_2020_04_14__101819.pth with loss 0.000248

Validation¶
[7]:
_ = net.eval()
Evaluate on the test dataset¶
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:01<00:00, 82.73it/s]
Plot results on a sample¶
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)

Plot error distributions¶
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)

Plot mispredictions thresholds¶
[ ]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)
Benchmark BiGRU - 2020 April 01¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst.loss import OZELoss
from src.benchmark import BiGRU
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
N = 4 # Number of layers
dropout = 0.2 # Dropout rate
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load BiGRU benchmark network with Adam optimizer and OZE loss function
net = BiGRU(d_input, d_model, d_output, N, dropout=dropout, bidirectional=True).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.00789, val_loss=0.00377]
[Epoch 2/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.90it/s, loss=0.00307, val_loss=0.0025]
[Epoch 3/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.00227, val_loss=0.00198]
[Epoch 4/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.78it/s, loss=0.00183, val_loss=0.00167]
[Epoch 5/30]: 100%|██████████| 38000/38000 [17:03<00:00, 37.12it/s, loss=0.00152, val_loss=0.00132]
[Epoch 6/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.19it/s, loss=0.00126, val_loss=0.00114]
[Epoch 7/30]: 100%|██████████| 38000/38000 [17:06<00:00, 37.00it/s, loss=0.00108, val_loss=0.000976]
[Epoch 8/30]: 100%|██████████| 38000/38000 [17:18<00:00, 36.58it/s, loss=0.000932, val_loss=0.00087]
[Epoch 9/30]: 100%|██████████| 38000/38000 [17:16<00:00, 36.65it/s, loss=0.000825, val_loss=0.000795]
[Epoch 10/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.000739, val_loss=0.000694]
[Epoch 11/30]: 100%|██████████| 38000/38000 [17:12<00:00, 36.80it/s, loss=0.00067, val_loss=0.000609]
[Epoch 12/30]: 100%|██████████| 38000/38000 [17:24<00:00, 36.39it/s, loss=0.000616, val_loss=0.000569]
[Epoch 13/30]: 100%|██████████| 38000/38000 [17:16<00:00, 36.67it/s, loss=0.000572, val_loss=0.000543]
[Epoch 14/30]: 100%|██████████| 38000/38000 [17:10<00:00, 36.89it/s, loss=0.000534, val_loss=0.000515]
[Epoch 15/30]: 100%|██████████| 38000/38000 [17:12<00:00, 36.81it/s, loss=0.000503, val_loss=0.00049]
[Epoch 16/30]: 100%|██████████| 38000/38000 [17:15<00:00, 36.71it/s, loss=0.000474, val_loss=0.000442]
[Epoch 17/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.77it/s, loss=0.000451, val_loss=0.000419]
[Epoch 18/30]: 100%|██████████| 38000/38000 [17:06<00:00, 37.03it/s, loss=0.000428, val_loss=0.00041]
[Epoch 19/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.93it/s, loss=0.000408, val_loss=0.0004]
[Epoch 20/30]: 100%|██████████| 38000/38000 [17:09<00:00, 36.90it/s, loss=0.00039, val_loss=0.00042]
[Epoch 21/30]: 100%|██████████| 38000/38000 [17:08<00:00, 36.93it/s, loss=0.000375, val_loss=0.000351]
[Epoch 22/30]: 100%|██████████| 38000/38000 [17:10<00:00, 36.86it/s, loss=0.000361, val_loss=0.000343]
[Epoch 23/30]: 100%|██████████| 38000/38000 [17:15<00:00, 36.71it/s, loss=0.000348, val_loss=0.000341]
[Epoch 24/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.000337, val_loss=0.000338]
[Epoch 25/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.19it/s, loss=0.000329, val_loss=0.000318]
[Epoch 26/30]: 100%|██████████| 38000/38000 [17:11<00:00, 36.84it/s, loss=0.000317, val_loss=0.000333]
[Epoch 27/30]: 100%|██████████| 38000/38000 [17:13<00:00, 36.78it/s, loss=0.000308, val_loss=0.00029]
[Epoch 28/30]: 100%|██████████| 38000/38000 [17:01<00:00, 37.22it/s, loss=0.000303, val_loss=0.00028]
[Epoch 29/30]: 100%|██████████| 38000/38000 [17:07<00:00, 37.00it/s, loss=0.000291, val_loss=0.000292]
[Epoch 30/30]: 100%|██████████| 38000/38000 [17:14<00:00, 36.73it/s, loss=0.000283, val_loss=0.000268]
model exported to models/model_LSTM_2020_04_01__102333.pth with loss 0.000268

Validation¶
[7]:
_ = net.eval()
Evaluate on the test dataset¶
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 20.85it/s]
Plot results on a sample¶
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)

Plot error distributions¶
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)

Plot mispredictions thresholds¶
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Benchmark LSTM - 2020 March 31¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst.loss import OZELoss
from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
N = 4 # Number of layers
dropout = 0.2 # Dropout rate
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
[4]:
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load LSTM benchmark network with Adam optimizer and OZE loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.30it/s, loss=0.0153, val_loss=0.00872]
[Epoch 2/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.49it/s, loss=0.00701, val_loss=0.00584]
[Epoch 3/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00527, val_loss=0.00495]
[Epoch 4/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00461, val_loss=0.00438]
[Epoch 5/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00417, val_loss=0.00407]
[Epoch 6/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00387, val_loss=0.00379]
[Epoch 7/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00363, val_loss=0.00355]
[Epoch 8/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00343, val_loss=0.00344]
[Epoch 9/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.26it/s, loss=0.00326, val_loss=0.00322]
[Epoch 10/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00313, val_loss=0.00312]
[Epoch 11/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.47it/s, loss=0.00302, val_loss=0.00299]
[Epoch 12/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.31it/s, loss=0.00292, val_loss=0.00289]
[Epoch 13/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.41it/s, loss=0.00283, val_loss=0.00282]
[Epoch 14/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.52it/s, loss=0.00275, val_loss=0.00273]
[Epoch 15/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.27it/s, loss=0.00267, val_loss=0.00268]
[Epoch 16/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.42it/s, loss=0.00259, val_loss=0.00259]
[Epoch 17/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00252, val_loss=0.0025]
[Epoch 18/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.29it/s, loss=0.00245, val_loss=0.0025]
[Epoch 19/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.39it/s, loss=0.00239, val_loss=0.00239]
[Epoch 20/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.55it/s, loss=0.00233, val_loss=0.00232]
[Epoch 21/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00226, val_loss=0.00232]
[Epoch 22/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.46it/s, loss=0.00222, val_loss=0.00225]
[Epoch 23/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.51it/s, loss=0.00218, val_loss=0.00218]
[Epoch 24/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00215, val_loss=0.00216]
[Epoch 25/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.48it/s, loss=0.00213, val_loss=0.00212]
[Epoch 26/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.46it/s, loss=0.0021, val_loss=0.00212]
[Epoch 27/30]: 100%|██████████| 38000/38000 [09:33<00:00, 66.30it/s, loss=0.00207, val_loss=0.00209]
[Epoch 28/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.40it/s, loss=0.00205, val_loss=0.00208]
[Epoch 29/30]: 100%|██████████| 38000/38000 [09:31<00:00, 66.49it/s, loss=0.00203, val_loss=0.00206]
[Epoch 30/30]: 100%|██████████| 38000/38000 [09:32<00:00, 66.33it/s, loss=0.00201, val_loss=0.00201]
model exported to models/model_LSTM_2020_03_31__112637.pth with loss 0.002010

Validation¶
[7]:
_ = net.eval()
Evaluate on the test dataset¶
[8]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:03<00:00, 35.96it/s]
Plot results on a sample¶
[9]:
map_plot_function(ozeDataset, predictions, plot_visual_sample, dataset_indices=dataloader_test.dataset.indices)

Plot error distributions¶
[10]:
map_plot_function(ozeDataset, predictions, plot_error_distribution, dataset_indices=dataloader_test.dataset.indices, time_limit=24)

Plot mispredictions thresholds¶
[11]:
map_plot_function(ozeDataset, predictions, plot_errors_threshold, plot_kwargs={'error_band': 0.1}, dataset_indices=dataloader_test.dataset.indices)

Classic - 2020 March 12¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPT_v7.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cpu
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 1000, 1000))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [26:53<00:00, 23.55it/s, loss=0.00554, val_loss=0.0033]
[Epoch 2/30]: 100%|██████████| 38000/38000 [26:33<00:00, 23.85it/s, loss=0.00247, val_loss=0.00185]
[Epoch 3/30]: 100%|██████████| 38000/38000 [26:59<00:00, 23.46it/s, loss=0.00169, val_loss=0.00148]
[Epoch 4/30]: 100%|██████████| 38000/38000 [26:54<00:00, 23.54it/s, loss=0.00129, val_loss=0.00117]
[Epoch 5/30]: 100%|██████████| 38000/38000 [26:57<00:00, 23.49it/s, loss=0.00108, val_loss=0.001]
[Epoch 6/30]: 100%|██████████| 38000/38000 [26:59<00:00, 23.47it/s, loss=0.000946, val_loss=0.000952]
[Epoch 7/30]: 100%|██████████| 38000/38000 [26:57<00:00, 23.49it/s, loss=0.000834, val_loss=0.000791]
[Epoch 8/30]: 100%|██████████| 38000/38000 [26:49<00:00, 23.61it/s, loss=0.000753, val_loss=0.000714]
[Epoch 9/30]: 100%|██████████| 38000/38000 [27:00<00:00, 23.45it/s, loss=0.000683, val_loss=0.00065]
[Epoch 10/30]: 100%|██████████| 38000/38000 [26:54<00:00, 23.54it/s, loss=0.000637, val_loss=0.000634]
[Epoch 11/30]: 100%|██████████| 38000/38000 [26:58<00:00, 23.48it/s, loss=0.000591, val_loss=0.000569]
[Epoch 12/30]: 100%|██████████| 38000/38000 [27:00<00:00, 23.45it/s, loss=0.000549, val_loss=0.000596]
[Epoch 13/30]: 100%|██████████| 38000/38000 [27:09<00:00, 23.32it/s, loss=0.000524, val_loss=0.000506]
[Epoch 14/30]: 100%|██████████| 38000/38000 [26:53<00:00, 23.55it/s, loss=0.000496, val_loss=0.00048]
[Epoch 15/30]: 100%|██████████| 38000/38000 [27:06<00:00, 23.37it/s, loss=0.00047, val_loss=0.000466]
[Epoch 16/30]: 100%|██████████| 38000/38000 [27:09<00:00, 23.32it/s, loss=0.000448, val_loss=0.000412]
[Epoch 17/30]: 100%|██████████| 38000/38000 [27:13<00:00, 23.26it/s, loss=0.000436, val_loss=0.000442]
[Epoch 18/30]: 100%|██████████| 38000/38000 [27:04<00:00, 23.40it/s, loss=0.000412, val_loss=0.000424]
[Epoch 19/30]: 100%|██████████| 38000/38000 [27:10<00:00, 23.31it/s, loss=0.000397, val_loss=0.000468]
[Epoch 20/30]: 100%|██████████| 38000/38000 [27:15<00:00, 23.24it/s, loss=0.000381, val_loss=0.000396]
[Epoch 21/30]: 100%|██████████| 38000/38000 [27:16<00:00, 23.22it/s, loss=0.000372, val_loss=0.000375]
[Epoch 22/30]: 100%|██████████| 38000/38000 [27:16<00:00, 23.23it/s, loss=0.000361, val_loss=0.000355]
[Epoch 23/30]: 100%|██████████| 38000/38000 [27:08<00:00, 23.34it/s, loss=0.000346, val_loss=0.000331]
[Epoch 24/30]: 100%|██████████| 38000/38000 [27:12<00:00, 23.27it/s, loss=0.000334, val_loss=0.000352]
[Epoch 25/30]: 100%|██████████| 38000/38000 [27:14<00:00, 23.24it/s, loss=0.000324, val_loss=0.000401]
[Epoch 26/30]: 100%|██████████| 38000/38000 [27:18<00:00, 23.19it/s, loss=0.000324, val_loss=0.000319]
[Epoch 27/30]: 100%|██████████| 38000/38000 [27:19<00:00, 23.18it/s, loss=0.000305, val_loss=0.000319]
[Epoch 28/30]: 100%|██████████| 38000/38000 [27:12<00:00, 23.28it/s, loss=0.000303, val_loss=0.000318]
[Epoch 29/30]: 100%|██████████| 38000/38000 [27:19<00:00, 23.18it/s, loss=0.000295, val_loss=0.000297]
[Epoch 30/30]: 100%|██████████| 38000/38000 [27:15<00:00, 23.23it/s, loss=0.000287, val_loss=0.000286]
model exported to models/model_2020_03_10__231146.pth with loss 0.000286

Validation¶
[6]:
_ = net.eval()
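Since the training loop only saves the weights achieving the best validation loss, the exported checkpoint can be reloaded before evaluation. A minimal sketch, assuming model_save_path still points to the file written during training:
# Reload the best checkpoint saved during training (optional step)
net.load_state_dict(torch.load(model_save_path, map_location=device))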
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:17<00:00, 7.00it/s]
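From the predictions array, a single aggregate error can also be computed before breaking the error down per output variable. A minimal sketch, reusing the ground-truth tensor accessed the same way as in the next cell (values are still in normalized units at this point):
# Aggregate mean absolute error over the whole test set (illustrative)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
mae = np.abs(y_true_full - predictions).mean()
print(f"Test MAE (normalized units): {mae:.6f}")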
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
if label.startswith('Q_'):
# Convert kJ/h to kW
y_true /= 3600
y_pred /= 3600
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')

Benchmark - 2020 March 05¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst.loss import OZELoss
from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[9]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 3e-5
EPOCHS = 30
# Model parameters
d_model = 128 # Latent dim
N = 8*2 # Number of layers
dropout = 0.2 # Dropout rate
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))
[4]:
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[10]:
# Load LSTM benchmark with Adam optimizer and OZE loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[11]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.94it/s, loss=0.0318, val_loss=0.0238]
[Epoch 2/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.30it/s, loss=0.0234, val_loss=0.0234]
[Epoch 3/30]: 100%|██████████| 38000/38000 [07:01<00:00, 90.12it/s, loss=0.0189, val_loss=0.0142]
[Epoch 4/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.69it/s, loss=0.0128, val_loss=0.0122]
[Epoch 5/30]: 100%|██████████| 38000/38000 [06:59<00:00, 90.58it/s, loss=0.012, val_loss=0.0119]
[Epoch 6/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.38it/s, loss=0.0118, val_loss=0.0117]
[Epoch 7/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.27it/s, loss=0.0116, val_loss=0.0117]
[Epoch 8/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.89it/s, loss=0.0115, val_loss=0.0115]
[Epoch 9/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.74it/s, loss=0.0114, val_loss=0.0115]
[Epoch 10/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.32it/s, loss=0.0114, val_loss=0.0114]
[Epoch 11/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.45it/s, loss=0.0112, val_loss=0.0112]
[Epoch 12/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.93it/s, loss=0.0111, val_loss=0.011]
[Epoch 13/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.79it/s, loss=0.0109, val_loss=0.0109]
[Epoch 14/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.44it/s, loss=0.0108, val_loss=0.0108]
[Epoch 15/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.44it/s, loss=0.0107, val_loss=0.0107]
[Epoch 16/30]: 100%|██████████| 38000/38000 [06:57<00:00, 90.93it/s, loss=0.0107, val_loss=0.0107]
[Epoch 17/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.80it/s, loss=0.0106, val_loss=0.0106]
[Epoch 18/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.40it/s, loss=0.0106, val_loss=0.0107]
[Epoch 19/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.29it/s, loss=0.0105, val_loss=0.0105]
[Epoch 20/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0104, val_loss=0.0105]
[Epoch 21/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0104, val_loss=0.0105]
[Epoch 22/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.45it/s, loss=0.0103, val_loss=0.0104]
[Epoch 23/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.32it/s, loss=0.0103, val_loss=0.0103]
[Epoch 24/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.86it/s, loss=0.0103, val_loss=0.0104]
[Epoch 25/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.72it/s, loss=0.0102, val_loss=0.0103]
[Epoch 26/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.46it/s, loss=0.0102, val_loss=0.0103]
[Epoch 27/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.41it/s, loss=0.0101, val_loss=0.0103]
[Epoch 28/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.82it/s, loss=0.0101, val_loss=0.0102]
[Epoch 29/30]: 100%|██████████| 38000/38000 [06:58<00:00, 90.90it/s, loss=0.0101, val_loss=0.0101]
[Epoch 30/30]: 100%|██████████| 38000/38000 [07:00<00:00, 90.42it/s, loss=0.01, val_loss=0.0101]
model exported to models/model_LSTM_2020_03_04__211137.pth with loss 0.010125

Validation¶
[ ]:
_ = net.eval()
Plot results on a sample¶
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Evaluate on the test dataset¶
[ ]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
[ ]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
if label.startswith('Q_'):
# Convert kJ/h to kW
y_true /= 3600
y_pred /= 3600
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')
Benchmark - 2020 March 04¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst.loss import OZELoss
from src.benchmark import LSTM
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 64 # Latent dim
N = 4*2 # Number of layers
dropout = 0.2 # Dropout rate
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))
[4]:
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[5]:
# Load LSTM benchmark with Adam optimizer and OZE loss function
net = LSTM(d_input, d_model, d_output, N, dropout=dropout).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[6]:
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.83it/s, loss=0.0172, val_loss=0.0117]
[Epoch 2/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.56it/s, loss=0.0113, val_loss=0.0109]
[Epoch 3/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.82it/s, loss=0.0107, val_loss=0.0104]
[Epoch 4/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.88it/s, loss=0.00956, val_loss=0.00917]
[Epoch 5/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.62it/s, loss=0.00899, val_loss=0.0089]
[Epoch 6/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.51it/s, loss=0.00865, val_loss=0.00852]
[Epoch 7/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.10it/s, loss=0.00832, val_loss=0.00827]
[Epoch 8/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.42it/s, loss=0.00814, val_loss=0.00813]
[Epoch 9/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.85it/s, loss=0.00803, val_loss=0.00802]
[Epoch 10/30]: 100%|██████████| 38000/38000 [02:39<00:00, 238.05it/s, loss=0.00794, val_loss=0.00799]
[Epoch 11/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.58it/s, loss=0.00786, val_loss=0.00788]
[Epoch 12/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.32it/s, loss=0.00777, val_loss=0.00773]
[Epoch 13/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.70it/s, loss=0.00767, val_loss=0.00756]
[Epoch 14/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.53it/s, loss=0.00725, val_loss=0.00716]
[Epoch 15/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.63it/s, loss=0.00702, val_loss=0.00692]
[Epoch 16/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.44it/s, loss=0.00691, val_loss=0.00685]
[Epoch 17/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.61it/s, loss=0.00683, val_loss=0.00676]
[Epoch 18/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.63it/s, loss=0.00676, val_loss=0.00676]
[Epoch 19/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.59it/s, loss=0.00667, val_loss=0.0066]
[Epoch 20/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.19it/s, loss=0.00648, val_loss=0.00626]
[Epoch 21/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.11it/s, loss=0.00622, val_loss=0.00612]
[Epoch 22/30]: 100%|██████████| 38000/38000 [02:39<00:00, 237.66it/s, loss=0.00611, val_loss=0.00605]
[Epoch 23/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.66it/s, loss=0.00604, val_loss=0.00596]
[Epoch 24/30]: 100%|██████████| 38000/38000 [02:41<00:00, 235.89it/s, loss=0.00598, val_loss=0.00597]
[Epoch 25/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.88it/s, loss=0.00593, val_loss=0.00589]
[Epoch 26/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.91it/s, loss=0.0059, val_loss=0.00578]
[Epoch 27/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.10it/s, loss=0.00586, val_loss=0.00576]
[Epoch 28/30]: 100%|██████████| 38000/38000 [02:41<00:00, 235.81it/s, loss=0.00582, val_loss=0.00574]
[Epoch 29/30]: 100%|██████████| 38000/38000 [02:40<00:00, 237.23it/s, loss=0.00579, val_loss=0.0058]
[Epoch 30/30]: 100%|██████████| 38000/38000 [02:40<00:00, 236.95it/s, loss=0.00576, val_loss=0.00573]
model exported to models/model_LSTM_2020_03_04__190333.pth with loss 0.005726

Validation¶
[7]:
_ = net.eval()
Plot results on a sample¶
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 63/63 [00:00<00:00, 127.94it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
if label.startswith('Q_'):
# Convert kJ/h to kW
y_true /= 3600
y_pred /= 3600
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')

Classic - 2020 February 25¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_v6_full.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 50
# Model parameters
d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 38 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (38000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.49it/s, loss=0.00563, val_loss=0.00277]
[Epoch 2/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.40it/s, loss=0.00223, val_loss=0.00155]
[Epoch 3/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.50it/s, loss=0.00149, val_loss=0.00123]
[Epoch 4/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.00113, val_loss=0.000995]
[Epoch 5/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.50it/s, loss=0.000901, val_loss=0.00084]
[Epoch 6/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000759, val_loss=0.000615]
[Epoch 7/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.00065, val_loss=0.000555]
[Epoch 8/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000573, val_loss=0.000527]
[Epoch 9/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.53it/s, loss=0.000514, val_loss=0.000619]
[Epoch 10/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000473, val_loss=0.000503]
[Epoch 11/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.53it/s, loss=0.000445, val_loss=0.000407]
[Epoch 12/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000402, val_loss=0.000384]
[Epoch 13/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000388, val_loss=0.000408]
[Epoch 14/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000371, val_loss=0.000333]
[Epoch 15/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000344, val_loss=0.000333]
[Epoch 16/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000331, val_loss=0.000407]
[Epoch 17/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000309, val_loss=0.000326]
[Epoch 18/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000304, val_loss=0.000302]
[Epoch 19/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.00029, val_loss=0.000312]
[Epoch 20/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000287, val_loss=0.000266]
[Epoch 21/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000269, val_loss=0.00029]
[Epoch 22/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000265, val_loss=0.000237]
[Epoch 23/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000255, val_loss=0.000237]
[Epoch 24/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000255, val_loss=0.00024]
[Epoch 25/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000244, val_loss=0.000225]
[Epoch 26/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000239, val_loss=0.000231]
[Epoch 27/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000229, val_loss=0.000241]
[Epoch 28/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000226, val_loss=0.000245]
[Epoch 29/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.52it/s, loss=0.000221, val_loss=0.000221]
[Epoch 30/50]: 100%|██████████| 38000/38000 [14:34<00:00, 43.43it/s, loss=0.000226, val_loss=0.000208]
[Epoch 31/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000209, val_loss=0.000219]
[Epoch 32/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000223, val_loss=0.000222]
[Epoch 33/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000217, val_loss=0.000224]
[Epoch 34/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000202, val_loss=0.000199]
[Epoch 35/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000194, val_loss=0.000191]
[Epoch 36/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000198, val_loss=0.000185]
[Epoch 37/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.51it/s, loss=0.000189, val_loss=0.000211]
[Epoch 38/50]: 100%|██████████| 38000/38000 [14:36<00:00, 43.35it/s, loss=0.000195, val_loss=0.00018]
[Epoch 39/50]: 100%|██████████| 38000/38000 [14:33<00:00, 43.51it/s, loss=0.000183, val_loss=0.00029]
[Epoch 40/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000183, val_loss=0.000161]
[Epoch 41/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.55it/s, loss=0.000181, val_loss=0.000168]
[Epoch 42/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000178, val_loss=0.000179]
[Epoch 43/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000174, val_loss=0.000174]
[Epoch 44/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.000181, val_loss=0.000155]
[Epoch 45/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.000168, val_loss=0.000191]
[Epoch 46/50]: 100%|██████████| 38000/38000 [14:34<00:00, 43.43it/s, loss=0.000165, val_loss=0.000185]
[Epoch 47/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.53it/s, loss=0.00017, val_loss=0.000159]
[Epoch 48/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.42it/s, loss=0.00017, val_loss=0.000159]
[Epoch 49/50]: 100%|██████████| 38000/38000 [14:32<00:00, 43.54it/s, loss=0.000165, val_loss=0.000173]
[Epoch 50/50]: 100%|██████████| 38000/38000 [14:35<00:00, 43.41it/s, loss=0.000161, val_loss=0.000166]
model exported to models/model_2020_02_25__102558.pth with loss 0.000155

Validation¶
[6]:
_ = net.eval()
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 168, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 63/63 [00:05<00:00, 12.26it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
if label.startswith('Q_'):
# Convert kJ/h to kW
y_true /= 3600
y_pred /= 3600
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')

Window - 2020 January 31¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 50
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/50]: 100%|██████████| 12000/12000 [07:40<00:00, 26.04it/s, loss=0.00906, val_loss=0.00509]
[Epoch 2/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.67it/s, loss=0.00405, val_loss=0.00363]
[Epoch 3/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.63it/s, loss=0.00286, val_loss=0.00255]
[Epoch 4/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.63it/s, loss=0.00224, val_loss=0.00206]
[Epoch 5/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.67it/s, loss=0.00182, val_loss=0.00161]
[Epoch 6/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.00157, val_loss=0.00143]
[Epoch 7/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.00138, val_loss=0.00129]
[Epoch 8/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.00122, val_loss=0.00114]
[Epoch 9/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.00108, val_loss=0.00108]
[Epoch 10/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000974, val_loss=0.000869]
[Epoch 11/50]: 100%|██████████| 12000/12000 [07:31<00:00, 26.56it/s, loss=0.000885, val_loss=0.00078]
[Epoch 12/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000818, val_loss=0.000762]
[Epoch 13/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000743, val_loss=0.000992]
[Epoch 14/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000692, val_loss=0.000598]
[Epoch 15/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000645, val_loss=0.000682]
[Epoch 16/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000611, val_loss=0.000609]
[Epoch 17/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.00057, val_loss=0.0005]
[Epoch 18/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000542, val_loss=0.000509]
[Epoch 19/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000501, val_loss=0.000477]
[Epoch 20/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000507, val_loss=0.000479]
[Epoch 21/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.64it/s, loss=0.000465, val_loss=0.000489]
[Epoch 22/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.65it/s, loss=0.000449, val_loss=0.000459]
[Epoch 23/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000427, val_loss=0.00046]
[Epoch 24/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000417, val_loss=0.000403]
[Epoch 25/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000402, val_loss=0.000474]
[Epoch 26/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000387, val_loss=0.00034]
[Epoch 27/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000385, val_loss=0.00041]
[Epoch 28/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000374, val_loss=0.000387]
[Epoch 29/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000351, val_loss=0.000342]
[Epoch 30/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.69it/s, loss=0.000352, val_loss=0.000397]
[Epoch 31/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000337, val_loss=0.000324]
[Epoch 32/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000337, val_loss=0.00031]
[Epoch 33/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.73it/s, loss=0.000328, val_loss=0.000298]
[Epoch 34/50]: 100%|██████████| 12000/12000 [07:30<00:00, 26.66it/s, loss=0.000315, val_loss=0.000318]
[Epoch 35/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000307, val_loss=0.000306]
[Epoch 36/50]: 100%|██████████| 12000/12000 [07:31<00:00, 26.56it/s, loss=0.000307, val_loss=0.0003]
[Epoch 37/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000294, val_loss=0.00032]
[Epoch 38/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.68it/s, loss=0.000295, val_loss=0.000368]
[Epoch 39/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000282, val_loss=0.000274]
[Epoch 40/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.00028, val_loss=0.000255]
[Epoch 41/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000275, val_loss=0.000262]
[Epoch 42/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000264, val_loss=0.000247]
[Epoch 43/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.67it/s, loss=0.00027, val_loss=0.000292]
[Epoch 44/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.71it/s, loss=0.000261, val_loss=0.00025]
[Epoch 45/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.000253, val_loss=0.000283]
[Epoch 46/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.72it/s, loss=0.000259, val_loss=0.000245]
[Epoch 47/50]: 100%|██████████| 12000/12000 [07:29<00:00, 26.70it/s, loss=0.00025, val_loss=0.000245]
[Epoch 48/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000248, val_loss=0.00025]
[Epoch 49/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.73it/s, loss=0.000243, val_loss=0.000258]
[Epoch 50/50]: 100%|██████████| 12000/12000 [07:28<00:00, 26.74it/s, loss=0.000238, val_loss=0.000219]
model exported to models/model_2020_01_31__082906.pth with loss 0.000219

Validation¶
[6]:
_ = net.eval()
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.93it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')

Window - 2020 January 10¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.48it/s, loss=0.00826, val_loss=0.00478]
[Epoch 2/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00403, val_loss=0.0032]
[Epoch 3/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.36it/s, loss=0.00273, val_loss=0.00225]
[Epoch 4/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00217, val_loss=0.00182]
[Epoch 5/30]: 100%|██████████| 12000/12000 [06:49<00:00, 29.30it/s, loss=0.0018, val_loss=0.00155]
[Epoch 6/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.44it/s, loss=0.00152, val_loss=0.00134]
[Epoch 7/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00132, val_loss=0.00114]
[Epoch 8/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.00118, val_loss=0.00106]
[Epoch 9/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.40it/s, loss=0.00103, val_loss=0.000951]
[Epoch 10/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.57it/s, loss=0.000919, val_loss=0.00132]
[Epoch 11/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.000829, val_loss=0.000809]
[Epoch 12/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.50it/s, loss=0.000756, val_loss=0.000734]
[Epoch 13/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.57it/s, loss=0.000701, val_loss=0.000649]
[Epoch 14/30]: 100%|██████████| 12000/12000 [06:48<00:00, 29.40it/s, loss=0.000651, val_loss=0.000719]
[Epoch 15/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.52it/s, loss=0.000608, val_loss=0.000567]
[Epoch 16/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.53it/s, loss=0.000569, val_loss=0.000607]
[Epoch 17/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.48it/s, loss=0.000538, val_loss=0.000533]
[Epoch 18/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000519, val_loss=0.000519]
[Epoch 19/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.59it/s, loss=0.000497, val_loss=0.000472]
[Epoch 20/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000468, val_loss=0.000667]
[Epoch 21/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.56it/s, loss=0.000458, val_loss=0.000544]
[Epoch 22/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.58it/s, loss=0.000427, val_loss=0.00039]
[Epoch 23/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.56it/s, loss=0.00042, val_loss=0.000406]
[Epoch 24/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.48it/s, loss=0.000401, val_loss=0.000395]
[Epoch 25/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.54it/s, loss=0.000392, val_loss=0.000384]
[Epoch 26/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.61it/s, loss=0.000377, val_loss=0.000438]
[Epoch 27/30]: 100%|██████████| 12000/12000 [06:44<00:00, 29.64it/s, loss=0.00036, val_loss=0.000381]
[Epoch 28/30]: 100%|██████████| 12000/12000 [06:45<00:00, 29.62it/s, loss=0.000358, val_loss=0.000331]
[Epoch 29/30]: 100%|██████████| 12000/12000 [06:46<00:00, 29.55it/s, loss=0.000352, val_loss=0.000318]
[Epoch 30/30]: 100%|██████████| 12000/12000 [06:47<00:00, 29.45it/s, loss=0.000335, val_loss=0.000324]
model exported to models/model_2020_01_10__082029.pth with loss 0.000318

Validation¶
[6]:
_ = net.eval()
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.13it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')

Classic - 2020 January 07¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.loss import OZELoss
from src.dataset import OzeDataset
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 30
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.00923, val_loss=0.00494]
[Epoch 2/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.00479, val_loss=0.00407]
[Epoch 3/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.97it/s, loss=0.00405, val_loss=0.00366]
[Epoch 4/30]: 100%|██████████| 12000/12000 [10:30<00:00, 19.04it/s, loss=0.00344, val_loss=0.00312]
[Epoch 5/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.20it/s, loss=0.003, val_loss=0.00267]
[Epoch 6/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.00259, val_loss=0.00262]
[Epoch 7/30]: 100%|██████████| 12000/12000 [10:24<00:00, 19.21it/s, loss=0.00198, val_loss=0.00168]
[Epoch 8/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.13it/s, loss=0.00156, val_loss=0.00149]
[Epoch 9/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.14it/s, loss=0.00136, val_loss=0.00124]
[Epoch 10/30]: 100%|██████████| 12000/12000 [10:29<00:00, 19.08it/s, loss=0.00123, val_loss=0.00117]
[Epoch 11/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.00115, val_loss=0.00104]
[Epoch 12/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.00109, val_loss=0.000955]
[Epoch 13/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.19it/s, loss=0.00105, val_loss=0.000998]
[Epoch 14/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.16it/s, loss=0.001, val_loss=0.001]
[Epoch 15/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.11it/s, loss=0.000965, val_loss=0.000884]
[Epoch 16/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.13it/s, loss=0.000926, val_loss=0.000893]
[Epoch 17/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.09it/s, loss=0.000904, val_loss=0.000981]
[Epoch 18/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.18it/s, loss=0.000878, val_loss=0.00088]
[Epoch 19/30]: 100%|██████████| 12000/12000 [10:25<00:00, 19.17it/s, loss=0.000858, val_loss=0.000779]
[Epoch 20/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.17it/s, loss=0.000817, val_loss=0.000809]
[Epoch 21/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.000811, val_loss=0.000783]
[Epoch 22/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.14it/s, loss=0.00077, val_loss=0.000741]
[Epoch 23/30]: 100%|██████████| 12000/12000 [10:28<00:00, 19.10it/s, loss=0.000747, val_loss=0.000793]
[Epoch 24/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.12it/s, loss=0.000727, val_loss=0.000682]
[Epoch 25/30]: 100%|██████████| 12000/12000 [10:26<00:00, 19.15it/s, loss=0.000715, val_loss=0.000697]
[Epoch 26/30]: 100%|██████████| 12000/12000 [10:27<00:00, 19.12it/s, loss=0.00069, val_loss=0.000666]
[Epoch 27/30]: 100%|██████████| 12000/12000 [10:30<00:00, 19.02it/s, loss=0.000675, val_loss=0.000619]
[Epoch 28/30]: 100%|██████████| 12000/12000 [10:31<00:00, 19.01it/s, loss=0.000651, val_loss=0.000621]
[Epoch 29/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.96it/s, loss=0.00064, val_loss=0.000623]
[Epoch 30/30]: 100%|██████████| 12000/12000 [10:32<00:00, 18.98it/s, loss=0.000631, val_loss=0.000597]
model exported to models/model_2020_01_07__115048.pth with loss 0.000597

Validation¶
[6]:
_ = net.eval()
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:14<00:00, 8.47it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std')
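The figure above summarises the error per timestep; the same arrays also give a compact per-variable summary, for instance a mean absolute error in rescaled units (a small sketch reusing predictions, y_true_full and the dataset's rescale method from the cell above):
# Sketch: mean absolute error per output variable, in rescaled units
for idx_label, label in enumerate(dataloader_test.dataset.dataset.labels['X']):
    y_true = dataloader_test.dataset.dataset.rescale(y_true_full[..., idx_label], idx_label)
    y_pred = dataloader_test.dataset.dataset.rescale(predictions[..., idx_label], idx_label)
    print(f"{label}: MAE = {np.abs(y_true - y_pred).mean():.3f}")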

Window - 2020 January 06¶
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from tst import Transformer
from tst.dataset import OzeDataset
from tst.loss import OZELoss
from tst.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v5.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 2e-4
EPOCHS = 40
# Model parameters
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
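Note that random_split draws a different partition at every run; if a reproducible split is needed, the PyTorch seed can be fixed before splitting (a sketch, the seed value being arbitrary):
# Sketch: fix the seed before splitting to get a reproducible partition (42 is arbitrary)
torch.manual_seed(42)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))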
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, dropout=dropout, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net.state_dict(), model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")
[Epoch 1/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.0101, val_loss=0.00518]
[Epoch 2/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.00421, val_loss=0.00311]
[Epoch 3/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.00274, val_loss=0.00212]
[Epoch 4/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.00207, val_loss=0.00181]
[Epoch 5/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.58it/s, loss=0.0017, val_loss=0.00147]
[Epoch 6/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.00144, val_loss=0.00128]
[Epoch 7/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.37it/s, loss=0.0013, val_loss=0.00128]
[Epoch 8/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.31it/s, loss=0.00118, val_loss=0.00113]
[Epoch 9/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.00106, val_loss=0.00106]
[Epoch 10/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000982, val_loss=0.000899]
[Epoch 11/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.000911, val_loss=0.00081]
[Epoch 12/40]: 100%|██████████| 12000/12000 [06:35<00:00, 30.31it/s, loss=0.000847, val_loss=0.000739]
[Epoch 13/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000778, val_loss=0.000816]
[Epoch 14/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.43it/s, loss=0.000739, val_loss=0.000652]
[Epoch 15/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.53it/s, loss=0.00069, val_loss=0.000621]
[Epoch 16/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000649, val_loss=0.000565]
[Epoch 17/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000614, val_loss=0.000607]
[Epoch 18/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000575, val_loss=0.000584]
[Epoch 19/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000549, val_loss=0.000569]
[Epoch 20/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.38it/s, loss=0.000524, val_loss=0.000572]
[Epoch 21/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000492, val_loss=0.000458]
[Epoch 22/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.50it/s, loss=0.000485, val_loss=0.000549]
[Epoch 23/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000455, val_loss=0.000647]
[Epoch 24/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.000441, val_loss=0.000572]
[Epoch 25/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.000422, val_loss=0.000376]
[Epoch 26/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000408, val_loss=0.000416]
[Epoch 27/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.53it/s, loss=0.000396, val_loss=0.000454]
[Epoch 28/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.00038, val_loss=0.000424]
[Epoch 29/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000371, val_loss=0.000427]
[Epoch 30/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000369, val_loss=0.000352]
[Epoch 31/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.46it/s, loss=0.000349, val_loss=0.00034]
[Epoch 32/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.49it/s, loss=0.000344, val_loss=0.000322]
[Epoch 33/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.000342, val_loss=0.000327]
[Epoch 34/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000326, val_loss=0.00031]
[Epoch 35/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000326, val_loss=0.000317]
[Epoch 36/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000319, val_loss=0.000317]
[Epoch 37/40]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000303, val_loss=0.000341]
[Epoch 38/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.52it/s, loss=0.0003, val_loss=0.000297]
[Epoch 39/40]: 100%|██████████| 12000/12000 [06:34<00:00, 30.45it/s, loss=0.000292, val_loss=0.000265]
[Epoch 40/40]: 100%|██████████| 12000/12000 [06:33<00:00, 30.47it/s, loss=0.000288, val_loss=0.000264]
model exported to models/model_2020_01_06__144203.pth with loss 0.000264

Validation¶
[6]:
_ = net.eval()
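The weights held in memory at this point are those of the last epoch, which is not necessarily the epoch with the best validation loss; model_save_path always holds the best checkpoint, which can be reloaded before evaluating (a sketch):
# Sketch: reload the best checkpoint saved during training before evaluating
net.load_state_dict(torch.load(model_save_path, map_location=device))
_ = net.eval()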
Plot results on a sample¶
[7]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[9]:
predictions = np.empty(shape=(len(dataloader_test.dataset), 672, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.63it/s]
[10]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')
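In these plots, the green bands mark the periods when the building is occupied: crossings of the 0.5 occupancy threshold are located with np.diff, and consecutive crossing indices are paired as the start and end of each occupied span. A self-contained toy example of that pairing logic:
# Toy example of the occupancy-span pairing used for the green bands above
toy_occupancy = np.array([0., 0., 1., 1., 1., 0., 0., 1., 1., 0.])
crossings = np.where(np.diff(toy_occupancy) != 0)[0]  # indices where the signal switches
spans = [(int(start), int(end)) for start, end in zip(crossings[0::2], crossings[1::2])]
print(spans)  # [(1, 4), (6, 8)]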

Window - 2020 January 03¶
This model was trained for 20 epochs, then for 20 additional epochs, hence the lower final loss. A slight overfit can be observed, though.
[1]:
import datetime
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_v4.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (12000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[11]:
model_save_path = f"models/{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth"
val_loss_best = np.inf
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
if val_loss < val_loss_best:
val_loss_best = val_loss
torch.save(net, model_save_path)
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss_val[-1]):5f}")
print(f"model exported to {model_save_path}")
[Epoch 1/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.96it/s, loss=0.00117, val_loss=0.0011]
[Epoch 2/20]: 100%|██████████| 12000/12000 [06:28<00:00, 30.92it/s, loss=0.000983, val_loss=0.000908]
[Epoch 3/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.99it/s, loss=0.000867, val_loss=0.000994]
[Epoch 4/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.96it/s, loss=0.000787, val_loss=0.000831]
[Epoch 5/20]: 100%|██████████| 12000/12000 [06:28<00:00, 30.90it/s, loss=0.000741, val_loss=0.00073]
[Epoch 6/20]: 100%|██████████| 12000/12000 [06:27<00:00, 30.94it/s, loss=0.000683, val_loss=0.000808]
[Epoch 7/20]: 100%|██████████| 12000/12000 [06:29<00:00, 30.79it/s, loss=0.000654, val_loss=0.000655]
[Epoch 8/20]: 100%|██████████| 12000/12000 [06:30<00:00, 30.77it/s, loss=0.00061, val_loss=0.000624]
[Epoch 9/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.57it/s, loss=0.000576, val_loss=0.000683]
[Epoch 10/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.54it/s, loss=0.000551, val_loss=0.000581]
[Epoch 11/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000524, val_loss=0.000569]
[Epoch 12/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000494, val_loss=0.000539]
[Epoch 13/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000474, val_loss=0.000568]
[Epoch 14/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000453, val_loss=0.000567]
[Epoch 15/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.47it/s, loss=0.000439, val_loss=0.000502]
[Epoch 16/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.50it/s, loss=0.00042, val_loss=0.00046]
[Epoch 17/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.56it/s, loss=0.000404, val_loss=0.00054]
[Epoch 18/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.55it/s, loss=0.000393, val_loss=0.000458]
[Epoch 19/20]: 100%|██████████| 12000/12000 [06:33<00:00, 30.51it/s, loss=0.00038, val_loss=0.000438]
[Epoch 20/20]: 100%|██████████| 12000/12000 [06:32<00:00, 30.58it/s, loss=0.000372, val_loss=0.000487]
Loss: 0.000487
model exported to models/model_00048.pth
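The val_loss values reported above come from the compute_loss helper imported from src.utils; a minimal equivalent, assuming it simply averages the loss function over the validation loader under torch.no_grad(), could look like:
# Sketch of a minimal compute_loss equivalent (assumption about the actual src.utils helper)
def compute_loss_sketch(net, dataloader, loss_function, device):
    running_loss = 0
    with torch.no_grad():
        for x, y in dataloader:
            netout = net(x.to(device))
            running_loss += loss_function(y.to(device), netout)
    return running_loss / len(dataloader)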

Validation¶
[12]:
_ = net.eval()
Plot results on a sample¶
[13]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[14]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[15]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.55it/s]
[16]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Window - 2019 December 29¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1.5e-4
EPOCHS = 20
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (9000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS,
pin_memory=False
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss[-1]):5f}")
model_path = f"models/model_{str(hist_loss[-1]).split('.')[-1][:5]}.pth"
torch.save(net, model_path)
print(f"model exported to {model_path}")
[Epoch 1/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.0139, val_loss=0.00843]
[Epoch 2/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00662, val_loss=0.00666]
[Epoch 3/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00546, val_loss=0.00491]
[Epoch 4/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00466, val_loss=0.00417]
[Epoch 5/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.004, val_loss=0.00384]
[Epoch 6/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00327, val_loss=0.00319]
[Epoch 7/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.78it/s, loss=0.00279, val_loss=0.00291]
[Epoch 8/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00241, val_loss=0.00226]
[Epoch 9/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00217, val_loss=0.00201]
[Epoch 10/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00201, val_loss=0.00207]
[Epoch 11/20]: 100%|██████████| 9000/9000 [04:53<00:00, 30.62it/s, loss=0.00185, val_loss=0.0021]
[Epoch 12/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.00176, val_loss=0.00162]
[Epoch 13/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.79it/s, loss=0.00166, val_loss=0.00161]
[Epoch 14/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00157, val_loss=0.00163]
[Epoch 15/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.82it/s, loss=0.0015, val_loss=0.00149]
[Epoch 16/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00145, val_loss=0.00139]
[Epoch 17/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.80it/s, loss=0.00139, val_loss=0.00135]
[Epoch 18/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00133, val_loss=0.00127]
[Epoch 19/20]: 100%|██████████| 9000/9000 [04:51<00:00, 30.83it/s, loss=0.00129, val_loss=0.00135]
[Epoch 20/20]: 100%|██████████| 9000/9000 [04:52<00:00, 30.81it/s, loss=0.00122, val_loss=0.00124]
Loss: 0.001224
model exported to models/model_00122.pth

Validation¶
[7]:
_ = net.eval()
Plot results on a sample¶
[8]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.21it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Window - 2019 December 28¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.loss import OZELoss
from src.utils import visual_sample, compute_loss
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
ozeDataset = OzeDataset(DATASET_PATH)
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (9000, 500, 500))
dataloader_train = DataLoader(dataset_train,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_val = DataLoader(dataset_val,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
dataloader_test = DataLoader(dataset_test,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and OZE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader_train):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(y.to(device), netout)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
train_loss = running_loss/len(dataloader_train)
val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
hist_loss[idx_epoch] = train_loss
hist_loss_val[idx_epoch] = val_loss
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.48it/s, loss=0.0139, val_loss=0.00883]
[Epoch 2/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.39it/s, loss=0.00705, val_loss=0.00596]
[Epoch 3/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.41it/s, loss=0.00577, val_loss=0.005]
[Epoch 4/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.43it/s, loss=0.00506, val_loss=0.00454]
[Epoch 5/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00454, val_loss=0.00409]
[Epoch 6/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.41it/s, loss=0.00411, val_loss=0.00378]
[Epoch 7/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.39it/s, loss=0.0037, val_loss=0.00326]
[Epoch 8/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.40it/s, loss=0.00325, val_loss=0.00312]
[Epoch 9/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00293, val_loss=0.00254]
[Epoch 10/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.40it/s, loss=0.00257, val_loss=0.00245]
[Epoch 11/20]: 100%|██████████| 9000/9000 [04:56<00:00, 30.37it/s, loss=0.00239, val_loss=0.00228]
[Epoch 12/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00224, val_loss=0.00229]
[Epoch 13/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.44it/s, loss=0.00206, val_loss=0.00191]
[Epoch 14/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.41it/s, loss=0.002, val_loss=0.00203]
[Epoch 15/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00186, val_loss=0.00177]
[Epoch 16/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.46it/s, loss=0.00179, val_loss=0.00167]
[Epoch 17/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.48it/s, loss=0.00169, val_loss=0.00184]
[Epoch 18/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.45it/s, loss=0.00163, val_loss=0.00162]
[Epoch 19/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.43it/s, loss=0.00157, val_loss=0.00153]
[Epoch 20/20]: 100%|██████████| 9000/9000 [04:55<00:00, 30.47it/s, loss=0.00153, val_loss=0.00151]
Loss: 0.001529

Validation¶
[6]:
_ = net.eval()
Plot results on a sample¶
[12]:
visual_sample(dataloader_test, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[8]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[16]:
predictions = np.empty(shape=(len(dataloader_test.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(dataloader_test, total=len(dataloader_test)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 19.95it/s]
[17]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (dataloader_test.dataset.dataset._x.numpy()[..., dataloader_test.dataset.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
y_true_full = dataloader_test.dataset.dataset._y[dataloader_test.dataset.indices].numpy()
for idx_label, (label, ax) in enumerate(zip(dataloader_test.dataset.dataset.labels['X'], axes)):
# Select output to plot
y_true = y_true_full[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = dataloader_test.dataset.dataset.rescale(y_true, idx_label)
y_pred = dataloader_test.dataset.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Window - 2019 December 28¶
This is the first training on the CAPTrocadero dataset (9500 training samples / 500 test samples).
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_CAPTrocadero_train.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_CAPTrocadero_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
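In this earlier notebook the loss is assembled by hand in the training loop below: the consumption MSE delta_Q and the temperature MSE delta_T are combined as log(1 + delta_T) + 0.3 * log(1 + delta_Q). The OZELoss(alpha=0.3) used in the later notebooks presumably wraps the same expression; a standalone sketch under that assumption:
# Sketch: the combined log-MSE loss used in the loop below, written as a module
# (assumption: OZELoss(alpha=0.3) in the later notebooks encapsulates the same formula)
class CombinedLogMSELoss(nn.Module):
    def __init__(self, alpha=0.3):
        super().__init__()
        self.alpha = alpha
        self.mse = nn.MSELoss()

    def forward(self, y_true, y_pred):
        delta_Q = self.mse(y_pred[..., :-1], y_true[..., :-1])  # consumption outputs
        delta_T = self.mse(y_pred[..., -1], y_true[..., -1])    # temperature output
        return torch.log(1 + delta_T) + self.alpha * torch.log(1 + delta_Q)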
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 9500/9500 [05:05<00:00, 31.07it/s, loss=0.012]
[Epoch 2/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.50it/s, loss=0.00476]
[Epoch 3/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.00341]
[Epoch 4/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00235]
[Epoch 5/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.56it/s, loss=0.0019]
[Epoch 6/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.55it/s, loss=0.00164]
[Epoch 7/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00149]
[Epoch 8/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.54it/s, loss=0.00138]
[Epoch 9/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.57it/s, loss=0.00127]
[Epoch 10/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.47it/s, loss=0.00117]
[Epoch 11/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.46it/s, loss=0.0011]
[Epoch 12/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.46it/s, loss=0.00101]
[Epoch 13/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000917]
[Epoch 14/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.48it/s, loss=0.000852]
[Epoch 15/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.43it/s, loss=0.000806]
[Epoch 16/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.37it/s, loss=0.000765]
[Epoch 17/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000713]
[Epoch 18/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.36it/s, loss=0.000667]
[Epoch 19/20]: 100%|██████████| 9500/9500 [05:02<00:00, 31.40it/s, loss=0.000657]
[Epoch 20/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.52it/s, loss=0.000589]
Loss: 0.000589

Validation¶
Load dataset and network¶
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Plot results on a sample¶
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.06it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
# Select output to plot
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = datatestloader.dataset.rescale(y_true, idx_label)
y_pred = datatestloader.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Window - 2019 December 25¶
This is the first training with the 3rd version of the dataset, containing 9500 training samples and 500 test samples.
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_train.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 39 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 9500/9500 [05:05<00:00, 31.09it/s, loss=0.00938]
[Epoch 2/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00452]
[Epoch 3/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.0029]
[Epoch 4/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00209]
[Epoch 5/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.00172]
[Epoch 6/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.67it/s, loss=0.00151]
[Epoch 7/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.66it/s, loss=0.00133]
[Epoch 8/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.00118]
[Epoch 9/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.66it/s, loss=0.00108]
[Epoch 10/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.57it/s, loss=0.000912]
[Epoch 11/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.55it/s, loss=0.000802]
[Epoch 12/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.65it/s, loss=0.00073]
[Epoch 13/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.68it/s, loss=0.00066]
[Epoch 14/20]: 100%|██████████| 9500/9500 [05:01<00:00, 31.49it/s, loss=0.000638]
[Epoch 15/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.58it/s, loss=0.000614]
[Epoch 16/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.63it/s, loss=0.000549]
[Epoch 17/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.69it/s, loss=0.000542]
[Epoch 18/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.64it/s, loss=0.000486]
[Epoch 19/20]: 100%|██████████| 9500/9500 [04:59<00:00, 31.70it/s, loss=0.00046]
[Epoch 20/20]: 100%|██████████| 9500/9500 [05:00<00:00, 31.60it/s, loss=0.00044]
Loss: 0.000440

Validation¶
Load dataset and network¶
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample¶
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.29it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
# Select output to plot
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = datatestloader.dataset.rescale(y_true, idx_label)
y_pred = datatestloader.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Window - 2019 December 24¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 7500/7500 [04:02<00:00, 30.89it/s, loss=0.0104]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00525]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00393]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00283]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00226]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00193]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00178]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.0016]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00152]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.63it/s, loss=0.00143]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00125]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00119]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00109]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.65it/s, loss=0.00103]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.000997]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:58<00:00, 31.49it/s, loss=0.000923]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.000876]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.000875]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.62it/s, loss=0.000811]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.000808]
Loss: 0.000808

Validation¶
Load dataset and network¶
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample¶
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:06<00:00, 20.57it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
# Select output to plot
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = datatestloader.dataset.rescale(y_true, idx_label)
y_pred = datatestloader.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Chunk - 2019 December 23¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
pe = None # Positional encoding
chunk_mode = 'window'
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, chunk_mode=chunk_mode, pe=pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 7500/7500 [04:03<00:00, 30.75it/s, loss=0.011]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.64it/s, loss=0.00566]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.59it/s, loss=0.00474]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00425]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.59it/s, loss=0.00371]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.57it/s, loss=0.00327]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.00288]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.61it/s, loss=0.00262]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.67it/s, loss=0.00236]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00217]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.00199]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.55it/s, loss=0.00185]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.53it/s, loss=0.00173]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.55it/s, loss=0.00161]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:56<00:00, 31.67it/s, loss=0.00152]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:59<00:00, 31.36it/s, loss=0.00147]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.58it/s, loss=0.0014]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.60it/s, loss=0.00135]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.56it/s, loss=0.00128]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:57<00:00, 31.54it/s, loss=0.00123]
Loss: 0.001232

Validation¶
Load dataset and network¶
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample¶
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 21.10it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
# Select output to plot
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = datatestloader.dataset.rescale(y_true, idx_label)
y_pred = datatestloader.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Chunk - 2019 December 23¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-4
EPOCHS = 20
TIME_CHUNK = True
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss functions (temperature and consumption)
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss: combine temperature and consumption MSE terms via log(1 + x), weighting consumption by 0.3
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0127]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.42it/s, loss=0.00693]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00605]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.03it/s, loss=0.00541]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.94it/s, loss=0.00508]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00466]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.32it/s, loss=0.00428]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:11<00:00, 39.22it/s, loss=0.00394]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:10<00:00, 39.43it/s, loss=0.00372]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.66it/s, loss=0.00344]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.52it/s, loss=0.00331]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0031]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.75it/s, loss=0.00293]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.80it/s, loss=0.00283]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.09it/s, loss=0.00269]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.07it/s, loss=0.0026]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.24it/s, loss=0.00246]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.13it/s, loss=0.00238]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.03it/s, loss=0.00227]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.85it/s, loss=0.0022]
Loss: 0.002201

Validation¶
Load dataset and network¶
[6]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
[7]:
# net = torch.load('models/model_00247.pth', map_location=device)
Plot results on a sample¶
[8]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[9]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[10]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.60it/s]
[11]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
occupancy = (datatestloader.dataset._x.numpy()[..., datatestloader.dataset.labels["Z"].index("occupancy")].mean(axis=0)>0.5).astype(float)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
# Select output to plot
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
# Rescale
y_true = datatestloader.dataset.rescale(y_true, idx_label)
y_pred = datatestloader.dataset.rescale(y_pred, idx_label)
# Compute delta, mean and std
delta = np.abs(y_true - y_pred)
mean = delta.mean(axis=0)
std = delta.std(axis=0)
# Plot
# Labels for consumption and temperature
if label.startswith('Q_'):
y_label_unit = 'kW'
else:
y_label_unit = '°C'
# Occupancy
occupancy_idxes = np.where(np.diff(occupancy) != 0)[0]
for idx in range(0, len(occupancy_idxes), 2):
ax.axvspan(occupancy_idxes[idx], occupancy_idxes[idx+1], facecolor='green', alpha=.15)
# Std
ax.fill_between(np.arange(mean.shape[0]), (mean - std), (mean + std), alpha=.4, label='std')
# Mean
ax.plot(mean, label='mean')
# Title and labels
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
ax.legend()
plt.savefig('error_mean_std.jpg')

Chunk - 2019 December 20¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss functions (temperature and consumption)
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
temperature_loss_function = nn.MSELoss()
consumption_loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss: combine temperature and consumption MSE terms via log(1 + x), weighting consumption by 0.3
y = y.to(device)
delta_Q = consumption_loss_function(netout[..., :-1], y[..., :-1])
delta_T = temperature_loss_function(netout[..., -1], y[..., -1])
loss = torch.log(1 + delta_T) + 0.3 * torch.log(1 + delta_Q)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 7500/7500 [03:14<00:00, 38.49it/s, loss=0.0112]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.71it/s, loss=0.0059]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.68it/s, loss=0.00506]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.00453]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.60it/s, loss=0.00391]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.00361]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.0033]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.73it/s, loss=0.00316]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.00296]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.74it/s, loss=0.0028]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00265]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.66it/s, loss=0.00252]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.70it/s, loss=0.00238]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.75it/s, loss=0.00223]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.77it/s, loss=0.00214]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.64it/s, loss=0.00201]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.63it/s, loss=0.00191]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00186]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00174]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.65it/s, loss=0.00168]
Loss: 0.001683

Validation¶
Load dataset and network¶
[11]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
Plot results on a sample¶
[12]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[13]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[14]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.89it/s]
[15]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
y_true = datatestloader.dataset._y.numpy()[..., idx_label]
y_pred = predictions[..., idx_label]
y_true = dataloader.dataset.rescale(y_true, idx_label)
y_pred = dataloader.dataset.rescale(y_pred, idx_label)
delta = np.square(y_true - y_pred)
# For consumption
if label.startswith('Q_'):
y_label_unit = 'kWh'
else:
y_label_unit = '°C'
mean = delta.mean(axis=0)
std = delta.std(axis=0)
ax.fill_between(np.arange(K), (mean - 3 * std), (mean + 3 * std), alpha=.3)
ax.plot(mean)
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
plt.savefig('error_mean_std.jpg')

Chunk - 2019 December 20¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
from src.utils import visual_sample
[2]:
# Training parameters
DATASET_PATH = 'datasets/dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True
# Testing parameters
TEST_DATASET_PATH = 'datasets/dataset_test.npz'
TEST_MODEL_PATH = 'models/model_00251.pth'
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Training¶
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(netout, y.to(device))
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
str_loss = str(hist_loss[-1]).split('.')[-1][:5]
torch.save(net, f"models/model_{str_loss}.pth")
[Epoch 1/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.91it/s, loss=0.0145]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.63it/s, loss=0.00864]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.94it/s, loss=0.00674]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.96it/s, loss=0.0059]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00518]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.88it/s, loss=0.00459]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.88it/s, loss=0.00422]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.85it/s, loss=0.00393]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00369]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.95it/s, loss=0.00347]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.69it/s, loss=0.00331]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00318]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.93it/s, loss=0.00302]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.72it/s, loss=0.00293]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.89it/s, loss=0.00284]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:07<00:00, 39.91it/s, loss=0.00276]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.67it/s, loss=0.00267]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.61it/s, loss=0.00262]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:08<00:00, 39.79it/s, loss=0.00259]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:09<00:00, 39.62it/s, loss=0.00251]
Loss: 0.002514

Validation¶
Load dataset and network¶
[3]:
datatestloader = DataLoader(OzeDataset(TEST_DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS
)
net = torch.load(TEST_MODEL_PATH, map_location=device)
Plot results on a sample¶
[4]:
visual_sample(datatestloader, net, device)
plt.savefig("fig.jpg")

Plot encoding attention map¶
[5]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Evaluate on the test dataset¶
[6]:
predictions = np.empty(shape=(len(datatestloader.dataset), K, 8))
idx_prediction = 0
with torch.no_grad():
for x, y in tqdm(datatestloader, total=len(datatestloader)):
netout = net(x.to(device)).cpu().numpy()
predictions[idx_prediction:idx_prediction+x.shape[0]] = netout
idx_prediction += x.shape[0]
100%|██████████| 125/125 [00:05<00:00, 24.52it/s]
[7]:
fig, axes = plt.subplots(8, 1)
fig.set_figwidth(20)
fig.set_figheight(40)
plt.subplots_adjust(bottom=0.05)
delta = np.square(predictions - datatestloader.dataset._y.numpy())
for idx_label, (label, ax) in enumerate(zip(datatestloader.dataset.labels['X'], axes)):
input_data = delta[..., idx_label]
# For consumption
if label.startswith('Q_'):
y_label_unit = 'kWh'
else:
y_label_unit = '°C'
mean = input_data.mean(axis=0)
std = input_data.std(axis=0)
ax.fill_between(np.arange(K), (mean - 3 * std), (mean + 3 * std), alpha=.3)
ax.plot(mean)
ax.set_title(label)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel(y_label_unit, fontsize=16)
plt.savefig('error_mean_std.jpg')

Chunk - 2019 December 15¶
This training was performed without the decoder part of the Transformer, dividing training time by roughly a factor of two.
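For intuition, here is a minimal, hypothetical sketch of an encoder-only variant built from standard PyTorch layers. The class and layer names below are illustrative only and do not correspond to the repository's src.Transformer implementation; they simply show the idea of keeping the linear embedding, an encoder stack, and the sigmoid output head while dropping the decoder.

import torch
import torch.nn as nn

class EncoderOnlySketch(nn.Module):
    """Hypothetical encoder-only model: linear embedding, a stack of
    standard encoder layers, then a linear head with sigmoid output."""

    def __init__(self, d_input, d_model, d_output, h, N):
        super().__init__()
        self.embedding = nn.Linear(d_input, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=h)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N)
        self.head = nn.Linear(d_model, d_output)

    def forward(self, x):
        # x: (batch, K, d_input)
        z = self.embedding(x)
        # nn.TransformerEncoder expects (K, batch, d_model) by default
        z = self.encoder(z.transpose(0, 1)).transpose(0, 1)
        return torch.sigmoid(self.head(z))

# Shape check with the dimensions used in this notebook
# net = EncoderOnlySketch(d_input=37, d_model=48, d_output=8, h=4, N=4)
# net(torch.rand(2, 672, 37)).shape  # -> torch.Size([2, 672, 8])

The cells below still use the repository's src.Transformer; the sketch only motivates why the epoch times reported here drop from roughly three minutes in the other runs to under a minute and a half.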
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-4
EPOCHS = 20
TIME_CHUNK = True
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(netout, y.to(device))
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.04it/s, loss=0.0126]
[Epoch 2/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.04it/s, loss=0.00866]
[Epoch 3/20]: 100%|██████████| 7500/7500 [01:23<00:00, 89.89it/s, loss=0.00733]
[Epoch 4/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.20it/s, loss=0.00669]
[Epoch 5/20]: 100%|██████████| 7500/7500 [01:23<00:00, 90.16it/s, loss=0.00609]
[Epoch 6/20]: 100%|██████████| 7500/7500 [01:23<00:00, 90.12it/s, loss=0.00564]
[Epoch 7/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.97it/s, loss=0.00522]
[Epoch 8/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.62it/s, loss=0.00486]
[Epoch 9/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.81it/s, loss=0.00454]
[Epoch 10/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.81it/s, loss=0.0043]
[Epoch 11/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.53it/s, loss=0.00406]
[Epoch 12/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.67it/s, loss=0.00387]
[Epoch 13/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.37it/s, loss=0.00367]
[Epoch 14/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.58it/s, loss=0.0035]
[Epoch 15/20]: 100%|██████████| 7500/7500 [01:21<00:00, 92.01it/s, loss=0.00335]
[Epoch 16/20]: 100%|██████████| 7500/7500 [01:22<00:00, 91.18it/s, loss=0.00322]
[Epoch 17/20]: 100%|██████████| 7500/7500 [01:22<00:00, 90.49it/s, loss=0.00312]
[Epoch 18/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.95it/s, loss=0.00303]
[Epoch 19/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.58it/s, loss=0.00294]
[Epoch 20/20]: 100%|██████████| 7500/7500 [01:21<00:00, 91.80it/s, loss=0.00284]
Loss: 0.002845

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()
plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
# Select real temperature
y_true = y[:, idx_label]
y_pred = netout[0, :, idx_label].numpy()
plt.subplot(9, 1, idx_label+1)
# If consumption, rescale axis
if label.startswith('Q_'):
plt.ylim(-0.1, 1.1)
elif label == 'T_INT_OFFICE':
y_true = dataloader.dataset.rescale(y_true, idx_label)
y_pred = dataloader.dataset.rescale(y_pred, idx_label)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(label)
plt.legend()
# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Chunk - 2019 December 15¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 20
TIME_CHUNK = True
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(netout, y.to(device))
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.45it/s, loss=0.0155]
[Epoch 2/20]: 100%|██████████| 7500/7500 [03:07<00:00, 40.01it/s, loss=0.00893]
[Epoch 3/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.13it/s, loss=0.00693]
[Epoch 4/20]: 100%|██████████| 7500/7500 [03:06<00:00, 40.15it/s, loss=0.00596]
[Epoch 5/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00526]
[Epoch 6/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00458]
[Epoch 7/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00416]
[Epoch 8/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00386]
[Epoch 9/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00359]
[Epoch 10/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.51it/s, loss=0.00338]
[Epoch 11/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.0032]
[Epoch 12/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.41it/s, loss=0.00305]
[Epoch 13/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.46it/s, loss=0.00293]
[Epoch 14/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.42it/s, loss=0.00282]
[Epoch 15/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.48it/s, loss=0.00277]
[Epoch 16/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.34it/s, loss=0.00268]
[Epoch 17/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00264]
[Epoch 18/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.52it/s, loss=0.00258]
[Epoch 19/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.50it/s, loss=0.00254]
[Epoch 20/20]: 100%|██████████| 7500/7500 [03:05<00:00, 40.43it/s, loss=0.00247]
Loss: 0.002473

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()
plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
# Select real temperature
y_true = y[:, idx_label]
y_pred = netout[0, :, idx_label].numpy()
plt.subplot(9, 1, idx_label+1)
# If consumption, rescale axis
if label.startswith('Q_'):
plt.ylim(-0.1, 1.1)
elif label == 'T_INT_OFFICE':
y_true = dataloader.dataset.rescale(y_true, idx_label)
y_pred = dataloader.dataset.rescale(y_pred, idx_label)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(label)
plt.legend()
# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0].cpu()
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Benchmark - 2019 December 15¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns
from src.dataset import OzeDataset
from src.Benchmark import LSTM
[2]:
# Training parameters
DATASET_PATH = 'dataset_large.npz'
BATCH_SIZE = 4
NUM_WORKERS = 4
LR = 3e-3
EPOCHS = 20
TIME_CHUNK = True
# Model parameters
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
Using device cuda:0
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load LSTM benchmark with Adam optimizer and MSE loss function
net = LSTM(d_input, d_model, d_output, N).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x.to(device))
# Compute loss
loss = loss_function(netout, y.to(device))
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(x.shape[0])
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/20]: 100%|██████████| 7500/7500 [00:16<00:00, 452.91it/s, loss=0.0142]
[Epoch 2/20]: 100%|██████████| 7500/7500 [00:16<00:00, 449.53it/s, loss=0.00813]
[Epoch 3/20]: 100%|██████████| 7500/7500 [00:17<00:00, 434.53it/s, loss=0.00724]
[Epoch 4/20]: 100%|██████████| 7500/7500 [00:16<00:00, 448.52it/s, loss=0.00693]
[Epoch 5/20]: 100%|██████████| 7500/7500 [00:16<00:00, 451.65it/s, loss=0.00671]
[Epoch 6/20]: 100%|██████████| 7500/7500 [00:16<00:00, 455.15it/s, loss=0.00653]
[Epoch 7/20]: 100%|██████████| 7500/7500 [00:17<00:00, 425.80it/s, loss=0.0064]
[Epoch 8/20]: 100%|██████████| 7500/7500 [00:17<00:00, 423.33it/s, loss=0.00628]
[Epoch 9/20]: 100%|██████████| 7500/7500 [00:17<00:00, 432.92it/s, loss=0.0062]
[Epoch 10/20]: 100%|██████████| 7500/7500 [00:17<00:00, 438.34it/s, loss=0.00606]
[Epoch 11/20]: 100%|██████████| 7500/7500 [00:17<00:00, 422.91it/s, loss=0.00595]
[Epoch 12/20]: 100%|██████████| 7500/7500 [00:17<00:00, 421.01it/s, loss=0.00583]
[Epoch 13/20]: 100%|██████████| 7500/7500 [00:16<00:00, 447.78it/s, loss=0.0057]
[Epoch 14/20]: 100%|██████████| 7500/7500 [00:17<00:00, 440.90it/s, loss=0.0055]
[Epoch 15/20]: 100%|██████████| 7500/7500 [00:16<00:00, 454.46it/s, loss=0.00538]
[Epoch 16/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.71it/s, loss=0.00524]
[Epoch 17/20]: 100%|██████████| 7500/7500 [00:16<00:00, 457.21it/s, loss=0.00516]
[Epoch 18/20]: 100%|██████████| 7500/7500 [00:16<00:00, 457.11it/s, loss=0.00507]
[Epoch 19/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.00it/s, loss=0.00499]
[Epoch 20/20]: 100%|██████████| 7500/7500 [00:16<00:00, 456.07it/s, loss=0.00488]
Loss: 0.004880

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
netout = net(torch.Tensor(x[np.newaxis, ...]).to(device)).cpu()
plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
# Select real temperature
y_true = y[:, idx_label]
y_pred = netout[0, :, idx_label].numpy()
plt.subplot(9, 1, idx_label+1)
# If consumption, rescale axis
if label.startswith('Q_'):
plt.ylim(-0.1, 1.1)
elif label == 'T_INT_OFFICE':
y_true = dataloader.dataset.rescale(y_true, idx_label)
y_pred = dataloader.dataset.rescale(y_pred, idx_label)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(label)
plt.legend()
# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()
plt.savefig("fig.jpg")

Chunk - 2019 December 06¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
DATASET_PATH = 'dataset.npz'
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = 'regular' # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [03:33<00:00, 4.68it/s, loss=0.0178]
[Epoch 2/5]: 100%|██████████| 1000/1000 [03:30<00:00, 4.74it/s, loss=0.0125]
[Epoch 3/5]: 100%|██████████| 1000/1000 [03:36<00:00, 4.62it/s, loss=0.0116]
[Epoch 4/5]: 100%|██████████| 1000/1000 [03:50<00:00, 4.34it/s, loss=0.0112]
[Epoch 5/5]: 100%|██████████| 1000/1000 [03:13<00:00, 5.16it/s, loss=0.0108]
Loss: 0.010834

Plot results sample¶
[8]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
netout = net(torch.Tensor(x[np.newaxis, ...]))
plt.figure(figsize=(30, 30))
for idx_label, label in enumerate(dataloader.dataset.labels['X']):
# Select real temperature
y_true = y[:, idx_label]
y_pred = netout[0, :, idx_label].numpy()
plt.subplot(9, 1, idx_label+1)
# If consumption, rescale axis
if label.startswith('Q_'):
plt.ylim(-0.1, 1.1)
elif label == 'T_INT_OFFICE':
y_true = dataloader.dataset.rescale(y_true, idx_label)
y_pred = dataloader.dataset.rescale(y_pred, idx_label)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(label)
plt.legend()
# Plot ambient temperature
plt.subplot(9, 1, idx_label+2)
t_amb = x[:, dataloader.dataset.labels["Z"].index("TAMB")]
t_amb = dataloader.dataset.rescale(t_amb, -1)
plt.plot(t_amb, label="TAMB", c="red")
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0]
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Chunk - 2019 December 06¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
DATASET_PATH = 'dataset.npz'
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = None # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset(DATASET_PATH),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [03:40<00:00, 4.53it/s, loss=0.0165]
[Epoch 2/5]: 100%|██████████| 1000/1000 [03:54<00:00, 4.26it/s, loss=0.012]
[Epoch 3/5]: 100%|██████████| 1000/1000 [03:50<00:00, 4.34it/s, loss=0.0116]
[Epoch 4/5]: 100%|██████████| 1000/1000 [03:46<00:00, 4.42it/s, loss=0.011]
[Epoch 5/5]: 100%|██████████| 1000/1000 [03:46<00:00, 4.41it/s, loss=0.0109]
Loss: 0.010939

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
x = torch.Tensor(x[np.newaxis, ...])
netout = net(x)
plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
# Select real temperature
y_true = y[:, idx_output_var]
y_pred = netout[0, :, idx_output_var]
y_pred = y_pred.numpy()
plt.subplot(8, 1, idx_output_var+1)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0]
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Chunk - 2019 December 04¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
pe = "regular" # Positional encoding
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK, pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [03:45<00:00, 4.44it/s, loss=0.0183]
[Epoch 2/5]: 100%|██████████| 1000/1000 [02:58<00:00, 5.60it/s, loss=0.0115]
[Epoch 3/5]: 100%|██████████| 1000/1000 [03:00<00:00, 5.55it/s, loss=0.0108]
[Epoch 4/5]: 100%|██████████| 1000/1000 [02:58<00:00, 5.59it/s, loss=0.0102]
[Epoch 5/5]: 100%|██████████| 1000/1000 [02:57<00:00, 5.63it/s, loss=0.0102]
Loss: 0.010186

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
x = torch.Tensor(x[np.newaxis, ...])
netout = net(x)
plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
# Select real temperature
y_true = y[:, idx_output_var]
y_pred = netout[0, :, idx_output_var]
y_pred = y_pred.numpy()
plt.subplot(8, 1, idx_output_var+1)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0]
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Benchmark - 2019 December 03¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.LSTM import LSTMBenchmark
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load LSTM benchmark with Adam optimizer and MSE loss function
net = LSTMBenchmark(input_dim=d_input, hidden_dim=d_model, output_dim=d_output, num_layers=N)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [00:16<00:00, 62.40it/s, loss=0.0218]
[Epoch 2/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.13it/s, loss=0.0145]
[Epoch 3/5]: 100%|██████████| 1000/1000 [00:14<00:00, 68.86it/s, loss=0.0132]
[Epoch 4/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.17it/s, loss=0.0107]
[Epoch 5/5]: 100%|██████████| 1000/1000 [00:15<00:00, 66.18it/s, loss=0.0103]
Loss: 0.010313

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
x = torch.Tensor(x[np.newaxis, ...])
netout = net(x)
plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
# Select real temperature
y_true = y[:, idx_output_var]
y_pred = netout[0, :, idx_output_var]
y_pred = y_pred.numpy()
plt.subplot(8, 1, idx_output_var+1)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")

Chunk - 2019 December 03¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = True
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [03:05<00:00, 5.39it/s, loss=0.0205]
[Epoch 2/5]: 100%|██████████| 1000/1000 [03:00<00:00, 5.55it/s, loss=0.012]
[Epoch 3/5]: 100%|██████████| 1000/1000 [02:59<00:00, 5.56it/s, loss=0.0108]
[Epoch 4/5]: 100%|██████████| 1000/1000 [03:00<00:00, 5.55it/s, loss=0.0105]
[Epoch 5/5]: 100%|██████████| 1000/1000 [02:59<00:00, 5.57it/s, loss=0.0102]
Loss: 0.010207

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
x = torch.Tensor(x[np.newaxis, ...])
netout = net(x)
plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
# Select real temperature
y_true = y[:, idx_output_var]
y_pred = netout[0, :, idx_output_var]
y_pred = y_pred.numpy()
plt.subplot(8, 1, idx_output_var+1)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0]
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")

Classic - 2019 December 03¶
[1]:
import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns; sns.set()
from src.dataset import OzeDataset
from src.Transformer import Transformer
[2]:
BATCH_SIZE = 2
NUM_WORKERS = 4
LR = 1e-2
EPOCHS = 5
TIME_CHUNK = False
K = 672 # Time window length
d_model = 48 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder layers to stack
d_input = 37 # From dataset
d_output = 8 # From dataset
Load dataset¶
[3]:
dataloader = DataLoader(OzeDataset("dataset.npz"),
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS
)
Load network¶
[4]:
# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, K, N, TIME_CHUNK)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()
Train¶
[5]:
# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
running_loss = 0
with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
for idx_batch, (x, y) in enumerate(dataloader):
optimizer.zero_grad()
# Propagate input
netout = net(x)
# Compute loss
loss = loss_function(netout, y)
# Backpropagate loss
loss.backward()
# Update weights
optimizer.step()
running_loss += loss.item()
pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
pbar.update(BATCH_SIZE)
hist_loss[idx_epoch] = running_loss/len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")
[Epoch 1/5]: 100%|██████████| 1000/1000 [05:35<00:00, 2.98it/s, loss=0.019]
[Epoch 2/5]: 100%|██████████| 1000/1000 [05:55<00:00, 2.81it/s, loss=0.0126]
[Epoch 3/5]: 100%|██████████| 1000/1000 [05:33<00:00, 3.00it/s, loss=0.0115]
[Epoch 4/5]: 100%|██████████| 1000/1000 [05:23<00:00, 3.09it/s, loss=0.0108]
[Epoch 5/5]: 100%|██████████| 1000/1000 [05:21<00:00, 3.11it/s, loss=0.0103]
Loss: 0.010339

Plot results sample¶
[6]:
# Select training example
idx = np.random.randint(0, len(dataloader.dataset))
x, y = dataloader.dataset[idx]
# Run predictions
with torch.no_grad():
x = torch.Tensor(x[np.newaxis, ...])
netout = net(x)
plt.figure(figsize=(30, 30))
for idx_output_var in range(8):
# Select real temperature
y_true = y[:, idx_output_var]
y_pred = netout[0, :, idx_output_var]
y_pred = y_pred.numpy()
plt.subplot(8, 1, idx_output_var+1)
plt.plot(y_true, label="Truth")
plt.plot(y_pred, label="Prediction")
plt.title(dataloader.dataset.labels["X"][idx_output_var])
plt.legend()
plt.savefig("fig.jpg")

Display encoding attention map¶
[7]:
# Select first encoding layer
encoder = net.layers_encoding[0]
# Get the first attention map
attn_map = encoder.attention_map[0]
# Plot
plt.figure(figsize=(20, 20))
sns.heatmap(attn_map)
plt.savefig("attention_map.jpg")
