Feature/transformer refactorisation #1915

Open · wants to merge 17 commits into master from feature/transformer-refactorisation

Conversation

@JanFidor (Contributor) commented on Jul 23, 2023

Fixes #601 #672

Summary

  • Added teacher forcing to the transformer model
  • Added auto-regression for inference
  • Changed the normalization factor to math.sqrt(self.d_model) (I tested removing it altogether, but the SunspotsDataset backtest MAPE was almost twice as high in that case); see the sketch below
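To illustrate the normalization factor, here is a minimal sketch of the idea, assuming an input projection layer; the class and attribute names are illustrative and not the exact Darts internals:

import math
import torch
import torch.nn as nn

class ScaledInputProjection(nn.Module):
    # Projects the input channels to d_model and rescales by sqrt(d_model),
    # as in "Attention Is All You Need", so the embeddings keep a magnitude
    # comparable to the positional encodings added afterwards.
    def __init__(self, input_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.linear = nn.Linear(input_size, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * math.sqrt(self.d_model)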

Other Information

One of the tricky parts was deciding how to use the encoders: the channel dimension of the source consists of both the predicted series and the past covariates, while the target only has the predicted series. I decided to use zero padding to substitute for the missing past covariates and to use the same Linear layer for both. It seemed to give slightly better results and is more in line with the original implementation.
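A rough sketch of the zero-padding idea, with assumed shapes and variable names (not necessarily the PR's exact code):

import torch
import torch.nn.functional as F

batch, L, H = 32, 12, 6                          # batch size, input and output chunk lengths
src = torch.randn(batch, L, 5)                   # target series + past covariates (5 channels)
tgt = torch.randn(batch, H, 2)                   # target series only (2 channels)

pad_size = (0, src.shape[-1] - tgt.shape[-1])    # pad the channel dimension on the right
tgt_padded = F.pad(tgt, pad_size)                # zeros stand in for the missing covariates

shared_projection = torch.nn.Linear(src.shape[-1], 64)   # same Linear layer for both inputs
src_emb, tgt_emb = shared_projection(src), shared_projection(tgt_padded)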

For probabilistic forecasting at inference time, I decided to simply take an average over the probabilistic (sample) dimension.
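The averaging boils down to something like this (the tensor layout here is an assumption):

import torch

# one auto-regressive step emits several likelihood samples per target component
step_output = torch.randn(32, 1, 2, 100)     # (batch, time step, target dim, num_samples)
point_value = step_output.mean(dim=-1)       # average over the probabilistic dimension
# point_value (batch, 1, target dim) would then presumably be fed back for the next step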

@JanFidor (Contributor, Author) commented:

Hi @dennisbader, I've double-checked the failing tests and everything seems to run fine locally on my end. It also looks like the errors are caused by some weird PyTorch interactions, and I'm not 100% sure whether the code is at fault or the CI/CD is flaky.

@dennisbader (Collaborator) commented:

Hi @JanFidor, and thanks for this PR. We'll have time to review it next week.
In the meantime: the other unit test workflows run fine, so there is probably an issue with the changes in this PR.

@review-notebook-app (bot) commented: Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

@JanFidor force-pushed the feature/transformer-refactorisation branch from f82ca1c to fd13830 on August 7, 2023, 13:04
@JanFidor (Contributor, Author) commented on Aug 7, 2023

Quick update @dennisbader: I fixed the bug (the transformer mask generation was creating the tensor incorrectly) and the tests are passing now, but it seems like Codecov has a connection problem. (Also, I had a tiny adventure with git history and some unpleasant rebases, but fortunately force-pushing and git reset saved the day 🚀)

And for a neat conclusion, here's the change in SunspotsDataset backtest performance (the dataset from the example notebook, with a longer forecast horizon, which is where teacher forcing should make a difference):

Old implementation performance: [screenshot from 2023-08-07 17-02-04]

New implementation performance: [screenshot from 2023-08-07 16-52-52]

@codecov-commenter (bot) commented on Aug 7, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (a8a094a) 93.87% compared to head (fa8a152) 93.88%.


Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1915   +/-   ##
=======================================
  Coverage   93.87%   93.88%           
=======================================
  Files         132      132           
  Lines       12673    12692   +19     
=======================================
+ Hits        11897    11916   +19     
  Misses        776      776           
Files Changed Coverage Δ
darts/models/forecasting/transformer_model.py 99.22% <100.00%> (+0.26%) ⬆️

... and 6 files with indirect coverage changes


@madtoinou (Collaborator) left a comment:

Nice work @JanFidor 🚀 (and sorry for the delay)!

I wrote some high-level comments before testing the model locally; let me know what you think.

I am not sure I understand your comment about past_covariates being unavailable for the start token; would you mind detailing it a bit?

darts/models/forecasting/transformer_model.py (several resolved review threads; one comment on the snippet below):
data = x_in[0]
pad_size = (0, self.input_size - self.target_size)

# start token consists only of target series, past covariates are substituted with 0 padding
Collaborator:

Why would you not include the past covariates in the start token?

@JanFidor (Contributor, Author) commented on Aug 27, 2023

Thanks for the review @madtoinou! About the covariates: as I understand it, they are expected to cover the same range of timestamps as the past series (timestamps 1:L, where L is input_chunk_length). That is not a problem for the TransformerEncoder input, but the TransformerDecoder needs timestamps L:L+H-1, where H is output_chunk_length, and I couldn't find a way to access past covariates for L+1:L+H-1. I therefore decided to drop the past_covariates values from the start token so that every TransformerDecoder input is encoded the same way: only the target channels have non-zero values. I was worried that the decoder input encoding could otherwise be unstable; for a horizon of 2, the first token would carry real covariates while the second one's covariate channels would be zero. I also considered extending the first token's covariates across the rest of the target series, but was concerned that this would take importance away from the changes in the target series. (Roughly as in the sketch below.)
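For clarity, a hedged sketch of the start-token and decoder-input construction described above; shapes, names, and the choice of the last observed target value as start token are assumptions, not the PR's exact code:

import torch
import torch.nn.functional as F

L, H, n_target, n_cov = 12, 6, 2, 3
past_target = torch.randn(32, L, n_target)
future_target = torch.randn(32, H, n_target)             # known during training (teacher forcing)

start_token = past_target[:, -1:, :]                      # last observed target value (assumed here)
decoder_target = torch.cat([start_token, future_target[:, :-1, :]], dim=1)
decoder_input = F.pad(decoder_target, (0, n_cov))         # covariate channels stay at zero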

@madtoinou (Collaborator) left a comment:

Thanks for adjusting the code.

Before approving, I would like to run some checks to make sure that there is no information leakage, and to tweak the default parameters (especially for the failing unit test; it would be great to reach the same accuracy as before).

@@ -8,6 +8,8 @@
 import torch
 import torch.nn as nn
+import torch.nn.functional as F
+from torch.nn import Transformer
Collaborator:
maybe import just generate_square_subsequent_mask

Contributor (Author):
After reading up a little and checking the implementation, it turns out that generate_square_subsequent_mask is a static method of Transformer. While it is possible to import just that method (https://stackoverflow.com/questions/48178011/import-static-method-of-a-class-without-importing-the-whole-class), I don't think it's worth it. That said, I definitely agree that this import is a little unintuitive, and I think a nice middle ground would be adding an implementation of generate_square_subsequent_mask to darts/utils, as it's a very small function. What do you think?
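For reference, such a helper would be tiny; a sketch of what it could look like (a hypothetical darts/utils addition, mirroring the behaviour of torch.nn.Transformer.generate_square_subsequent_mask):

import torch

def generate_square_subsequent_mask(sz: int, device=None, dtype=torch.float32) -> torch.Tensor:
    # Upper-triangular mask with -inf above the diagonal and 0 elsewhere,
    # so position i cannot attend to positions j > i.
    return torch.triu(
        torch.full((sz, sz), float("-inf"), device=device, dtype=dtype), diagonal=1
    )

# generate_square_subsequent_mask(3)
# tensor([[0., -inf, -inf],
#         [0., 0., -inf],
#         [0., 0., 0.]])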

Collaborator:
WDYT @dennisbader ?

@@ -349,7 +410,7 @@ def __init__(
 The multi-head attention mechanism is highly parallelizable, which makes the transformer architecture
 very suitable to be trained with GPUs.

-The transformer architecture implemented here is based on [1]_.
+The transformer architecture implemented here is based on [1]_ and uses teacher forcing [2]_.
Collaborator:
I think I removed the reference entry by accident; can you please put it back? Also, instead of the Towards Data Science article, could you please link the torch tutorial: https://github.com/pytorch/examples/tree/main/word_language_model

@JanFidor (Contributor, Author) commented on Sep 28, 2023

Thanks for another review and more great suggestions for improvement @madtoinou! I'll try playing around with the hyperparameters, but after reading the DeepAR paper (https://arxiv.org/abs/1704.04110) I found a different approach to probabilistic forecasting which might help with the accuracy regression. DeepAR uses ancestral sampling, i.e. a Monte Carlo simulation that generates each sample trajectory step by step (Section 3.1, last paragraph, plus Figure 2); a rough sketch is below. It would require some changes to the implementation, but I think the end result would be more expressive when the probability distribution of the errors is known beforehand. Do you think it would be worth it, considering the increase in the scope of the PR and in time complexity?
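To make the idea concrete, a very rough sketch of the ancestral-sampling rollout; this is not Darts code, and the predict_distribution callable and the start_token layout are placeholders:

import torch

def ancestral_sampling(predict_distribution, start_token, horizon, num_samples):
    # predict_distribution(decoder_input) is assumed to return a
    # torch.distributions.Distribution over the next target value;
    # start_token has shape (batch, 1, target_dim).
    trajectories = []
    for _ in range(num_samples):
        decoder_input = start_token
        steps = []
        for _ in range(horizon):
            dist = predict_distribution(decoder_input)
            y = dist.sample()                            # draw a sample, not the mean
            steps.append(y)
            decoder_input = torch.cat([decoder_input, y], dim=1)
        trajectories.append(torch.cat(steps, dim=1))
    return torch.stack(trajectories, dim=-1)             # (batch, horizon, target_dim, num_samples)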

@madtoinou (Collaborator) commented:

Hi @JanFidor,

I would recommend keeping the implementation of the ancestral sampling in the TransformerModel for another PR so that we can more easily compare the change in performance and simplify the review process. Feel free to open an issue to track this improvement!

@@ -410,7 +409,7 @@ def __init__(
 The multi-head attention mechanism is highly parallelizable, which makes the transformer architecture
 very suitable to be trained with GPUs.

-The transformer architecture implemented here is based on [1]_ and uses teacher forcing [2]_.
+The transformer architecture implemented here is based on [1]_ abd uses teacher forcing [4]_.
Collaborator:
Small typo

Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Refactor Transformer model
4 participants