Add example notebook for multi time-series regression #997

ottonemo · 2023-07-26T12:47:41Z

This was motivated by [this thread](https://discuss.pytorch.org/t/hyperparameter-search-for-multioutputregression/73404/6) on the PyTorch discussion forums. It might be a good reference for more advanced usage.

review-notebook-app · 2023-07-26T12:47:44Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

BenjaminBossan

Thanks for adding this example. I like the amount of work put into explaining the individual steps, especially the analogy you provide. Working with time series is always tricky, so having an example explicitly for that is great.

At the start of the notebook, could you add the buttons to run it on Colab and view the source? The other notebooks have them as well.

I think that the main story of the notebook could be simplified. E.g. how important is it that we have multioutput (i.e. y=X² and y=X³)? Going to your pizza example, would the 2nd y be the number of ordered drinks? Maybe the whole message can be simplified by dropping the multioutput part, instead focusing on multi time series and the cross-validation part?

y = np.hstack([X2, X3]).astype(np.float32)

I wonder if a sinusoidal shape for y wouldn't be better.
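A rough sketch of what a sinusoidal target could look like. It assumes the same `(series, time steps, 1)` layout and the same shifted time axis as `getX` in the notebook; the sizes are placeholders, not values confirmed by the notebook.

```python
import numpy as np

n_series, n_steps = 10, 10  # assumed sizes, mirroring the 10 x 10 grid in the notebook

# One time axis per series, shifted by the series index (same idea as getX).
t = np.stack([np.linspace(i, 10 + i, n_steps) for i in range(n_series)])
X = t[:, :, None].astype(np.float32)  # shape (n_series, n_steps, 1)

# Sinusoidal target: values stay within [-1, 1] for every series.
y = np.sin(X).astype(np.float32)

print(X.shape, y.shape)
```

Because the target is bounded, predictions and true values could be drawn on one shared y-axis.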

but you will lose the per-restaurant information that day 3 may depend on day 2 and so on

This one is not quite clear to me. Could you please expand the explanation?

In general, I would assume that most users who deal with time-series would like to input past sales as a feature to predict future sales (probably in an auto-regressive fashion, but that depends). This does not appear to be the case here, as X does not contain any features from y[t-1] etc. Maybe that's not the point of the exercise, but it did leave me a bit confused.
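For illustration, this is roughly what feeding past values as features could look like. `make_lagged` is a hypothetical helper written for this comment, not part of skorch or the notebook, and the `sales` array is placeholder data.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (features, target) pairs for one-step-ahead prediction.

    Row k of the features holds series[k], ..., series[k + n_lags - 1];
    the matching target is series[k + n_lags].
    """
    series = np.asarray(series, dtype=np.float32)
    n_rows = len(series) - n_lags
    features = np.stack([series[k:k + n_lags] for k in range(n_rows)])
    target = series[n_lags:]
    return features, target

sales = np.arange(6)  # placeholder data: 0, 1, 2, 3, 4, 5
X_lag, y_next = make_lagged(sales, n_lags=2)
print(X_lag)
print(y_next)
```

Each row of `X_lag` contains the two previous values, and `y_next` holds the value that follows them.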

regressor = skorch.toy.make_regressor(input_units=1, output_units=y.shape[1], hidden_units=10)

For these types of examples, I prefer to explicitly define the module instead of using toy. It's just easier to understand for the user what's going on.

timeful multi regression
getX = lambda i: np.linspace(0+i, 10+i, 10)[None, :, None].astype(np.float32)
X = np.vstack([getX(i) for i in range(10)])

I'm a little confused by this. Perhaps it would help if we didn't have two dimensions of size 10? One is time, the other the number of restaurants?
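A minimal sketch of the same construction with two different sizes, so the axes are easier to tell apart. The names `n_restaurants`, `n_days`, and `get_series` are made up for this example, and the sizes are arbitrary.

```python
import numpy as np

n_restaurants, n_days = 4, 10  # hypothetical sizes, deliberately different

def get_series(offset):
    # One series of n_days points, shifted by the restaurant index.
    return np.linspace(offset, 10 + offset, n_days, dtype=np.float32)

# Axis 0: restaurants, axis 1: time, axis 2: a single feature.
X = np.stack([get_series(i) for i in range(n_restaurants)])[:, :, None]
print(X.shape)
```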

Also:

>>> X[0, :, 0].tolist()
[0.0,
 1.1111111640930176,
 2.222222328186035,
 3.3333332538604736,
 4.44444465637207,
 5.55555534362793,
 6.666666507720947,
 7.777777671813965,
 8.88888931274414,
 10.0]

Not sure if this is intended to be [0, 1, 2, ...]. Doesn't really matter for the example but I wanted to bring it up.
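The fractional values come from `np.linspace` including both endpoints by default: 10 points over [0, 10] give a step of 10/9. A short sketch of two ways to get integer steps, in case that is what was intended:

```python
import numpy as np

# 10 points over [0, 10] with both endpoints included: the step is 10 / 9.
ten_points = np.linspace(0, 10, 10)

# 11 points over the same interval: the step is exactly 1.
eleven_points = np.linspace(0, 10, 11)

# Keeping 10 points but excluding the endpoint also gives a step of 1.
no_endpoint = np.linspace(0, 10, 10, endpoint=False)

print(ten_points[1], eleven_points[1], no_endpoint[1])
```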

And there is also a different way to create such a grid without custom functions:

nx, ny = (11, 11)
x = np.linspace(0, 10, nx)
y = np.linspace(0, 10, ny)
xv, yv = np.meshgrid(x, y)
X = xv + yv

assert X.shape == (10, 10, 1)

What is the purpose for the dangling dimension?

plot_time_series

Could you please add some labels to the plots?

plot_time_series(X, y_pred)

It would be nice to also show y but I guess it's not practical because of the y-axis scale.

list(ts_splitter.split(X, y))

I think a visualization like this can be helpful:

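import matplotlib.pyplot as plt
import numpy as np

# One row per split: training indices in cyan, validation indices in blue.
# ts_splitter, X and y are the objects already defined in the notebook.
_, ax = plt.subplots(figsize=(6, 3))
for i, (train_idx, valid_idx) in enumerate(ts_splitter.split(X, y)):
    ax.plot(train_idx, i * np.ones_like(train_idx), marker='o', c='c', label="train")
    ax.plot(valid_idx, i * np.ones_like(valid_idx), marker='o', c='b', label="valid")
    if i == 0:
        ax.legend()  # draw the legend once, using the first split's labels
ax.set_yticks([0, 1, 2])  # assumes three splits
ax.set_ylabel("split");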

Using the time-series split for internal validation is easy.

I have learned that it is better to avoid terms like "easy", so as not to discourage users for whom it is not easy. Maybe something like "can be achieved in a few steps" instead.

train_split=ValidSplit(cv=ts_splitter)

Hmm, I wonder: Since skorch takes only the first split, does it mean that a bunch of data is discarded completely, because the first ts split only goes from 0-5?
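To make the question concrete, here is a small sketch of what the first split covers. It assumes 10 samples and `TimeSeriesSplit(n_splits=3)`, which is a guess that matches the 0-5 range mentioned above; the notebook's actual splitter settings may differ.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples = 10                             # assumed number of samples
ts_splitter = TimeSeriesSplit(n_splits=3)  # assumed settings

# Look only at the first (train, valid) pair produced by the splitter.
train_idx, valid_idx = next(ts_splitter.split(np.zeros((n_samples, 1))))

# Indices that appear in neither part of the first split.
used = np.concatenate([train_idx, valid_idx])
unused = np.setdiff1d(np.arange(n_samples), used)

print("train:", train_idx)
print("valid:", valid_idx)
print("not used:", unused)
```

Under these assumptions, the samples at indices 6 to 9 would appear in neither the training nor the validation part of the first split.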

The default scorer for regression problems in sklearn is R² ...

I feel like the whole discussion around R² is a bit of a distraction from the main point. Would it be better to leave it out and go straight to MSE, for instance?

cv_results

Maybe add a few sentences about what we can learn from the CV results before going straight to the conclusion?

We saw that it is possible and easy to use RNN and skorch

Again, let's avoid "easy".

BenjaminBossan · 2023-07-27T09:19:25Z

CHANGES.md

@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - Add the option to globally override the use of caching in scoring callbacks on the net by setting the `use_caching` argument on the net (this overrides the settings of individual callbacks)
 - Add support for saving and loading parameters with [safetensors](https://github.com/huggingface/safetensors/); use `net.save_params(..., use_safetensors=True)` and `net.load_params(..., use_safetensors=True)` (requires to install the `safetensors` library)
+- Example notebook for [RNNs and multiple time-series data](./notebooks/Multiple_time-series_RNN.ipynb)


Oh, didn't know that relative paths work.

Add example notebook for multi time-series regression

fe937e6

ottonemo requested a review from BenjaminBossan July 26, 2023 12:47

BenjaminBossan requested changes Jul 27, 2023
