
Bug: InformerModel decoder_input torch.cat tensor size mismatch error #30750

Closed
jhzsquared opened this issue May 10, 2024 · 7 comments

Comments

@jhzsquared

Possible solution: should shift=0 at L2020?
Referencing: https://github.com/huggingface/transformers/blame/4fdf58afb72b0754da30037fc800b6044e7d9c99/src/transformers/models/informer/modeling_informer.py#L2020

I've trained/tested an Informer model, but when generating predictions I run into "RuntimeError: Sizes of tensors must match except in dimension 2..." at line 2029 of modeling_informer.py.

I broke it apart a bit, and after playing around it looks like shift=1 in line 2020 may have been mistakenly hardcoded? Otherwise, the tensor shape of reshaped_lagged_sequence at dimension 1 will always be one less than that of repeated_features.
Alternatively, of course, repeated_features could avoid using k+1 at L2026. I'm not clear on the author's intuition behind the shift vs. no shift.
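For context, a minimal synthetic sketch of the kind of setup that hits this — the config values, shapes, and random data below are illustrative placeholders, not my actual training setup:

```python
import torch
from transformers import InformerConfig, InformerForPrediction

# Illustrative config -- the key part is lags_sequence=[0] (i.e. "no lags").
config = InformerConfig(
    prediction_length=4,
    context_length=8,
    lags_sequence=[0],
    num_time_features=1,
    input_size=1,
)
model = InformerForPrediction(config)

# past_values must cover context_length + max(lags_sequence) time steps (8 here).
batch_size = 2
past_length = config.context_length + max(config.lags_sequence)
past_values = torch.randn(batch_size, past_length)
past_time_features = torch.randn(batch_size, past_length, config.num_time_features)
past_observed_mask = torch.ones(batch_size, past_length)
future_time_features = torch.randn(batch_size, config.prediction_length, config.num_time_features)

# Training/forward runs fine, but generation fails at the torch.cat in the
# prediction loop with "Sizes of tensors must match except in dimension 2".
outputs = model.generate(
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
    future_time_features=future_time_features,
)
```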


@amyeroberts
Collaborator

cc @kashif

@kashif
Contributor

kashif commented May 13, 2024

@jhzsquared so the intention was that the model learns the next step's distribution given the past, as well as the covariates up to the time step at which one is forecasting...

Can you paste in the lag_seq vector you are using?

@jhzsquared
Author

I'm not using any lag right now, so I have an initial model input of lags_sequence = [0].

And thanks! Conceptually that makes sense... functionally though, when k=0, the get_lagged_subsequences call with shift=1 produces a tensor of size context_length at dimension 1, while repeated_features[:, : k + 1] is of course always size k+1 at dimension 1. When k>0, the lagged_sequence shape at dimension 1 is always one less than the size of the corresponding slice of repeated_features it is supposed to be concatenated with at line 2026.
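To make the shape arithmetic concrete, here is a simplified sketch of the index logic in get_lagged_subsequences — illustrative only, not the library code verbatim:

```python
import torch

# Simplified sketch of the slicing arithmetic in get_lagged_subsequences.
def lagged_slice(sequence, subsequences_length, lags, shift):
    indices = [lag - shift for lag in lags]
    slices = []
    for lag_index in indices:
        begin = -lag_index - subsequences_length
        end = -lag_index if lag_index > 0 else None
        slices.append(sequence[:, begin:end, ...])
    return torch.stack(slices, dim=-1)

past = torch.randn(1, 8)  # context_length = 8

# lags_sequence = [0] with shift=1 -> effective index -1, so the slice does not
# line up with repeated_features[:, : k + 1] in the generation loop.
for k in range(3):
    out = lagged_slice(past, subsequences_length=1 + k, lags=[0], shift=1)
    print(k, out.shape[1])  # k=0 -> 8 (the whole context), k=1 -> 1, k=2 -> 2

# lags_sequence = [1] with shift=1 -> effective index 0, which matches k + 1.
for k in range(3):
    out = lagged_slice(past, subsequences_length=1 + k, lags=[1], shift=1)
    print(k, out.shape[1])  # always k + 1
```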

@kashif
Contributor

kashif commented May 13, 2024

Right, so if you don't want lags, set that array to [1] and increase your context length by 1 more time step... can you check if that works?
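Something along these lines, if I follow — the values below are just placeholders:

```python
from transformers import InformerConfig, InformerForPrediction

# Workaround: smallest lag of 1 instead of 0, plus one extra step of context
# so the amount of usable history stays the same. Values are illustrative.
config = InformerConfig(
    prediction_length=4,
    context_length=8 + 1,   # previous context_length + 1
    lags_sequence=[1],      # instead of [0]
    num_time_features=1,
    input_size=1,
)
model = InformerForPrediction(config)

# Note: past_values etc. must now cover
# context_length + max(config.lags_sequence) = 10 time steps.
```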

@jhzsquared
Author

Ohh okay, I did not realize that it should have been [1]. That fixed it! Thank you so much!
