How to set up the correct combination of shift_div and frame_count? #223

Open
qwangku opened this issue Aug 7, 2022 · 1 comment

qwangku commented Aug 7, 2022

Thanks for sharing this great resource. I am experimenting with different frame rates for TSM.
I noticed there are 3 important attributes here: frame_count, num_segments and shift_div.

For example, if I reduce frame_count from 8 to 4 (which means the video is split into 4 segments this time, so the equivalent frame rate is reduced), should I also adjust "shift_div" and "num_segments"? Am I right to say that "shift_div" should always be equal to or smaller than "frame_count"?

yjang43 commented Sep 4, 2022

I don't know if you are still looking for an answer. I was also looking through the code to answer this question, and I ended up at this part of the code, which I believe answers it.

    @staticmethod
    def shift(x, n_segment, fold_div=3, inplace=False):
        nt, c, h, w = x.size()
        n_batch = nt // n_segment
        x = x.view(n_batch, n_segment, c, h, w)

        fold = c // fold_div
        if inplace:
            # Due to some out of order error when performing parallel computing. 
            # May need to write a CUDA kernel.
            raise NotImplementedError  
            # out = InplaceShift.apply(x, fold)
        else:
            out = torch.zeros_like(x)
            out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
            out[:, 1:, fold: 2 * fold] = x[:, :-1, fold: 2 * fold]  # shift right
            out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shift

        return out.view(nt, c, h, w)

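fold_div is equal to shift_div. Since fold = c // fold_div, it divides the channel dimension, not the time dimension: one fold of channels takes its values from the next frame, a second fold takes them from the previous frame, and the remaining channels are left untouched. If it is set to 3, then roughly 2 / 3 of the channels will be shifted. If set to 8, then 2 / 8. I am studying this code as well, so please take this with a grain of salt 😄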
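
To make that concrete, here is a small pure-Python sketch of the same arithmetic. It is not the repository's code: `shifted_channels` and `temporal_shift` are hypothetical helper names, and plain lists stand in for the torch tensor of a single clip. It only mirrors `fold = c // fold_div` and the three slice assignments in the non-inplace branch above.

```python
def shifted_channels(c, fold_div):
    """Return (fold, n_shifted, fraction) for c channels, mirroring fold = c // fold_div."""
    fold = c // fold_div
    n_shifted = 2 * fold  # one fold reads the next frame, one fold reads the previous frame
    return fold, n_shifted, n_shifted / c


def temporal_shift(frames, fold_div):
    """Pure-Python analogue of the non-inplace branch for one clip.

    frames is a list of T frames; each frame is a list of C channel values.
    """
    t = len(frames)
    c = len(frames[0])
    fold = c // fold_div
    out = [[0] * c for _ in range(t)]
    for i in range(t):
        for ch in range(c):
            if ch < fold:
                # "shift left": frame i reads frame i + 1; the last frame stays zero
                if i + 1 < t:
                    out[i][ch] = frames[i + 1][ch]
            elif ch < 2 * fold:
                # "shift right": frame i reads frame i - 1; the first frame stays zero
                if i >= 1:
                    out[i][ch] = frames[i - 1][ch]
            else:
                # remaining channels are copied unchanged
                out[i][ch] = frames[i][ch]
    return out


# Toy clip: 4 frames, 4 channels, value = 10 * frame_index + channel_index
clip = [[10 * i + ch for ch in range(4)] for i in range(4)]
shifted = temporal_shift(clip, fold_div=4)
```

With 64 channels, `shifted_channels(64, 8)` gives 16 shifted channels (exactly 2 / 8), while `shifted_channels(64, 3)` gives 42 (slightly less than 2 / 3, because of the integer division). In this sketch the number of frames never interacts with `fold_div`: the same call works for a 4-frame or an 8-frame clip, which suggests shift_div does not need to be bounded by frame_count.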
