Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Autovectorized copy fails with shape (_2, _3) #1499

Open
YichengDWu opened this issue Apr 23, 2024 · 2 comments
Open

[BUG] Autovectorized copy fails with shape (_2, _3) #1499

YichengDWu opened this issue Apr 23, 2024 · 2 comments
Labels
bug Something isn't working inactive-30d

Comments

@YichengDWu
Copy link

YichengDWu commented Apr 23, 2024

Describe the bug

Automatic vectorization of copying doesn't account for shape divisibility. When attempting to copy a tensor with the layout (_2,_3):(_1, _2), the greatest common vector length is 6. However, it's vectorized at 128 bits, which means copying four elements at once. This approach doesn't work for a tensor with a size of 6.

Steps/Code to reproduce bug

#include <vector>

#include <cute/tensor.hpp>
#include <cute/layout.hpp>

using namespace cute;

int main() {
    auto mem_layout = make_layout(make_shape(Int<2>{}, Int<3>{}));
    print_layout(mem_layout);
    
    std::vector<int> src_buffer(size(mem_layout));
    std::vector<int> dst_buffer(size(mem_layout));
    
    auto src = make_tensor(src_buffer.data(), mem_layout);

    for (int t = 0; t < size(mem_layout); t++) {
      src[t] = t;
    
    }
    print_tensor(src);

    auto dst = make_tensor(dst_buffer.data(), mem_layout);
    copy(src, dst);
    print_tensor(dst);
  
  return 0;
}

I got the following error:

error: static assertion failed due to requirement 'C<3>::value % C<2>::value == 0 || C<2>::value % C<3>::value == 0': Static shape_div failure
  405 |     static_assert(IntTupleA::value % IntTupleB::value == 0 || IntTupleB::value % IntTupleA::value == 0, "Static shape_div failure");
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/ethan/cutlass/include/cute/layout.hpp:1680:24: note: in instantiation of function template specialization 'cute::shape_div<cute::C<3>, cute::C<2>>' requested here
 1680 |     return make_layout(shape_div(shape,  shape_div(Int<N>{}, abs(stride))),
      |                        ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:86: note: in instantiation of function template specialization 'cute::upcast<4, cute::C<3>, cute::C<2>>' requested here
 1676 |     return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
      |                                                                                      ^
/home/ethan/cutlass/include/cute/layout.hpp:704:22: note: in instantiation of function template specialization 'cute::upcast(const cute::tuple<cute::C<2>, cute::C<3>> &, const cute::tuple<cute::C<1>, cute::C<2>> &)::(anonymous class)::operator()<cute::C<3>, cute::C<2>>' requested here
  704 |   return make_layout(f(get<I>(t0),get<I>(t1))..., get<I0>(t0)..., get<I1>(t1)...);
      |                      ^
/home/ethan/cutlass/include/cute/layout.hpp:725:18: note: in instantiation of function template specialization 'cute::detail::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44) &, 0, 1>' requested here
  725 |   return detail::transform_layout(t0, t1, f, make_seq<R>{}, make_range<R,R0>{}, make_range<R,R1>{});
      |                  ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:12: note: in instantiation of function template specialization 'cute::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44)>' requested here
 1676 |     return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
      |            ^
/home/ethan/cutlass/include/cute/layout.hpp:1696:10: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
 1696 |   return upcast<N>(layout.shape(), layout.stride());
      |          ^
/home/ethan/cutlass/include/cute/tensor.hpp:658:21: note: in instantiation of function template specialization 'cute::recast_layout<int, const cutlass::uint128_t, cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>' requested here
  658 |   auto new_layout = recast_layout<OldType,NewType>(old_layout);
      |                     ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:210:20: note: in instantiation of function template specialization 'cute::recast<const cutlass::uint128_t, const cute::Tensor<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>> &>' requested here
  210 |     Tensor src_v = recast<SrcVecType>(src);
      |                    ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:283:12: note: in instantiation of function template specialization 'cute::copy_vec<cutlass::uint128_t, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
  283 |     return copy_vec<uint_bit_t<vec_bits>>(src, dst);
      |            ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:297:10: note: in instantiation of function template specialization 'cute::copy<8, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
  297 |   return copy(AutoVectorizingCopy{}, src, dst);
      |          ^
test.cpp:24:5: note: in instantiation of function template specialization 'cute::copy<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
   24 |     copy(src, dst);
      |     ^

Expected behavior

copy(src, dst) should just work. Internally, it should be able to compute a correct predicate and use that to do the copy.

Environment details (please complete the following information):

  • Environment location: [Bare-metal]
@YichengDWu YichengDWu added ? - Needs Triage bug Something isn't working labels Apr 23, 2024
@YichengDWu YichengDWu changed the title [BUG] Autovectorized copy fails with shape (2, 3) [BUG] Autovectorized copy fails with shape (_2, _3) Apr 23, 2024
@thakkarV
Copy link
Collaborator

We have found similar issues internally with "domain alignment" of copies. This is being worked on right now

Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working inactive-30d
Projects
None yet
Development

No branches or pull requests

3 participants