Describe the bug
Automatic vectorization in copy doesn't account for shape divisibility. When copying a tensor with the layout (_2,_3):(_1,_2), the greatest common vector length is 6 elements. However, the copy is vectorized at 128 bits, which means copying four int elements at once; that does not evenly divide a tensor of size 6.
Steps/Code to reproduce bug
#include <vector>

#include <cute/tensor.hpp>
#include <cute/layout.hpp>

using namespace cute;

int main() {
  auto mem_layout = make_layout(make_shape(Int<2>{}, Int<3>{}));
  print_layout(mem_layout);

  std::vector<int> src_buffer(size(mem_layout));
  std::vector<int> dst_buffer(size(mem_layout));

  auto src = make_tensor(src_buffer.data(), mem_layout);
  for (int t = 0; t < size(mem_layout); t++) {
    src[t] = t;
  }
  print_tensor(src);

  auto dst = make_tensor(dst_buffer.data(), mem_layout);
  copy(src, dst);
  print_tensor(dst);

  return 0;
}
I got the following error:
error: static assertion failed due to requirement 'C<3>::value % C<2>::value == 0 || C<2>::value % C<3>::value == 0': Static shape_div failure
405 | static_assert(IntTupleA::value % IntTupleB::value == 0 || IntTupleB::value % IntTupleA::value == 0, "Static shape_div failure");
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/ethan/cutlass/include/cute/layout.hpp:1680:24: note: in instantiation of function template specialization 'cute::shape_div<cute::C<3>, cute::C<2>>' requested here
1680 | return make_layout(shape_div(shape, shape_div(Int<N>{}, abs(stride))),
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:86: note: in instantiation of function template specialization 'cute::upcast<4, cute::C<3>, cute::C<2>>' requested here
1676 | return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
| ^
/home/ethan/cutlass/include/cute/layout.hpp:704:22: note: in instantiation of function template specialization 'cute::upcast(const cute::tuple<cute::C<2>, cute::C<3>> &, const cute::tuple<cute::C<1>, cute::C<2>> &)::(anonymous class)::operator()<cute::C<3>, cute::C<2>>' requested here
704 | return make_layout(f(get<I>(t0),get<I>(t1))..., get<I0>(t0)..., get<I1>(t1)...);
| ^
/home/ethan/cutlass/include/cute/layout.hpp:725:18: note: in instantiation of function template specialization 'cute::detail::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44) &, 0, 1>' requested here
725 | return detail::transform_layout(t0, t1, f, make_seq<R>{}, make_range<R,R0>{}, make_range<R,R1>{});
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:12: note: in instantiation of function template specialization 'cute::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44)>' requested here
1676 | return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1696:10: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
1696 | return upcast<N>(layout.shape(), layout.stride());
| ^
/home/ethan/cutlass/include/cute/tensor.hpp:658:21: note: in instantiation of function template specialization 'cute::recast_layout<int, const cutlass::uint128_t, cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>' requested here
658 | auto new_layout = recast_layout<OldType,NewType>(old_layout);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:210:20: note: in instantiation of function template specialization 'cute::recast<const cutlass::uint128_t, const cute::Tensor<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>> &>' requested here
210 | Tensor src_v = recast<SrcVecType>(src);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:283:12: note: in instantiation of function template specialization 'cute::copy_vec<cutlass::uint128_t, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
283 | return copy_vec<uint_bit_t<vec_bits>>(src, dst);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:297:10: note: in instantiation of function template specialization 'cute::copy<8, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
297 | return copy(AutoVectorizingCopy{}, src, dst);
| ^
test.cpp:24:5: note: in instantiation of function template specialization 'cute::copy<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
24 | copy(src, dst);
| ^
Expected behavior
copy(src, dst) should just work. Internally, it should either fall back to a narrower vector width that divides the tensor size, or compute a correct predicate and use it to guard the copy.
Environment details (please complete the following information):
Environment location: [Bare-metal]