ReshapedArray indexing broken because of Int128 operation #332

Open
kmp5VT opened this issue Apr 12, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@kmp5VT

kmp5VT commented Apr 12, 2024

Hi

I am running the following code and am hitting an internal compiler error:

using Metal
dev = Metal.mtl
x = @view reshape(dev(randn(elt, 8, 8))', 64)[1:8]
@allowscalar y = copy(x)
Compilation to native code failed; see below for details.
If you think this is a bug, please file an issue and attach /tmp/jl_y9AnAdkNGZ.metallib

I have the temp file available but cannot attach it to the GitHub issue.

Thanks!

@christiangnrd
Contributor

What type is elt?

@kmp5VT
Author

kmp5VT commented Apr 13, 2024

@christiangnrd elt = Float32. Sorry, I forgot to add that definition. Thanks!
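
For readers who want to inspect the array types involved without a GPU, here is a CPU-only sketch that builds the same wrapper stack with a plain Matrix standing in for the Metal array. That substitution is an assumption made for illustration, and the sketch does not reproduce the compiler error. It also touches Base internals (Base.ReshapedArray, Base.MultiplicativeInverses) that are not public API and may differ between Julia versions.

```julia
using LinearAlgebra

elt = Float32
a = randn(elt, 8, 8)      # CPU stand-in for dev(randn(elt, 8, 8))
r = reshape(a', 64)       # lazy reshape of an Adjoint, no data is copied
x = @view r[1:8]          # same view as in the report

# Print the wrapper type; on the CPU the innermost array is a Matrix
println(typeof(r))

# Linear indexing into such a reshape goes through a precomputed
# "multiplicative inverse" of the parent's stride instead of a hardware divide.
mi = Base.MultiplicativeInverses.SignedMultiplicativeInverse(8)
q = div(17, mi)           # same result as 17 ÷ 8
```

On the CPU this path works, so the sketch is only useful for seeing which Base code the GPU kernel ends up compiling.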

@maleadt
Member

maleadt commented Apr 15, 2024

> I have the temp file available but cannot attach it to the github issue

You probably have to zip it.

Also, which version of Metal.jl are you using? Please ensure you're trying v1.1.0.

@kmp5VT
Author

kmp5VT commented Apr 15, 2024

@maleadt Sorry, I didn't provide adequate version information. I am using Metal v1.1.0, but I did not have an issue with this code in the previous release of Metal. Here is my versioninfo:

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

Thanks!
metal_error.zip
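
The issue title points at an Int128 operation. The kernel IR posted in this thread shows a sign-extension to i128, a multiply, a 64-bit logical shift and a truncation, which its debug metadata attributes to `_mul_high` in multinverses.jl. As a rough sketch only, the same high word can be computed with 64-bit arithmetic alone by splitting the operands into 32-bit halves. The name `mul_high_64` is hypothetical, and nothing here is claimed to be the fix adopted by Metal.jl or Julia.

```julia
# Hypothetical helper: high 64 bits of the signed 128-bit product a*b,
# built from 32-bit halves so that no Int128 value is ever created.
function mul_high_64(a::Int64, b::Int64)
    mask = Int64(0xffffffff)
    a0, a1 = a & mask, a >> 32    # low half (non-negative), high half (signed)
    b0, b1 = b & mask, b >> 32
    w0 = a0 * b0                  # may wrap; the bit pattern is the unsigned product
    t  = a1 * b0 + (w0 >>> 32)    # >>> is a logical (zero-filling) shift
    w1 = (t & mask) + a0 * b1
    return a1 * b1 + (t >> 32) + (w1 >> 32)
end

# Reference that widens to Int128, for comparison on the CPU
mul_high_ref(a::Int64, b::Int64) = ((widen(a) * widen(b)) >> 64) % Int64
```

The split-halves version costs four 64-bit multiplies instead of one widened multiply, which is the price of staying within integer widths the GPU back end accepts.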

@tgymnich
Member

tgymnich commented Apr 15, 2024

; ModuleID = 'shader.air'
source_filename = "start"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.4.1"

; Function Attrs: cold noreturn nounwind
declare void @llvm.trap() #0

declare i64 @air.abs.s.i64(i64) local_unnamed_addr

define internal fastcc void @gpu_report_exception() unnamed_addr !dbg !58 {
top:
  ret void, !dbg !61
}

define internal fastcc void @gpu_signal_exception() unnamed_addr !dbg !62 {
top:
  ret void, !dbg !64
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i64 @llvm.smax.i64(i64, i64) #1

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i8 @llvm.umin.i8(i8, i8) #1

define void @_Z15getindex_kernel16mtlKernelContext14MtlDeviceArrayI7Float32Li1ELi1EE13ReshapedArrayIS1_Li1E7AdjointIS1_S0_IS1_Li2ELi1EEE5TupleI27SignedMultiplicativeInverseI5Int64EEES4_IS6_E9UnitRangeIS6_E({ i8 addrspace(1)*, [1 x i64] } addrspace(1)* %0, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1, [1 x i64] addrspace(1)* %2, [2 x i64] addrspace(1)* %3, i32 %threads_per_grid, i32 %thread_position_in_grid) local_unnamed_addr !dbg !65 {
conversion:
  %4 = bitcast { i8 addrspace(1)*, [1 x i64] } addrspace(1)* %0 to float addrspace(1)* addrspace(1)*
  %.unpack12 = load float addrspace(1)*, float addrspace(1)* addrspace(1)* %4, align 8
  %5 = getelementptr inbounds { i8 addrspace(1)*, [1 x i64] }, { i8 addrspace(1)*, [1 x i64] } addrspace(1)* %0, i64 0, i32 1, i64 0
  %.unpack10.unpack = load i64, i64 addrspace(1)* %5, align 8
  %6 = bitcast { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1 to float addrspace(1)* addrspace(1)*
  %.unpack.unpack.unpack26 = load float addrspace(1)*, float addrspace(1)* addrspace(1)* %6, align 8
  %.unpack.unpack.unpack19.elt = getelementptr inbounds { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] }, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1, i64 0, i32 0, i64 0, i32 1, i64 0
  %.unpack.unpack.unpack19.unpack = load i64, i64 addrspace(1)* %.unpack.unpack.unpack19.elt, align 8
  %7 = getelementptr inbounds { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] }, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1, i64 0, i32 2, i64 0
  %.unpack16.unpack = load { i64, i64, i8, i8 }, { i64, i64, i8, i8 } addrspace(1)* %7, align 8
  %.fca.2.0.0.extract = extractvalue { i64, i64, i8, i8 } %.unpack16.unpack, 0
  %.fca.2.0.1.extract = extractvalue { i64, i64, i8, i8 } %.unpack16.unpack, 1
  %.fca.2.0.2.extract = extractvalue { i64, i64, i8, i8 } %.unpack16.unpack, 2
  %.fca.2.0.3.extract = extractvalue { i64, i64, i8, i8 } %.unpack16.unpack, 3
  %8 = add i32 %thread_position_in_grid, 1, !dbg !67
  %9 = zext i32 %8 to i64, !dbg !84
  %.not = icmp ne i32 %8, 0, !dbg !95
  %10 = icmp sge i64 %.unpack10.unpack, %9, !dbg !97
  %narrow = select i1 %.not, i1 %10, i1 false, !dbg !97
  br i1 %narrow, label %L20, label %common.ret, !dbg !97

common.ret:                                       ; preds = %L87, %conversion
  ret void, !dbg !98

L20:                                              ; preds = %conversion
  %.elt = getelementptr inbounds [2 x i64], [2 x i64] addrspace(1)* %3, i64 0, i64 0
  %.unpack = load i64, i64 addrspace(1)* %.elt, align 8
  %11 = getelementptr inbounds { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] }, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1, i64 0, i32 1, i64 0
  %.unpack14.unpack = load i64, i64 addrspace(1)* %11, align 8
  %12 = add nsw i64 %9, -1, !dbg !99
  %13 = add i64 %12, %.unpack, !dbg !105
  %14 = call i64 @air.max.s.i64(i64 %.unpack14.unpack, i64 0), !dbg !106
  %15 = add i64 %13, -1, !dbg !134
  %.not5 = icmp ult i64 %15, %14, !dbg !137
  br i1 %.not5, label %L87, label %L84, !dbg !129

L84:                                              ; preds = %L20
  call fastcc void @gpu_report_exception(), !dbg !139
  call fastcc void @gpu_signal_exception(), !dbg !139
  call void @llvm.trap(), !dbg !139
  unreachable, !dbg !139

L87:                                              ; preds = %L20
  %16 = sext i64 %15 to i128, !dbg !143
  %17 = sext i64 %.fca.2.0.1.extract to i128, !dbg !165
  %18 = mul nsw i128 %17, %16, !dbg !168
  %19 = lshr i128 %18, 64, !dbg !170
  %20 = trunc i128 %19 to i64, !dbg !173
  %21 = sext i8 %.fca.2.0.2.extract to i64, !dbg !174
  %22 = mul i64 %15, %21, !dbg !177
  %23 = add i64 %22, %20, !dbg !179
  %24 = call i64 @air.abs.s.i64(i64 %.fca.2.0.0.extract), !dbg !180
  %.not7 = icmp eq i64 %24, 1, !dbg !184
  %25 = mul i64 %.fca.2.0.0.extract, %15, !dbg !186
  %26 = call i8 @air.min.u.i8(i8 %.fca.2.0.3.extract, i8 63), !dbg !187
  %.v = zext i8 %26 to i64, !dbg !187
  %27 = ashr i64 %23, %.v, !dbg !187
  %.lobit = lshr i64 %23, 63, !dbg !189
  %28 = add i64 %27, %.lobit, !dbg !194
  %29 = select i1 %.not7, i64 %25, i64 %28, !dbg !196
  %30 = mul i64 %29, %.fca.2.0.0.extract, !dbg !197
  %31 = sub i64 %15, %30, !dbg !199
  %32 = call i64 @air.max.s.i64(i64 %.unpack.unpack.unpack19.unpack, i64 0), !dbg !200
  %33 = mul i64 %31, %32, !dbg !220
  %34 = add i64 %33, %29, !dbg !225
  %35 = getelementptr inbounds float, float addrspace(1)* %.unpack.unpack.unpack26, i64 %34, !dbg !226
  %36 = load float, float addrspace(1)* %35, align 4, !dbg !226, !tbaa !240
  %37 = getelementptr inbounds float, float addrspace(1)* %.unpack12, i64 %12, !dbg !243
  store float %36, float addrspace(1)* %37, align 4, !dbg !243, !tbaa !240
  br label %common.ret
}

declare i64 @air.max.s.i64(i64, i64)

declare i8 @air.min.u.i8(i8, i8)

attributes #0 = { cold noreturn nounwind }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7, !8}
!llvm.dbg.cu = !{!9, !11, !12, !13, !14, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28, !29, !30, !31, !32, !33, !34, !35, !36, !37, !38, !39, !40, !41, !42, !43, !44}
!julia.kernel = !{!45}
!air.kernel = !{!46}
!llvm.ident = !{!55}
!air.version = !{!56}
!air.language_version = !{!57}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 7, !"air.max_device_buffers", i32 31}
!3 = !{i32 7, !"air.max_constant_buffers", i32 31}
!4 = !{i32 7, !"air.max_threadgroup_buffers", i32 31}
!5 = !{i32 7, !"air.max_textures", i32 128}
!6 = !{i32 7, !"air.max_read_write_textures", i32 8}
!7 = !{i32 7, !"air.max_samplers", i32 16}
!8 = !{i32 2, !"SDK Version", [3 x i32] [i32 14, i32 4, i32 1]}
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!10 = !DIFile(filename: "julia", directory: ".")
!11 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!12 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!13 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!14 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!15 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!16 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!17 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!18 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!19 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!20 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!21 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!22 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!23 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!24 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!25 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!26 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!27 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!28 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!29 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!30 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!31 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!32 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!33 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!34 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!35 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!36 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!37 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!38 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!39 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!40 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!41 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!42 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!43 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!44 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
!45 = !{void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)*, [1 x i64] addrspace(1)*, [2 x i64] addrspace(1)*, i32, i32)* @_Z15getindex_kernel16mtlKernelContext14MtlDeviceArrayI7Float32Li1ELi1EE13ReshapedArrayIS1_Li1E7AdjointIS1_S0_IS1_Li2ELi1EEE5TupleI27SignedMultiplicativeInverseI5Int64EEES4_IS6_E9UnitRangeIS6_E}
!46 = !{void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)*, [1 x i64] addrspace(1)*, [2 x i64] addrspace(1)*, i32, i32)* @_Z15getindex_kernel16mtlKernelContext14MtlDeviceArrayI7Float32Li1ELi1EE13ReshapedArrayIS1_Li1E7AdjointIS1_S0_IS1_Li2ELi1EEE5TupleI27SignedMultiplicativeInverseI5Int64EEES4_IS6_E9UnitRangeIS6_E, !47, !48}
!47 = !{}
!48 = !{!49, !50, !51, !52, !53, !54}
!49 = !{i32 0, !"air.buffer", !"air.location_index", i32 0, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 16, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"MtlDeviceVector{Float32, 1}", !"air.arg_name", !"dest"}
!50 = !{i32 1, !"air.buffer", !"air.location_index", i32 1, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 56, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"Base.ReshapedArray{Float32, 1, LinearAlgebra.Adjoint{Float32, MtlDeviceMatrix{Float32, 1}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}", !"air.arg_name", !"src"}
!51 = !{i32 2, !"air.buffer", !"air.location_index", i32 2, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 8, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"Tuple{Int64}", !"air.arg_name", !"idims"}
!52 = !{i32 3, !"air.buffer", !"air.location_index", i32 3, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 16, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"UnitRange{Int64}", !"air.arg_name", !"Is"}
!53 = !{i32 4, !"air.threads_per_grid", !"air.arg_type_name", !"uint"}
!54 = !{i32 5, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint"}
!55 = !{!"Julia 1.10.2 with Metal.jl"}
!56 = !{i32 2, i32 5, i32 0}
!57 = !{!"Metal", i32 3, i32 1, i32 0}
!58 = distinct !DISubprogram(name: "report_exception", linkageName: "julia_report_exception_3328", scope: null, file: !59, line: 13, type: !60, scopeLine: 13, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !20, retainedNodes: !47)
!59 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/device/runtime.jl", directory: ".")
!60 = !DISubroutineType(cc: DW_CC_nocall, types: !47)
!61 = !DILocation(line: 18, scope: !58)
!62 = distinct !DISubprogram(name: "signal_exception", linkageName: "julia_signal_exception_3349", scope: null, file: !59, line: 9, type: !63, scopeLine: 9, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !28, retainedNodes: !47)
!63 = !DISubroutineType(types: !47)
!64 = !DILocation(line: 10, scope: !62)
!65 = distinct !DISubprogram(name: "getindex_kernel", linkageName: "julia_getindex_kernel_4165", scope: null, file: !66, line: 82, type: !63, scopeLine: 82, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!66 = !DIFile(filename: "/Users/kpierce/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl", directory: ".")
!67 = !DILocation(line: 87, scope: !68, inlinedAt: !70)
!68 = distinct !DISubprogram(name: "+;", linkageName: "+", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!69 = !DIFile(filename: "int.jl", directory: ".")
!70 = !DILocation(line: 49, scope: !71, inlinedAt: !73)
!71 = distinct !DISubprogram(name: "#thread_position_in_grid_1d;", linkageName: "#thread_position_in_grid_1d", scope: !72, file: !72, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!72 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/device/intrinsics/arguments.jl", directory: ".")
!73 = !DILocation(line: 36, scope: !74, inlinedAt: !76)
!74 = distinct !DISubprogram(name: "global_index;", linkageName: "global_index", scope: !75, file: !75, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!75 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/gpuarrays.jl", directory: ".")
!76 = !DILocation(line: 44, scope: !77, inlinedAt: !79)
!77 = distinct !DISubprogram(name: "linear_index;", linkageName: "linear_index", scope: !78, file: !78, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!78 = !DIFile(filename: "/Users/kpierce/.julia/packages/GPUArrays/OKkAu/src/device/indexing.jl", directory: ".")
!79 = !DILocation(line: 66, scope: !80, inlinedAt: !81)
!80 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !78, file: !78, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!81 = !DILocation(line: 85, scope: !82, inlinedAt: !83)
!82 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !66, file: !66, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!83 = !DILocation(line: 82, scope: !65)
!84 = !DILocation(line: 708, scope: !85, inlinedAt: !87)
!85 = distinct !DISubprogram(name: "toInt64;", linkageName: "toInt64", scope: !86, file: !86, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!86 = !DIFile(filename: "boot.jl", directory: ".")
!87 = !DILocation(line: 784, scope: !88, inlinedAt: !89)
!88 = distinct !DISubprogram(name: "Int64;", linkageName: "Int64", scope: !86, file: !86, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!89 = !DILocation(line: 7, scope: !90, inlinedAt: !92)
!90 = distinct !DISubprogram(name: "convert;", linkageName: "convert", scope: !91, file: !91, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!91 = !DIFile(filename: "number.jl", directory: ".")
!92 = !DILocation(line: 551, scope: !93, inlinedAt: !94)
!93 = distinct !DISubprogram(name: "rem;", linkageName: "rem", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!94 = !DILocation(line: 1066, scope: !68, inlinedAt: !76)
!95 = !DILocation(line: 514, scope: !96, inlinedAt: !97)
!96 = distinct !DISubprogram(name: "<=;", linkageName: "<=", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!97 = !DILocation(line: 67, scope: !80, inlinedAt: !81)
!98 = !DILocation(line: 0, scope: !82, inlinedAt: !83)
!99 = !DILocation(line: 86, scope: !100, inlinedAt: !101)
!100 = distinct !DISubprogram(name: "-;", linkageName: "-", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!101 = !DILocation(line: 929, scope: !102, inlinedAt: !104)
!102 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !103, file: !103, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!103 = !DIFile(filename: "range.jl", directory: ".")
!104 = !DILocation(line: 87, scope: !82, inlinedAt: !83)
!105 = !DILocation(line: 87, scope: !68, inlinedAt: !101)
!106 = !DILocation(line: 647, scope: !107, inlinedAt: !109)
!107 = distinct !DISubprogram(name: "ifelse;", linkageName: "ifelse", scope: !108, file: !108, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!108 = !DIFile(filename: "essentials.jl", directory: ".")
!109 = !DILocation(line: 532, scope: !110, inlinedAt: !112)
!110 = distinct !DISubprogram(name: "max;", linkageName: "max", scope: !111, file: !111, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!111 = !DIFile(filename: "promotion.jl", directory: ".")
!112 = !DILocation(line: 454, scope: !113, inlinedAt: !114)
!113 = distinct !DISubprogram(name: "OneTo;", linkageName: "OneTo", scope: !103, file: !103, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!114 = !DILocation(line: 467, scope: !113, inlinedAt: !115)
!115 = !DILocation(line: 469, scope: !116, inlinedAt: !117)
!116 = distinct !DISubprogram(name: "oneto;", linkageName: "oneto", scope: !103, file: !103, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!117 = !DILocation(line: 291, scope: !118, inlinedAt: !120)
!118 = distinct !DISubprogram(name: "map;", linkageName: "map", scope: !119, file: !119, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!119 = !DIFile(filename: "tuple.jl", directory: ".")
!120 = !DILocation(line: 98, scope: !121, inlinedAt: !123)
!121 = distinct !DISubprogram(name: "axes;", linkageName: "axes", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!122 = !DIFile(filename: "abstractarray.jl", directory: ".")
!123 = !DILocation(line: 137, scope: !124, inlinedAt: !125)
!124 = distinct !DISubprogram(name: "axes1;", linkageName: "axes1", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!125 = !DILocation(line: 389, scope: !126, inlinedAt: !127)
!126 = distinct !DISubprogram(name: "eachindex;", linkageName: "eachindex", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!127 = !DILocation(line: 687, scope: !128, inlinedAt: !129)
!128 = distinct !DISubprogram(name: "checkbounds;", linkageName: "checkbounds", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!129 = !DILocation(line: 702, scope: !128, inlinedAt: !130)
!130 = !DILocation(line: 248, scope: !131, inlinedAt: !133)
!131 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !132, file: !132, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!132 = !DIFile(filename: "reshapedarray.jl", directory: ".")
!133 = !DILocation(line: 88, scope: !82, inlinedAt: !83)
!134 = !DILocation(line: 86, scope: !100, inlinedAt: !135)
!135 = !DILocation(line: 763, scope: !136, inlinedAt: !127)
!136 = distinct !DISubprogram(name: "checkindex;", linkageName: "checkindex", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!137 = !DILocation(line: 513, scope: !138, inlinedAt: !135)
!138 = distinct !DISubprogram(name: "<;", linkageName: "<", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!139 = !DILocation(line: 4, scope: !140, inlinedAt: !142)
!140 = distinct !DISubprogram(name: "#throw_boundserror", linkageName: "julia_#throw_boundserror_4181", scope: null, file: !141, line: 33, type: !60, scopeLine: 33, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !14, retainedNodes: !47)
!141 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/device/quirks.jl", directory: ".")
!142 = distinct !DILocation(line: 702, scope: !128, inlinedAt: !130)
!143 = !DILocation(line: 715, scope: !144, inlinedAt: !145)
!144 = distinct !DISubprogram(name: "toInt128;", linkageName: "toInt128", scope: !86, file: !86, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!145 = !DILocation(line: 785, scope: !146, inlinedAt: !147)
!146 = distinct !DISubprogram(name: "Int128;", linkageName: "Int128", scope: !86, file: !86, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!147 = !DILocation(line: 7, scope: !90, inlinedAt: !148)
!148 = !DILocation(line: 891, scope: !149, inlinedAt: !151)
!149 = distinct !DISubprogram(name: "widen;", linkageName: "widen", scope: !150, file: !150, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!150 = !DIFile(filename: "operators.jl", directory: ".")
!151 = !DILocation(line: 139, scope: !152, inlinedAt: !154)
!152 = distinct !DISubprogram(name: "_mul_high;", linkageName: "_mul_high", scope: !153, file: !153, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!153 = !DIFile(filename: "multinverses.jl", directory: ".")
!154 = !DILocation(line: 158, scope: !155, inlinedAt: !156)
!155 = distinct !DISubprogram(name: "div;", linkageName: "div", scope: !153, file: !153, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!156 = !DILocation(line: 172, scope: !157, inlinedAt: !158)
!157 = distinct !DISubprogram(name: "divrem;", linkageName: "divrem", scope: !153, file: !153, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!158 = !DILocation(line: 223, scope: !159, inlinedAt: !160)
!159 = distinct !DISubprogram(name: "_ind2sub_rs;", linkageName: "_ind2sub_rs", scope: !132, file: !132, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!160 = !DILocation(line: 220, scope: !161, inlinedAt: !162)
!161 = distinct !DISubprogram(name: "ind2sub_rs;", linkageName: "ind2sub_rs", scope: !132, file: !132, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!162 = !DILocation(line: 260, scope: !163, inlinedAt: !164)
!163 = distinct !DISubprogram(name: "_unsafe_getindex;", linkageName: "_unsafe_getindex", scope: !132, file: !132, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!164 = !DILocation(line: 249, scope: !131, inlinedAt: !133)
!165 = !DILocation(line: 549, scope: !93, inlinedAt: !166)
!166 = !DILocation(line: 1066, scope: !167, inlinedAt: !151)
!167 = distinct !DISubprogram(name: "*;", linkageName: "*", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!168 = !DILocation(line: 1053, scope: !167, inlinedAt: !169)
!169 = !DILocation(line: 1068, scope: !167, inlinedAt: !151)
!170 = !DILocation(line: 530, scope: !171, inlinedAt: !172)
!171 = distinct !DISubprogram(name: ">>>;", linkageName: ">>>", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!172 = !DILocation(line: 538, scope: !171, inlinedAt: !151)
!173 = !DILocation(line: 544, scope: !93, inlinedAt: !151)
!174 = !DILocation(line: 549, scope: !93, inlinedAt: !175)
!175 = !DILocation(line: 1066, scope: !167, inlinedAt: !176)
!176 = !DILocation(line: 159, scope: !155, inlinedAt: !156)
!177 = !DILocation(line: 88, scope: !167, inlinedAt: !178)
!178 = !DILocation(line: 1068, scope: !167, inlinedAt: !176)
!179 = !DILocation(line: 87, scope: !68, inlinedAt: !176)
!180 = !DILocation(line: 302, scope: !181, inlinedAt: !183)
!181 = distinct !DISubprogram(name: "#abs;", linkageName: "#abs", scope: !182, file: !182, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!182 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/device/intrinsics/math.jl", directory: ".")
!183 = !DILocation(line: 160, scope: !155, inlinedAt: !156)
!184 = !DILocation(line: 521, scope: !185, inlinedAt: !183)
!185 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !111, file: !111, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!186 = !DILocation(line: 88, scope: !167, inlinedAt: !183)
!187 = !DILocation(line: 527, scope: !188, inlinedAt: !183)
!188 = distinct !DISubprogram(name: ">>;", linkageName: ">>", scope: !69, file: !69, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!189 = !DILocation(line: 711, scope: !85, inlinedAt: !190)
!190 = !DILocation(line: 784, scope: !88, inlinedAt: !191)
!191 = !DILocation(line: 7, scope: !90, inlinedAt: !192)
!192 = !DILocation(line: 546, scope: !93, inlinedAt: !193)
!193 = !DILocation(line: 1066, scope: !68, inlinedAt: !183)
!194 = !DILocation(line: 87, scope: !68, inlinedAt: !195)
!195 = !DILocation(line: 1068, scope: !68, inlinedAt: !183)
!196 = !DILocation(line: 647, scope: !107, inlinedAt: !183)
!197 = !DILocation(line: 88, scope: !167, inlinedAt: !198)
!198 = !DILocation(line: 173, scope: !157, inlinedAt: !158)
!199 = !DILocation(line: 86, scope: !100, inlinedAt: !198)
!200 = !DILocation(line: 647, scope: !107, inlinedAt: !201)
!201 = !DILocation(line: 532, scope: !110, inlinedAt: !202)
!202 = !DILocation(line: 454, scope: !113, inlinedAt: !203)
!203 = !DILocation(line: 467, scope: !113, inlinedAt: !204)
!204 = !DILocation(line: 469, scope: !116, inlinedAt: !205)
!205 = !DILocation(line: 292, scope: !118, inlinedAt: !206)
!206 = !DILocation(line: 98, scope: !121, inlinedAt: !207)
!207 = !DILocation(line: 2957, scope: !208, inlinedAt: !209)
!208 = distinct !DISubprogram(name: "_sub2ind;", linkageName: "_sub2ind", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!209 = !DILocation(line: 1330, scope: !210, inlinedAt: !211)
!210 = distinct !DISubprogram(name: "_to_linear_index;", linkageName: "_to_linear_index", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!211 = !DILocation(line: 114, scope: !212, inlinedAt: !214)
!212 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !213, file: !213, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!213 = !DIFile(filename: "/Users/kpierce/.julia/packages/Metal/q9oGt/src/device/array.jl", directory: ".")
!214 = !DILocation(line: 329, scope: !215, inlinedAt: !217)
!215 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !216, file: !216, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!216 = !DIFile(filename: "/Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/adjtrans.jl", directory: ".")
!217 = !DILocation(line: 264, scope: !218, inlinedAt: !219)
!218 = distinct !DISubprogram(name: "_unsafe_getindex_rs;", linkageName: "_unsafe_getindex_rs", scope: !132, file: !132, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!219 = !DILocation(line: 261, scope: !163, inlinedAt: !164)
!220 = !DILocation(line: 88, scope: !167, inlinedAt: !221)
!221 = !DILocation(line: 2989, scope: !222, inlinedAt: !223)
!222 = distinct !DISubprogram(name: "_sub2ind_recurse;", linkageName: "_sub2ind_recurse", scope: !122, file: !122, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!223 = !DILocation(line: 2989, scope: !222, inlinedAt: !224)
!224 = !DILocation(line: 2973, scope: !208, inlinedAt: !207)
!225 = !DILocation(line: 86, scope: !100, inlinedAt: !226)
!226 = !DILocation(line: 38, scope: !227, inlinedAt: !229)
!227 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !228, file: !228, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!228 = !DIFile(filename: "/Users/kpierce/.julia/packages/LLVM/bzSzE/src/interop/base.jl", directory: ".")
!229 = !DILocation(line: 0, scope: !230, inlinedAt: !232)
!230 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !231, file: !231, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!231 = !DIFile(filename: "none", directory: ".")
!232 = !DILocation(line: 0, scope: !233, inlinedAt: !234)
!233 = distinct !DISubprogram(name: "pointerref;", linkageName: "pointerref", scope: !231, file: !231, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!234 = !DILocation(line: 85, scope: !235, inlinedAt: !237)
!235 = distinct !DISubprogram(name: "unsafe_load;", linkageName: "unsafe_load", scope: !236, file: !236, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!236 = !DIFile(filename: "/Users/kpierce/.julia/packages/LLVM/bzSzE/src/interop/pointer.jl", directory: ".")
!237 = !DILocation(line: 82, scope: !238, inlinedAt: !239)
!238 = distinct !DISubprogram(name: "arrayref;", linkageName: "arrayref", scope: !213, file: !213, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!239 = !DILocation(line: 103, scope: !212, inlinedAt: !211)
!240 = !{!241, !241, i64 0, i64 0}
!241 = !{!"custom_tbaa_addrspace(1)", !242, i64 0}
!242 = !{!"custom_tbaa"}
!243 = !DILocation(line: 38, scope: !227, inlinedAt: !244)
!244 = !DILocation(line: 0, scope: !230, inlinedAt: !245)
!245 = !DILocation(line: 0, scope: !246, inlinedAt: !247)
!246 = distinct !DISubprogram(name: "pointerset;", linkageName: "pointerset", scope: !231, file: !231, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!247 = !DILocation(line: 88, scope: !248, inlinedAt: !249)
!248 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !236, file: !236, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!249 = !DILocation(line: 88, scope: !250, inlinedAt: !251)
!250 = distinct !DISubprogram(name: "arrayset;", linkageName: "arrayset", scope: !213, file: !213, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!251 = !DILocation(line: 105, scope: !252, inlinedAt: !253)
!252 = distinct !DISubprogram(name: "setindex!;", linkageName: "setindex!", scope: !213, file: !213, type: !63, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !12, retainedNodes: !47)
!253 = !DILocation(line: 89, scope: !82, inlinedAt: !83)

@maleadt
Member

maleadt commented Apr 15, 2024

Compiler error:

unable to legalize instruction: %248:_(s64) = 147 %241:_, %243:_
Context:
%248:_(s64) = 147 %241:_, %243:_
%241:_(s64), %242:_(s64) = 74 %49:_(s128)
%243:_(s64), %244:_(s64) = 74 %50:_(s128)
%49:_(s128) = 124 %26:_(s64)
%50:_(s128) = 91 %41:_(p1) :: (load (s64) from %ir.19 + 8, addrspace 1)
%26:_(s64) = 45 %25:_, %18:_
%41:_(p1) = 81 %105:_(s64)
%25:_(s64) = nsw 46 %8:_, %93:_
%18:_(s64) = 90 %16:_(p1) :: (load (s64) from %ir..elt3, addrspace 1)
%105:_(s64) = 45 %94:_, %104:_
%8:_(s64) = 126 %7:gpr32(s32)
%93:_(s64) = 120 i64 2
%16:_(p1) = 90 %17:_(p64) :: (dereferenceable load (p1) from @agc.buffer_pointers.3, addrspace 64)
%94:_(s64) = 80 %28:_(p1)
%104:_(s64) = 120 i64 40
%7:gpr32(s32) = 45 %0:_, %6:_
%17:_(p64) = 71 @agc.buffer_pointers.3
%28:_(p1) = 90 %15:_(p64) :: (dereferenceable load (p1) from @agc.buffer_pointers.1, addrspace 64)
%0:_(s32) = 116 intrinsic(@llvm.agx2.thread.position.in.grid.x)

@maleadt
Member

maleadt commented Apr 15, 2024

Reduced:

define void @my_kernel({ i8 addrspace(1)*, [1 x i64] } addrspace(1)* %0, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)* %1, [1 x i64] addrspace(1)* %2, [2 x i64] addrspace(1)* %3, i32 %a, i32 %thread_position_in_grid) {
b:
  %.c.d = load { i64, i64, i8, i8 }, { i64, i64, i8, i8 } addrspace(1)* null, align 4
  %.e.2.0.1.extract = extractvalue { i64, i64, i8, i8 } %.c.d, 1
  %4 = sext i64 %.e.2.0.1.extract to i128
  %5 = mul i128 %4, -2
  %6 = lshr i128 %5, 1
  %7 = trunc i128 %6 to i64
  %8 = getelementptr float, float addrspace(1)* null, i64 %7
  %9 = load float, float addrspace(1)* %8, align 4
  store float %9, float addrspace(1)* null, align 4
  ret void
}

!air.kernel = !{!0}
!air.version = !{!8}

!0 = !{void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*, { [1 x { i8 addrspace(1)*, [2 x i64] }], [1 x i64], [1 x { i64, i64, i8, i8 }] } addrspace(1)*, [1 x i64] addrspace(1)*, [2 x i64] addrspace(1)*, i32, i32)* @my_kernel, !1, !2}
!1 = !{}
!2 = !{!3, !3, !4, !5, !6, !7}
!3 = !{i32 1, !""}
!4 = !{i32 2, !""}
!5 = !{i32 3, !""}
!6 = !{i32 4, !""}
!7 = !{i32 5, !""}
!8 = !{i32 2, i32 5, i32 0}

This gives the same crash, I think:

unable to legalize instruction: %53:_(s64) = 147 %46:_, %48:_
Context:
%53:_(s64) = 147 %46:_, %48:_
%46:_(s64), %47:_(s64) = 74 %3:_(s128)
%48:_(s64), %49:_(s64) = 74 %4:_(s128)
%3:_(s128) = 91 %1:_(p1) :: (load (s64) from `i64 addrspace(1)* inttoptr (i64 8 to i64 addrspace(1)*)`, addrspace 1)
%4:_(s128) = 120 i128 36893488147419103230
%1:_(p1) = 81 %2:_(s64)
%2:_(s64) = 120 i64 8
(in function: agc.main.constant_program)

@christiangnrd
Contributor

christiangnrd commented Apr 15, 2024

Bisected to JuliaGPU/GPUArrays.jl#512

@christiangnrd christiangnrd added the bug label Apr 15, 2024
@tgymnich
Member

JuliaGPU/GPUCompiler.jl#571

This should at least yield nicer error messages

@maleadt
Member

maleadt commented Apr 17, 2024

With the above:

Reason: unsupported use of i128 value
Stacktrace:
  [1] toInt128
    @ ./boot.jl:715
  [2] Int128
    @ ./boot.jl:785
  [3] convert
    @ ./number.jl:7
  [4] widen
    @ ./operators.jl:891
  [5] _mul_high
    @ ./multinverses.jl:139
  [6] div
    @ ./multinverses.jl:158
  [7] divrem
    @ ./multinverses.jl:172
  [8] _ind2sub_rs
    @ ./reshapedarray.jl:223
  [9] ind2sub_rs
    @ ./reshapedarray.jl:220
 [10] _unsafe_getindex
    @ ./reshapedarray.jl:260
 [11] getindex
    @ ./reshapedarray.jl:249
 [12] macro expansion
    @ ~/Julia/pkg/GPUArrays/src/host/indexing.jl:88
 [13] getindex_kernel
    @ ~/Julia/pkg/GPUArrays/src/host/indexing.jl:82

@maleadt maleadt changed the title from "Bug when calling copy with a view of MtlArray" to "ReshapedArray indexing broken because of Int128 operation" Apr 17, 2024
@maleadt
Member

maleadt commented Apr 17, 2024

So the problem is that operations like view and reshape normally preserve the MtlArray. Here, however, the reshape of an Adjoint results in an actual ReshapedArray. Indexing on that array wrapper is implemented (in Base) using Int128, which is already visible in the type signature:

SubArray{Float32, 1, Base.ReshapedArray{Float32, 1, LinearAlgebra.Adjoint{Float32, MtlMatrix{Float32, Private}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, Tuple{UnitRange{Int64}}, false}
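
The same wrapper structure can be inspected without a GPU. This is a rough CPU-only sketch under the assumption that a plain `Matrix{Float32}` stands in for the `MtlMatrix`; the variable names are illustrative. It shows that reshaping an `Adjoint` produces a `Base.ReshapedArray` carrying `SignedMultiplicativeInverse{Int64}` entries in its `mi` field.

```julia
using LinearAlgebra

# A plain CPU matrix stands in for the MtlMatrix (an assumption for illustration).
A = randn(Float32, 8, 8)

# The adjoint is a lazy wrapper, so reshape cannot return a dense array
# and falls back to Base.ReshapedArray.
r = reshape(A', 64)
x = @view r[1:8]

@show typeof(x)
@show typeof(parent(x).mi)

# Linear index 2 of the reshaped adjoint is A'[2, 1], i.e. A[1, 2].
@assert r[2] == A[1, 2]
```

Every scalar `getindex` on `r` has to convert the linear index back to Cartesian indices of the parent, which is where the multiplicative inverses are used.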

LLVM normally supports legalizing such operations, but that only happens during instruction selection (ISel), and Apple's implementation doesn't seem to allow that. Legalizing i128 to i64 at the IR level seems tricky.

@timholy You originally added the ReshapedArray type; is there a way to opt out of the use of Int128, which I presume comes from the SignedMultiplicativeInverse{Int64} indices? Alternatively, I guess we could overlay ind2sub_rs, but that feels like a hack.
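
To illustrate what avoiding Int128 could look like, here is a hedged sketch of computing the high 64 bits of a signed 64×64-bit product using only 64-bit operations (the classic split into 32-bit halves). The function name `mul_high_nowiden` is hypothetical. This is not Base's implementation and not a proposed patch, and it has only been reasoned about on the CPU, not run through the Metal compiler.

```julia
# Hypothetical helper: high 64 bits of the signed 128-bit product u*v,
# computed without ever forming an Int128.
function mul_high_nowiden(u::Int64, v::Int64)
    mask = Int64(0xffffffff)
    u0 = u & mask; u1 = u >> 32          # low half (unsigned), high half (signed)
    v0 = v & mask; v1 = v >> 32
    w0 = (u0 % UInt64) * (v0 % UInt64)   # low x low needs the unsigned range
    t  = u1 * v0 + ((w0 >>> 32) % Int64) # first cross term plus carry from w0
    w1 = t & mask
    w2 = t >> 32
    w1 = u0 * v1 + w1                    # second cross term
    return u1 * v1 + w2 + (w1 >> 32)
end

# Reference using widen, for comparison on the CPU only.
mul_high_ref(u::Int64, v::Int64) = ((widen(u) * v) >>> 64) % Int64

for (u, v) in ((-1, -1), (typemin(Int64), typemin(Int64)),
               (typemax(Int64), typemax(Int64)), (123456789012345, -987654321))
    @assert mul_high_nowiden(u, v) == mul_high_ref(u, v)
end
```

Whether something like this would be wired in through an overlay or through a change in Base is exactly the open question above; the sketch only shows that the arithmetic itself does not require a 128-bit type.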
