test-timestep_embedding _sometimes_ fails with ptrace: Operation not permitted. [opencl-clover, gfx1103] (with gdb backtrace.) #772

Open
dreirund opened this issue Mar 27, 2024 · 0 comments

Hello,

I am just trying to build this out of curiosity (I am not a coder, just a system admin).

Issue:

On my machine, test-timestep_embedding sometimes fails with
ptrace: Operation not permitted.

The output of the test when it fails:

ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon Graphics (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.57, 6.8.1-1-longcmdline-cpuopt-custom-clang)'
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 0.0000 0.0000 0.4242 0.9880 -0.1558 0.8946 0.9923 0.9994 1.0000 -0.9056 0.1547 0.9878 0.4470 0.1240 0.0333 0.0089 0.0000 0.0000 
-----------------------------------
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 -1912022133400141824.0000 GGML_ASSERT: /tmp/makepkg/build/ggml-git/src/ggml/tests/test-timestep_embedding.cpp:170: equalsf(output[i], expected_result[i])
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted

(Exitcode: 134)
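Because the failure is intermittent, it may help to run the test repeatedly and count how often each outcome occurs. The following is a minimal sketch, not part of ggml: `describe_exit` and `count_outcomes` are hypothetical helper names. It relies on the shell convention that an exit code above 128 means "killed by signal (code - 128)", so 134 corresponds to signal 6, which is SIGABRT on Linux.

```python
import signal
import subprocess
from collections import Counter


def describe_exit(code: int) -> str:
    """Describe an exit status.

    A shell reports 128 + N when a process dies from signal N, while
    subprocess reports the same event as the negative value -N.
    """
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        return f"exit {code}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a signal number known on this platform.
        return f"killed by signal {signum}"


def count_outcomes(cmd, runs=50):
    """Run cmd `runs` times, discard its output, and tally the outcomes."""
    outcomes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        outcomes[describe_exit(result.returncode)] += 1
    return outcomes


print(describe_exit(134))  # on Linux: killed by SIGABRT
```

From the build directory, something like `count_outcomes(["./bin/test-timestep_embedding"])` would then give a rough failure rate; the binary path is taken from the gdb session in this report and may differ on other setups.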

The output of the test when it succeeds:

ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon Graphics (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.57, 6.8.1-1-longcmdline-cpuopt-custom-clang)'
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 0.0000 0.0000 0.4242 0.9880 -0.1558 0.8946 0.9923 0.9994 1.0000 -0.9056 0.1547 0.9878 0.4470 0.1240 0.0333 0.0089 0.0000 0.0000 
-----------------------------------
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 -0.0000 0.0000 0.4242 0.9880 -0.1558 0.8946 0.9923 0.9994 1.0000 -0.9056 0.1547 0.9878 0.4470 0.1240 0.0333 0.0089 -0.0000 0.0000 
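To narrow down which value goes wrong, the printed rows can be compared against the usual sinusoidal timestep embedding. This is a rough sketch under stated assumptions: the parameters (timestep 12 for the first half of the row, dimension 15, max_period 10000) are inferred from the printed numbers and are not stated anywhere in this report, and the function names are made up for illustration. It does not claim to mirror the ggml implementation.

```python
import math


def timestep_embedding(timestep, dim, max_period=10000.0):
    """Standard sinusoidal embedding: dim // 2 cosines, then dim // 2 sines."""
    half = dim // 2
    cos_part, sin_part = [], []
    for j in range(half):
        freq = math.exp(-math.log(max_period) * j / half)
        arg = timestep * freq
        cos_part.append(math.cos(arg))
        sin_part.append(math.sin(arg))
    return cos_part + sin_part


def first_mismatch(expected, actual, tol=1e-4):
    """Index of the first element differing by more than tol, else None."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if abs(e - a) > tol:
            return i
    return None


# First 16 values of the first printed row (copied from the log above).
REFERENCE_ROW = [0.8439, -0.9970, 0.6497, 0.9733, 0.9981, 0.9999, 1.0000,
                 -0.5366, -0.0776, 0.7602, 0.2296, 0.0621, 0.0167, 0.0045,
                 0.0000, 0.0000]
# Values printed by the failing run before the assertion fired.
FAILING_ROW = REFERENCE_ROW[:14] + [-1912022133400141824.0]

ASSUMED_DIM = 15        # inferred, not stated in the report
ASSUMED_TIMESTEP = 12.0  # inferred, not stated in the report

computed = timestep_embedding(ASSUMED_TIMESTEP, ASSUMED_DIM)
# The 14 computed values match the printed ones to the printed precision.
print(first_mismatch(computed, REFERENCE_ROW))     # None
print(first_mismatch(REFERENCE_ROW, FAILING_ROW))  # 14
```

Under these assumptions, indices 0 to 13 hold the computed cosines and sines, and the bad value sits at index 14, which is not one of the computed entries. That would be consistent with a padding slot that is sometimes left unwritten, but I have not verified this against the ggml source.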

System and build information:

I am on Artix GNU/Linux with an AMD 7840U SoC (Radeon graphics: gfx1103). I build with CLBlast and OpenBLAS, but no other accelerator, as follows:

cmake -S "${_pkgbase}" -B 'build' \
  -DCMAKE_INSTALL_PREFIX='/usr' \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS='ON' \
  -DGGML_ALL_WARNINGS='OFF' \
  -DGGML_ALL_WARNINGS_3RD_PARTY='OFF' \
  -DGGML_AVX=ON \
  -DGGML_AVX2=ON \
  -DGGML_AVX512=ON \
  -DGGML_AVX512_VBMI=ON \
  -DGGML_AVX512_VNNI=ON \
  -DGGML_BUILD_EXAMPLES='ON' \
  -DGGML_BUILD_TESTS='ON' \
  -DGGML_CLBLAST='ON' \
  -DGGML_CUBLAS='OFF' \
  -DGGML_F16C='ON' \
  -DGGML_FMA='ON' \
  -DGGML_HIPBLAS='OFF' \
  -DGGML_METAL='OFF' \
  -DGGML_NO_ACCELERATE='OFF' \
  -DGGML_OPENBLAS='ON'

make

Output of the cmake run:

-- The C compiler identification is GNU 13.2.1
-- The CXX compiler identification is GNU 13.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.44.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
GNU ld (GNU Binutils) 2.42.0
-- x86 detected
-- Linux detected
-- OpenBLAS found
-- clBLAST found
-- x86 detected
-- Linux detected
-- OpenBLAS found
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/makepkg/build/ggml-git/src/build

/proc/cpuinfo for my first core:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 116
model name	: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
stepping	: 1
microcode	: 0xa704104
cpu MHz		: 1099.714
cache size	: 1024 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips	: 6590.38
TLB size	: 3584 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14] [15]

Debug information:

After rebuilding with -DCMAKE_BUILD_TYPE=Debug, a gdb run of the test when it fails:

Reading symbols from ../bin/test-timestep_embedding...
(gdb) run
Starting program: /tmp/makepkg/build/ggml-git/src/build/bin/test-timestep_embedding 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fff634006c0 (LWP 10563)]
[New Thread 0x7fff62a006c0 (LWP 10564)]
[New Thread 0x7fff620006c0 (LWP 10565)]
[New Thread 0x7fff616006c0 (LWP 10566)]
[New Thread 0x7fff5be006c0 (LWP 10567)]
[New Thread 0x7fff5b4006c0 (LWP 10568)]
[New Thread 0x7fff58e006c0 (LWP 10577)]
[New Thread 0x7fff4fe006c0 (LWP 10578)]
[Thread 0x7fff4fe006c0 (LWP 10578) exited]
[Detaching after fork from child process 10581]
[Detaching after fork from child process 10582]
[Detaching after fork from child process 10583]
[Detaching after fork from child process 10584]
[Detaching after fork from child process 10585]
ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon Graphics (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.57, 6.8.1-1-longcmdline-cpuopt-custom-clang)'
[New Thread 0x7fff4f4006c0 (LWP 10586)]
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 0.0000 0.0000 0.4242 0.9880 -0.1558 0.8946 0.9923 0.9994 1.0000 -0.9056 0.1547 0.9878 0.4470 0.1240 0.0333 0.0089 0.0000 0.0000 
-----------------------------------
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
[New Thread 0x7fff4d8006c0 (LWP 10587)]
[New Thread 0x7fff4ce006c0 (LWP 10588)]
[New Thread 0x7fff43e006c0 (LWP 10589)]
[Thread 0x7fff43e006c0 (LWP 10589) exited]
0.8439 -0.9970 0.6497 0.9733 0.9981 0.9999 1.0000 -0.5366 -0.0776 0.7602 0.2296 0.0621 0.0167 0.0045 25434192347136.0000 GGML_ASSERT: /tmp/makepkg/build/ggml-git/src/ggml/tests/test-timestep_embedding.cpp:170: equalsf(output[i], expected_result[i])
[Thread 0x7fff4ce006c0 (LWP 10588) exited]
[Thread 0x7fff4d8006c0 (LWP 10587) exited]
[Detaching after fork from child process 10590]
warning: process 10438 is already traced by process 10405
ptrace: Operation not permitted.
No stack.
The program is not being run.

Thread 1 "test-timestep_e" received signal SIGABRT, Aborted.
0x00007ffff52ab32c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff52ab32c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff525a6c8 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff52424b8 in abort () from /usr/lib/libc.so.6
#3  0x0000555555556f0f in main (argc=1, argv=0x7fffffffd808) at /tmp/makepkg/build/ggml-git/src/ggml/tests/test-timestep_embedding.cpp:170
(gdb) cont
Continuing.
Couldn't get registers: No such process.
(gdb) [Thread 0x7fff4f4006c0 (LWP 10586) exited]
[Thread 0x7fff58e006c0 (LWP 10577) exited]
[Thread 0x7fff5b4006c0 (LWP 10568) exited]
[Thread 0x7fff5be006c0 (LWP 10567) exited]
[Thread 0x7fff616006c0 (LWP 10566) exited]
[Thread 0x7fff620006c0 (LWP 10565) exited]
[Thread 0x7fff62a006c0 (LWP 10564) exited]
[Thread 0x7fff634006c0 (LWP 10563) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
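The lines "No stack." and "The program is not being run." look like gdb output, and the "Detaching after fork" right before the ptrace warning suggests that the failing assertion spawns a debugger which tries to attach to the test process. That is an assumption on my side, not something confirmed from the ggml source. If it holds, the "ptrace: Operation not permitted." message would be a side effect of the assertion failure rather than its cause: inside gdb the process is already traced, and outside gdb the kernel's Yama `ptrace_scope` setting may forbid the attach. A small sketch to check that setting (the helper names are hypothetical):

```python
from pathlib import Path

# Standard location of the Yama setting; absent if the kernel lacks Yama.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

SCOPE_MEANING = {
    0: "classic: a process may attach to other processes of the same user",
    1: "restricted: only an ancestor or a declared ptracer may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}


def read_ptrace_scope(path=YAMA_PTRACE_SCOPE):
    """Return the ptrace_scope value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def explain_ptrace_scope(value):
    """Turn a ptrace_scope value into a short human-readable description."""
    if value is None:
        return "Yama ptrace_scope not available on this kernel"
    return SCOPE_MEANING.get(value, f"unknown ptrace_scope value {value}")


print(explain_ptrace_scope(read_ptrace_scope()))
```

With a value of 1 or higher, a debugger started as a child of the test would not be allowed to attach to its parent, which would match the message seen outside gdb.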

Regards!
