Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC hangs with "parallel mark" if using GC_allow_register_threads() #619

Open
arshajii opened this issue Feb 6, 2024 · 11 comments
Open

GC hangs with "parallel mark" if using GC_allow_register_threads() #619

arshajii opened this issue Feb 6, 2024 · 11 comments

Comments

@arshajii
Copy link

arshajii commented Feb 6, 2024

Current master f7e513a. Here is the test program:

#define GC_THREADS
#include <gc.h>

typedef struct list {
    long data1;
    long data2;
    long data3;
    struct list *next;
} list_t;

int main(void) {
    GC_INIT();
    GC_allow_register_threads();

    list_t *curr = NULL;
    long data = 0;

    for (int i = 0; i < 100000000; i++) {
        list_t *node = GC_MALLOC(sizeof(*node));
        node->data1 = data + 1;
        node->data2 = data + 2;
        node->data3 = data + 3;
        ++data;
        if (curr)
            curr->next = node;
        curr = node;
    }
}

Built with cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_large_config=ON -Denable_thread_local_alloc=ON -Denable_handle_fork=ON on a M1 Max MacBook Pro.

The program hangs here:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010001a9ac a.out`GC_wait_marker + 28
    frame #3: 0x0000000100011010 a.out`GC_mark_local + 872
    frame #4: 0x000000010000fc94 a.out`GC_mark_some + 908
    frame #5: 0x0000000100006170 a.out`GC_stopped_mark + 660
    frame #6: 0x0000000100005de0 a.out`GC_try_to_collect_inner + 344
    frame #7: 0x0000000100007660 a.out`GC_collect_or_expand + 228
    frame #8: 0x0000000100007910 a.out`GC_allocobj + 160
    frame #9: 0x000000010000df8c a.out`GC_generic_malloc_inner_small + 200
    frame #10: 0x000000010000effc a.out`GC_generic_malloc_many + 1152
    frame #11: 0x000000010001b528 a.out`GC_malloc_kind + 196
    frame #12: 0x000000010001bd68 a.out`main + 84
    frame #13: 0x00000001a08bff28 dyld`start + 2236

If you comment GC_allow_register_threads() or build with -Denable_parallel_mark=OFF, it completes normally in about a second.

@ivmai
Copy link
Owner

ivmai commented Feb 27, 2024

Is this reproducible with the following? On the latest master.
cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_gc_assertions=ON -Denable_thread_local_alloc=OFF -Denable_handle_fork=OFF -DCFLAGS_EXTRA="-D MAX_MARKERS=4 -D GC_DISABLE_INCREMENTAL"

If it is, then please provide back trace for all threads.

@arshajii
Copy link
Author

Hi @ivmai -- with your cmake command it completes in about 80 seconds. Actually, I realized even my original build will also complete if given enough time, it just takes a very long time (~3min). This is latest master 7fc3e07 -- the -DCFLAGS_EXTRA arg was not working for me so I instead added target_compile_definitions(gc PUBLIC -DMAX_MARKERS=4 -DGC_DISABLE_INCREMENTAL) to CMakeLists.txt. In case it is still helpful, here are a few backtraces using your cmake command:

1

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1155:21
    frame #4: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
    frame #5: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
    frame #6: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
    frame #7: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
    frame #8: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
    frame #9: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
    frame #10: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
    frame #11: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
    frame #12: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
    frame #13: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
    frame #14: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
    frame #15: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
    frame #16: 0x00000001000314f8 a.out`main + 84
    frame #17: 0x00000001a08bff28 dyld`start + 2236
  thread #2, name = 'GC-marker-0'
    frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x000000016fe76f68, mark_stack=0x000000016fe76f78, mark_stack_limit=0x000000016fe86f78) at mark.c:862:39
    frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016fe76f78, local_top=0x000000016fe76f78) at mark.c:1063:25
    frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=1) at mark.c:1194:9
    frame #3: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
    frame #4: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
    frame #5: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #3, name = 'GC-marker-1'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=2) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #4, name = 'GC-marker-2'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148

2

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x0000000100148000, mark_stack=0x0000000100148000, mark_stack_limit=0x0000000100158000) at mark.c:862:39
    frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x0000000100148000, local_top=0x0000000100148010) at mark.c:1063:25
    frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1194:9
    frame #3: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
    frame #4: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
    frame #5: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
    frame #6: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
    frame #7: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
    frame #8: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
    frame #9: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
    frame #10: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
    frame #11: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
    frame #12: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
    frame #13: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
    frame #14: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
    frame #15: 0x00000001000314f8 a.out`main + 84
    frame #16: 0x00000001a08bff28 dyld`start + 2236
  thread #2, name = 'GC-marker-0'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=2) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #3, name = 'GC-marker-1'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=1) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #4, name = 'GC-marker-2'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148

3

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1155:21
    frame #4: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
    frame #5: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
    frame #6: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
    frame #7: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
    frame #8: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
    frame #9: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
    frame #10: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
    frame #11: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
    frame #12: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
    frame #13: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
    frame #14: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
    frame #15: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
    frame #16: 0x00000001000314f8 a.out`main + 84
    frame #17: 0x00000001a08bff28 dyld`start + 2236
  thread #2, name = 'GC-marker-0'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=2) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #3, name = 'GC-marker-1'
    frame #0: 0x0000000100013988 a.out`GC_find_header(h="\x96\r\v") at headers.c:37:16
    frame #1: 0x000000010001c2bc a.out`GC_push_contents_hdr(current="\x96\r\v", mark_stack_top=0x000000016ff02f78, mark_stack_limit=0x000000016ff12f78, source="\xe0=\xb5\U00000002\U00000001", hhdr=0x000000010321abf0, do_offset_check=1) at gc_pmark.h:302:7
    frame #2: 0x000000010001b7dc a.out`GC_mark_from(mark_stack_top=0x000000016ff02f78, mark_stack=0x000000016ff02f78, mark_stack_limit=0x000000016ff12f78) at mark.c:896:9
    frame #3: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016ff02f78, local_top=0x000000016ff02f88) at mark.c:1063:25
    frame #4: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=1) at mark.c:1194:9
    frame #5: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
    frame #6: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
    frame #7: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
  thread #4, name = 'GC-marker-2'
    frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
    frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
    frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
    frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
    frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148

@ivmai
Copy link
Owner

ivmai commented Feb 28, 2024

Hmm, the stack traces above do not show any issue - e.g. in case 1: thread 2 is working, other threads just wait thread 2 to complete its part of work. Is there a hang inside GC_mark_from (of thread 2)? If yes, where?

@ivmai
Copy link
Owner

ivmai commented Feb 28, 2024

the -DCFLAGS_EXTRA arg was not working for me so I instead added target_compile_definitions

Yes, my mistake. Passing quotes is not straightforward sometimes. Should be:
cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_gc_assertions=ON -Denable_thread_local_alloc=OFF -Denable_handle_fork=OFF "-DCFLAGS_EXTRA=-DMAX_MARKERS=4 -DGC_DISABLE_INCREMENTAL"

@ivmai
Copy link
Owner

ivmai commented Feb 28, 2024

I can't reproduce it on Mac-mini (x86_64).
I changed 100000000 to 10000000 (10 times smaller) in the above test code (t.c).
gcc -O3 -I ../include t.c libgc.a
With GC_allow_register_threads():

time ./a.out
./a.out  3.53s user 0.05s system 97% cpu 3.677 total

Without GC_allow_register_threads:

time ./a.out
./a.out  11.78s user 0.10s system 99% cpu 11.983 total

So, I don't see any hang. On the opposite I see that parallel marker brings a speedup.

@inumanag
Copy link

inumanag commented Mar 1, 2024

I observe the same issue as described in @arshajii post. macOS Sonoma 14.2.1, Apple clang version 15.0.0 (clang-1500.1.0.2.5), M1 Pro.

@ivmai
Copy link
Owner

ivmai commented Mar 2, 2024

I see. Could you please investigate what is going on in GC_mark_from

 thread #2, name = 'GC-marker-0'
    frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x000000016fe76f68, mark_stack=0x000000016fe76f78, mark_stack_limit=0x000000016fe86f78) at mark.c:862:39
    frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016fe76f78, local_top=0x000000016fe76f78) at mark.c:1063:25
    frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=1) at mark.c:1194:9

Is some loop in GC_mark_from running forever, or is GC_mark_from called with same arguments unlimited amount of time, or something else?

@arshajii
Copy link
Author

arshajii commented Mar 7, 2024

I took a look at this -- GC_mark_from is taking the bulk of the time after running the program through a profiler, both in the main thread and in all the marker threads.

If I add some simple print statements in GC_mark_from at entry and exit, I see it's always being called with the same arguments:

GC_mark_from 0x100898010 0x100898000 0x1008a8000  (three arguments of GC_mark_from)
GC_mark_from done
GC_mark_from 0x100898010 0x100898000 0x1008a8000
GC_mark_from done
GC_mark_from 0x100898010 0x100898000 0x1008a8000
GC_mark_from done
...

Not sure if this is expected. Let me know if there's anything else that would be helpful.

@arshajii
Copy link
Author

arshajii commented Apr 2, 2024

Just following up on this. Once this is sorted we'll be able to upgrade the GC in Codon. Let me know if there's anything else we can do to help track down the issue.

@ivmai
Copy link
Owner

ivmai commented Apr 7, 2024

Please fetch the latest master (I've added some assertion to detect an infinite loop in alloc.c, not sure if it relates to this issue). Also please check if this is reproducible with -DMAX_MARKERS=2 (instead of 4).

If I add some simple print statements in GC_mark_from at entry and exit, I see it's always being called with the same arguments

Also print GC_gc_no along with the GC_mark_from arguments.

And I don't understand whether you observe a hang or a too long execution of the given test code?

@ivmai
Copy link
Owner

ivmai commented Apr 7, 2024

It would be, of course, much better if you dig into the logic and figure out the root cause of this issue. (I don't think I will have time to do it myself within April or May.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants