Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Leak memory on exit #1086

Open
zcbenz opened this issue May 7, 2024 · 6 comments · May be fixed by #1142
Open

[Feature] Leak memory on exit #1086

zcbenz opened this issue May 7, 2024 · 6 comments · May be fixed by #1142
Labels
enhancement New feature or request

Comments

@zcbenz
Copy link
Contributor

zcbenz commented May 7, 2024

Memory allocator is declared static and will free memory on exit when process quit gracefully:

MetalAllocator& allocator() {
static MetalAllocator allocator_;
return allocator_;
}

Which can take more than 30 seconds after inferencing a LLM:

Call graph:
    16392 Thread_6280670   DispatchQueue_1: com.apple.main-thread  (serial)
    + 16392 start  (in dyld) + 2436  [0x18da7512c]
    +   16392 dyld4::LibSystemHelpers::exit(int) const  (in libdyld.dylib) + 20  [0x18de0eb80]
    +     16392 exit  (in libsystem_c.dylib) + 44  [0x18dcb1d20]
    +       16392 __cxa_finalize_ranges  (in libsystem_c.dylib) + 476  [0x18dcb1f98]
    +         16392 mlx::core::metal::MetalAllocator::~MetalAllocator()  (in mlx.node) + 48  [0x11bee2908]
    +           16354 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 112  [0x11bee1c2c]
    +           ! 16318 -[AGXG15XFamilyBuffer dealloc]  (in AGXMetalG15X_B0) + 76  [0x1f9bb6b98]
    +           ! : 16317 -[AGXBuffer dealloc]  (in AGXMetalG15X_B0) + 44  [0x1f9b8d4e4]
    +           ! : | 16272 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 320  [0x1ac73976c]
    +           ! : | + 16076 -[IOGPUMetalResource dealloc]  (in IOGPU) + 248  [0x1ac748128]
    +           ! : | + ! 16032 _CFRelease  (in CoreFoundation) + 292  [0x18dfa5b90]
    +           ! : | + ! : 16023 ioGPUResourceFinalize  (in IOGPU) + 104  [0x1ac74e6a0]
    +           ! : | + ! : | 16023 iokit_user_client_trap  (in IOKit) + 8  [0x19155626c]
    +           ! : | + ! : 9 ioGPUResourceFinalize  (in IOGPU) + 140  [0x1ac74e6c4]
    +           ! : | + ! :   9 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! :     9 CF_IS_OBJC  (in CoreFoundation) + 256  [0x18dfa5520]
    +           ! : | + ! 27 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! : 27 CF_IS_OBJC  (in CoreFoundation) + 256,196  [0x18dfa5520,0x18dfa54e4]
    +           ! : | + ! 11 _CFRelease  (in CoreFoundation) + 1216  [0x18dfa5f2c]
    +           ! : | + ! : 6 _nanov2_free  (in libsystem_malloc.dylib) + 0,188,...  [0x18dc292a8,0x18dc29364,...]
    +           ! : | + ! : 2 CFAllocatorDeallocate  (in CoreFoundation) + 0,180  [0x18de5d14c,0x18de5d200]
    +           ! : | + ! : 1 __CFAllocatorSystemDeallocate  (in CoreFoundation) + 40  [0x18de5f140]
    +           ! : | + ! : | 1 malloc_default_zone  (in libsystem_malloc.dylib) + 0  [0x18dc0d2e4]
    +           ! : | + ! : 1 malloc_zone_free  (in libsystem_malloc.dylib) + 12  [0x18dc117fc]
    +           ! : | + ! : 1 nanov2_free  (in libsystem_malloc.dylib) + 0  [0x18dc0e5b0]
    +           ! : | + ! 3 _CFRelease  (in CoreFoundation) + 88,1192,...  [0x18dfa5ac4,0x18dfa5f14,...]
    +           ! : | + ! 2 _CFRelease  (in CoreFoundation) + 1228  [0x18dfa5f38]
    +           ! : | + ! : 1 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! : | 1 CF_IS_OBJC  (in CoreFoundation) + 252  [0x18dfa551c]
    +           ! : | + ! : 1 _CFRelease  (in CoreFoundation) + 136  [0x18dfa5af4]
    +           ! : | + ! 1 CFGetAllocator  (in CoreFoundation) + 68  [0x18de5d0ec]
    +           ! : | + 121 -[IOGPUMetalResource dealloc]  (in IOGPU) + 304  [0x1ac748160]
    +           ! : | + ! 119 -[_MTLObjectWithLabel dealloc]  (in Metal) + 64  [0x197f8ca1c]
    +           ! : | + ! : 55 _objc_rootDealloc  (in libobjc.A.dylib) + 80  [0x18da2ae24]
    +           ! : | + ! : | 34 objc_destructInstance  (in libobjc.A.dylib) + 144  [0x18da2aebc]
    +           ! : | + ! : | + 33 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 108  [0x18da360b4]
    +           ! : | + ! : | + ! 30 weak_clear_no_lock  (in libobjc.A.dylib) + 48  [0x18da36184]
    +           ! : | + ! : | + ! : 30 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 92,76,...  [0x18da2e42c,0x18da2e41c,...]
    +           ! : | + ! : | + ! 3 weak_clear_no_lock  (in libobjc.A.dylib) + 28,48  [0x18da36170,0x18da36184]
    +           ! : | + ! : | + 1 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 56  [0x18da36080]
    +           ! : | + ! : | 21 objc_destructInstance  (in libobjc.A.dylib) + 80  [0x18da2ae7c]
    +           ! : | + ! : |   10 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 116  [0x18da335f8]
    +           ! : | + ! : |   ! 8 -[IOGPUMetalResource .cxx_destruct]  (in IOGPU) + 36  [0x1ac748a50]
    +           ! : | + ! : |   ! : 8 objc_destroyWeak  (in libobjc.A.dylib) + 152  [0x18da345f4]
    +           ! : | + ! : |   ! :   7 weak_unregister_no_lock  (in libobjc.A.dylib) + 128,44,...  [0x18da4024c,0x18da401f8,...]
    +           ! : | + ! : |   ! :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | + ! : |   ! :     1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 80  [0x18da2e420]
    +           ! : | + ! : |   ! 2 objc_destroyWeak  (in libobjc.A.dylib) + 32,184  [0x18da3457c,0x18da34614]
    +           ! : | + ! : |   5 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 56,116,...  [0x18da335bc,0x18da335f8,...]
    +           ! : | + ! : |   3 -[IOGPUMetalResource .cxx_destruct]  (in IOGPU) + 52,56,...  [0x1ac748a60,0x1ac748a64,...]
    +           ! : | + ! : |   3 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 76  [0x18da335d0]
    +           ! : | + ! : |     3 lookupMethodInClassAndLoadCache  (in libobjc.A.dylib) + 52  [0x18da33488]
    +           ! : | + ! : |       3 cache_getImp  (in libobjc.A.dylib) + 8,48,...  [0x18da29868,0x18da29890,...]
    +           ! : | + ! : 30 free_tiny  (in libsystem_malloc.dylib) + 100,512,...  [0x18dc0fd28,0x18dc0fec4,...]
    +           ! : | + ! : 23 free_tiny  (in libsystem_malloc.dylib) + 496  [0x18dc0feb4]
    +           ! : | + ! : | 11 tiny_free_no_lock  (in libsystem_malloc.dylib) + 356,1856,...  [0x18dc1019c,0x18dc10778,...]
    +           ! : | + ! : | 8 tiny_free_no_lock  (in libsystem_malloc.dylib) + 644  [0x18dc102bc]
    +           ! : | + ! : | + 8 tiny_free_list_add_ptr  (in libsystem_malloc.dylib) + 568,552,...  [0x18dc10ae8,0x18dc10ad8,...]
    +           ! : | + ! : | 3 tiny_free_no_lock  (in libsystem_malloc.dylib) + 1376  [0x18dc10598]
    +           ! : | + ! : | + 2 tiny_free_reattach_region  (in libsystem_malloc.dylib) + 160,240  [0x18dc171e8,0x18dc17238]
    +           ! : | + ! : | + 1 tiny_free_reattach_region  (in libsystem_malloc.dylib) + 368  [0x18dc172b8]
    +           ! : | + ! : | +   1 tiny_free_list_add_ptr  (in libsystem_malloc.dylib) + 144  [0x18dc10940]
    +           ! : | + ! : | 1 tiny_free_no_lock  (in libsystem_malloc.dylib) + 436  [0x18dc101ec]
    +           ! : | + ! : |   1 tiny_free_list_remove_ptr  (in libsystem_malloc.dylib) + 288  [0x18dc10c44]
    +           ! : | + ! : 3 _szone_free  (in libsystem_malloc.dylib) + 164  [0x18dc2159c]
    +           ! : | + ! : 3 nanov2_try_free_default  (in libsystem_malloc.dylib) + 0  [0x18dc29604]
    +           ! : | + ! : 2 _objc_rootDealloc  (in libobjc.A.dylib) + 8,76  [0x18da2addc,0x18da2ae20]
    +           ! : | + ! : 2 free  (in libsystem_malloc.dylib) + 92,124  [0x18dc0bd1c,0x18dc0bd3c]
    +           ! : | + ! : 1 _nanov2_free  (in libsystem_malloc.dylib) + 368  [0x18dc29418]
    +           ! : | + ! 1 -[_MTLObjectWithLabel dealloc]  (in Metal) + 32  [0x197f8c9fc]
    +           ! : | + ! : 1 objc_retain_x28  (in libobjc.A.dylib) + 68  [0x18da27fd0]
    +           ! : | + ! 1 free_tiny  (in libsystem_malloc.dylib) + 532  [0x18dc0fed8]
    +           ! : | + 42 -[IOGPUMetalResource dealloc]  (in IOGPU) + 216  [0x1ac748108]
    +           ! : | + ! 34 objc_storeWeak  (in libobjc.A.dylib) + 452  [0x18da2e184]
    +           ! : | + ! : 23 weak_unregister_no_lock  (in libobjc.A.dylib) + 316  [0x18da40308]
    +           ! : | + ! : 11 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | + ! :   11 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 92,128  [0x18da2e42c,0x18da2e450]
    +           ! : | + ! 6 objc_storeWeak  (in libobjc.A.dylib) + 148  [0x18da2e054]
    +           ! : | + ! : 6 locker_mixin>::lockWith(lockdebug::lock_mixin&)  (in libobjc.A.dylib) + 132,32,...  [0x18da5a294,0x18da5a230,...]
    +           ! : | + ! 2 objc_storeWeak  (in libobjc.A.dylib) + 76,116  [0x18da2e00c,0x18da2e034]
    +           ! : | + 15 -[IOGPUMetalResource dealloc]  (in IOGPU) + 232  [0x1ac748118]
    +           ! : | + ! 6 objc_autorelease  (in libobjc.A.dylib) + 256  [0x18da2d920]
    +           ! : | + ! : 6 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 156,236,...  [0x18da5b954,0x18da5b9a4,...]
    +           ! : | + ! 4 objc_autorelease  (in libobjc.A.dylib) + 316,308  [0x18da2d95c,0x18da2d954]
    +           ! : | + ! 4 objc_loadWeak  (in libobjc.A.dylib) + 24  [0x18da2e8ec]
    +           ! : | + ! : 4 objc_loadWeakRetained  (in libobjc.A.dylib) + 468,24  [0x18da2eae8,0x18da2e92c]
    +           ! : | + ! 1 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 164  [0x18da5b95c]
    +           ! : | + 5 -[IOGPUMetalResource dealloc]  (in IOGPU) + 272  [0x1ac748140]
    +           ! : | + ! 3 objc_release  (in libobjc.A.dylib) + 0  [0x18da27fe0]
    +           ! : | + ! 2 objc_retain_x28  (in libobjc.A.dylib) + 68  [0x18da27fd0]
    +           ! : | + 4 -[IOGPUMetalResource dealloc]  (in IOGPU) + 260  [0x1ac748134]
    +           ! : | + ! 4 objc_release  (in libobjc.A.dylib) + 0,28,...  [0x18da27fe0,0x18da27ffc,...]
    +           ! : | + 4 -[IOGPUMetalResource dealloc]  (in IOGPU) + 0,260  [0x1ac748030,0x1ac748134]
    +           ! : | + 3 -[IOGPUMetalResource dealloc]  (in IOGPU) + 196  [0x1ac7480f4]
    +           ! : | + ! 3 -[IOGPUMemoryInfo removeResourceFromList:]  (in IOGPU) + 32  [0x1ac7462d0]
    +           ! : | + !   3 os_unfair_lock_lock  (in libsystem_platform.dylib) + 16,0  [0x18de22f00,0x18de22ef0]
    +           ! : | + 1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 240  [0x1ac748120]
    +           ! : | + ! 1 objc_msgSend$_removeResource:  (in IOGPU) + 16  [0x1ac76f770]
    +           ! : | + 1 objc_loadWeak  (in libobjc.A.dylib) + 28  [0x18da2e8f0]
    +           ! : | 41 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 172  [0x1ac7396d8]
    +           ! : | + 41 -[IOGPUMetalDevice deallocBufferSubData:heapIndex:bufferIndex:bufferOffset:length:]  (in IOGPU) + 56  [0x1ac7414f8]
    +           ! : | +   17 IOGPUMetalSuballocatorFree  (in IOGPU) + 408,304,...  [0x1ac7508ec,0x1ac750884,...]
    +           ! : | +   9 IOGPUMetalSuballocatorFree  (in IOGPU) + 564  [0x1ac750988]
    +           ! : | +   ! 9 -[AGXG15XFamilyBuffer dealloc]  (in AGXMetalG15X_B0) + 76  [0x1f9bb6b98]
    +           ! : | +   !   9 -[AGXBuffer dealloc]  (in AGXMetalG15X_B0) + 44  [0x1f9b8d4e4]
    +           ! : | +   !     9 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 320  [0x1ac73976c]
    +           ! : | +   !       6 -[IOGPUMetalResource dealloc]  (in IOGPU) + 248  [0x1ac748128]
    +           ! : | +   !       : 6 _CFRelease  (in CoreFoundation) + 292  [0x18dfa5b90]
    +           ! : | +   !       :   6 ioGPUResourceFinalize  (in IOGPU) + 104  [0x1ac74e6a0]
    +           ! : | +   !       :     6 iokit_user_client_trap  (in IOKit) + 8  [0x19155626c]
    +           ! : | +   !       2 -[IOGPUMetalResource dealloc]  (in IOGPU) + 216  [0x1ac748108]
    +           ! : | +   !       : 2 objc_storeWeak  (in libobjc.A.dylib) + 452  [0x18da2e184]
    +           ! : | +   !       :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | +   !       :   | 1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 128  [0x18da2e450]
    +           ! : | +   !       :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 16  [0x18da401dc]
    +           ! : | +   !       1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 304  [0x1ac748160]
    +           ! : | +   !         1 -[_MTLObjectWithLabel dealloc]  (in Metal) + 64  [0x197f8ca1c]
    +           ! : | +   !           1 _objc_rootDealloc  (in libobjc.A.dylib) + 80  [0x18da2ae24]
    +           ! : | +   !             1 objc_destructInstance  (in libobjc.A.dylib) + 144  [0x18da2aebc]
    +           ! : | +   !               1 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 108  [0x18da360b4]
    +           ! : | +   !                 1 weak_clear_no_lock  (in libobjc.A.dylib) + 48  [0x18da36184]
    +           ! : | +   !                   1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 76  [0x18da2e41c]
    +           ! : | +   7 IOGPUMetalSuballocatorFree  (in IOGPU) + 180  [0x1ac750808]
    +           ! : | +   ! 7 MTLRangeAllocatorDeallocate  (in Metal) + 84,112,...  [0x197fc51d8,0x197fc51f4,...]
    +           ! : | +   3 IOGPUMetalSuballocatorFree  (in IOGPU) + 532  [0x1ac750968]
    +           ! : | +   ! 2 std::__tree, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator>>::__emplace_multi>(std::pair&&)  (in IOGPU) + 76,112  [0x1ac751174,0x1ac751198]
    +           ! : | +   ! 1 std::__tree, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator>>::__emplace_multi>(std::pair&&)  (in IOGPU) + 44  [0x1ac751154]
    +           ! : | +   !   1 IOGPUMetalSuballocatorHeap::Allocator, void*>>::allocate(unsigned long)  (in IOGPU) + 40  [0x1ac751248]
    +           ! : | +   !     1 posix_memalign  (in libsystem_malloc.dylib) + 40  [0x18dc13b68]
    +           ! : | +   !       1 _malloc_zone_memalign  (in libsystem_malloc.dylib) + 16  [0x18dc30bbc]
    +           ! : | +   2 IOGPUMetalSuballocatorFree  (in IOGPU) + 160  [0x1ac7507f4]
    +           ! : | +   ! 1 -[IOGPUMetalResource gpuAddress]  (in IOGPU) + 0  [0x1ac7481a0]
    +           ! : | +   ! 1 objc_msgSend$gpuAddress  (in IOGPU) + 0  [0x1ac770520]
    +           ! : | +   2 IOGPUMetalSuballocatorFree  (in IOGPU) + 472  [0x1ac75092c]
    +           ! : | +   ! 2 std::__tree, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator>>::__remove_node_pointer(std::__tree_node, void*>*)  (in IOGPU) + 100  [0x1ac750d94]
    +           ! : | +   !   2 std::__tree_remove[abi:v160006]*>(std::__tree_node_base*, std::__tree_node_base*)  (in IOGPU) + 68,88  [0x1ac750de8,0x1ac750dfc]
    +           ! : | +   1 IOGPUMetalSuballocatorFree  (in IOGPU) + 556  [0x1ac750980]
    +           ! : | +     1 free_small  (in libsystem_malloc.dylib) + 876  [0x18dc0ee60]
    +           ! : | +       1 small_free_list_add_ptr  (in libsystem_malloc.dylib) + 156  [0x18dc10db8]
    +           ! : | 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 104  [0x1ac739694]
    +           ! : | + 1 AutoreleasePoolPage::push()  (in libobjc.A.dylib) + 124  [0x18da5ca1c]
    +           ! : | +   1 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 0  [0x18da5b8b8]
    +           ! : | 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 16  [0x1ac73963c]
    +           ! : | 1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 308  [0x1ac748164]
    +           ! : | 1 objc_msgSendSuper2  (in libobjc.A.dylib) + 56  [0x18da29638]
    +           ! : 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 324  [0x1ac739770]
    +           ! 27 objc_msgSend  (in libobjc.A.dylib) + 0,8,...  [0x18da29400,0x18da29408,...]
    +           ! 9 _objc_rootRelease  (in libobjc.A.dylib) + 108  [0x18da2ed3c]
    +           29 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 132,168,...  [0x11bee1c40,0x11bee1c64,...]
    +           9 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 124  [0x11bee1c38]
    +             8 _nanov2_free  (in libsystem_malloc.dylib) + 668,784,...  [0x18dc29544,0x18dc295b8,...]
    +             1 free  (in libsystem_malloc.dylib) + 20  [0x18dc0bcd4]

How do you think if we just leak the memory on exit to save the time?

@awni
Copy link
Member

awni commented May 8, 2024

I wonder why it's so slow / if there is a way to asynchronously free resources? Indeed I've noticed in the past that the program can take a while to exit when you are holding a lot of RAM. Leaking would be fairly easy ... just avoid releasing buffers in the buffer cache when the allocator is destroyed. Do we lose anything from doing that?

@zcbenz
Copy link
Contributor Author

zcbenz commented May 8, 2024

Memory allocators are slow. Doing it asynchronously does not help here because the program still has to wait for tasks to finish before exiting.

Leaking memory on exit is safe and standard behavior, for example when closing a Chrome tab all DOM elements are just leaked you can rely on OS safely collecting all the RAM used.

@madrob
Copy link
Contributor

madrob commented May 9, 2024

One danger of leaking memory on close is that in the future it makes things like ValGrind less useful when chasing a runtime memory leak.

@zcbenz
Copy link
Contributor Author

zcbenz commented May 9, 2024

You can tell sanitizers a specific leak is expected. Chromium explicitly leaks memory for static objects everywhere (base::NoDestructor) and still make use of various sanitizers.

@awni
Copy link
Member

awni commented May 10, 2024

I'm not opposed to this. I can mark it as an enhancement. @zcbenz I'm not sure if you are planning to send a PR. Happy to take a look if so / try it out.

@awni awni added the enhancement New feature or request label May 10, 2024
@zcbenz
Copy link
Contributor Author

zcbenz commented May 10, 2024

I will send a PR sometime later this month if no one else had worked on it.

@zcbenz zcbenz linked a pull request May 21, 2024 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants