Engine columnar crash with infinity core dump loop #2176

Closed
1 task done
2peter3 opened this issue May 14, 2024 · 11 comments
Labels
bug rel::6.3.0 Released in 6.3.0

Comments

@2peter3

2peter3 commented May 14, 2024

Bug Description:

UPDATE May 17 2024

The MRE is in #2176 (comment).

Original description:


When using the "columnar" engine, I'm getting a core dump. The problem is that I couldn't find out what actually happens.

Manticore Search Version:

Manticore 6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)

Operating System Version:

Debian GNU/Linux 12 (bookworm)

Have you tried the latest development version?

  • Yes

Internal Checklist:

manticore config

searchd {
   server_id = 1000
   listen = 127.0.0.1:9312
   listen = 192.168.0.30:9312
   listen = 192.168.0.30:9313:mysql41
   listen = 192.168.0.30:9360-9370:replication

   auto_schema = 1
   mysql_version_string = 5.0.37

   log = /var/log/manticore/searchd.log
   query_log = /var/log/manticore/query.log
   pid_file = /var/run/manticore/searchd.pid
   data_dir = /data/manticore
}

Crashdump

[Tue May 14 08:33:09.562 2024] [45524] Using local time zone '/etc/localtime'
[Tue May 14 08:33:09.566 2024] [45524] starting daemon version '6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)' ...
[Tue May 14 08:33:09.566 2024] [45524] listening on 127.0.0.1:9312 for sphinx and http(s)
[Tue May 14 08:33:09.567 2024] [45524] listening on 192.168.0.30:9312 for sphinx and http(s)
[Tue May 14 08:33:09.567 2024] [45524] listening on 192.168.0.30:9313 for mysql
[Tue May 14 08:33:09.593 2024] [45528] prereading 0 tables
[Tue May 14 08:33:09.593 2024] [45528] preread 0 tables in 0.000 sec
[Tue May 14 08:33:09.602 2024] [45524] accepting connections
[Tue May 14 08:33:09.645 2024] [45538] [BUDDY] started v2.3.7 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://127.0.0.1:9312 --bind=127.0.0.1  --threads=30' at http://127.0.0.1:46467
[Tue May 14 08:33:09.645 2024] [45538] [BUDDY] Loaded plugins:
[Tue May 14 08:33:09.645 2024] [45538] [BUDDY]   core: empty-string, backup, emulate-elastic, create, insert, alias, select, show, cli-table, plugin, test, alter-distributed-table, alter-rename-table, modify-table, knn, replace, queue, sharding
[Tue May 14 08:33:09.645 2024] [45538] [BUDDY]   local: 
[Tue May 14 08:33:09.645 2024] [45538] [BUDDY]   extra: 
------- FATAL: CRASH DUMP -------
[Tue May 14 08:34:19.686 2024] [45524]

--- crashed SphinxQL request dump ---

--- request dump end ---
--- local index:2nofo
Manticore 6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 16.0.6
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bookworm -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bookworm) (cross-compiled)
Stack bottom = 0x7f9efc086450, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x20000)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x20000, stack=0x7f9efc090000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x227)[0x5582f2e5aec7]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x364)[0x5582f2cd0e14]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f9f64a5b050]
/usr/bin/searchd(_ZN13LibcCIHash_fn4HashEPKhim+0x30)[0x5582f3094fd0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar13Packer_Hash_c6AddDocEPKhi+0x20)[0x7f9f641ec8a0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar9Builder_c7SetAttrEiPKhi+0x3e)[0x7f9f646dfc1e]
/usr/bin/searchd(_Z15SetColumnarAttri8ESphAttrPN8columnar9Builder_iERSt10unique_ptrINS0_10Iterator_iESt14default_deleteIS4_EEjRN3sph8Vector_TIlNS9_13DefaultCopy_TIlEENS9_14DefaultRelimitENS9_16DefaultStorage_TIlEEEE+0x145)[0x5582f30a4b95]
/usr/bin/searchd(_ZNK9RtIndex_c15WriteAttributesER21SaveDiskDataContext_tR10CSphString+0x7a7)[0x5582f2f3ec97]
/usr/bin/searchd(_ZNK9RtIndex_c12SaveDiskDataEPKcRK11VecTraits_TI17CSphRefcountedPtrIK11RtSegment_tEERK12ChunkStats_tR10CSphString+0x17f)[0x5582f2f410df]
/usr/bin/searchd(_ZN9RtIndex_c13SaveDiskChunkEbbb+0x5d5)[0x5582f2f3d095]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x5582f3e763bc]
/usr/bin/searchd(make_fcontext+0x2f)[0x5582f3eb886f]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007F9F64A5B050 in /lib/x86_64-linux-gnu/libc.so.6
 3# LibcCIHash_fn::Hash(unsigned char const*, int, unsigned long) in /usr/bin/searchd
 4# columnar::Packer_Hash_c::AddDoc(unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 5# columnar::Builder_c::SetAttr(int, unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 6# SetColumnarAttr(int, ESphAttr, columnar::Builder_i*, std::unique_ptr<columnar::Iterator_i, std::default_delete<columnar::Iterator_i> >&, unsigned int, sph::Vector_T<long, sph::DefaultCopy_T<long>, sph::DefaultRelimit, sph::DefaultStorage_T<long> >&) in /usr/bin/searchd
 7# RtIndex_c::WriteAttributes(SaveDiskDataContext_t&, CSphString&) const in /usr/bin/searchd
 8# RtIndex_c::SaveDiskData(char const*, VecTraits_T<CSphRefcountedPtr<RtSegment_t const> > const&, ChunkStats_t const&, CSphString&) const in /usr/bin/searchd
 9# RtIndex_c::SaveDiskChunk(bool, bool, bool) in /usr/bin/searchd
10# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
11# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_20), proto sphinx, state query, command query
thd 1 (work_21), proto sphinx, state query, command query
thd 2 (work_23), proto sphinx, state query, command query
thd 3 (work_26), proto sphinx, state query, command query
--- Totally 5 threads, and 4 client-working threads ---
------- CRASH DUMP END -------
[Tue May 14 08:34:22.993 2024] [45523] watchdog: main process 45524 crashed via CRASH_EXIT (exit code 2), will be restarted
[Tue May 14 08:34:22.993 2024] [45523] watchdog: main process 45585 forked ok
[Tue May 14 08:34:22.994 2024] [45585] Using local time zone '/etc/localtime'
[Tue May 14 08:34:22.997 2024] [45585] starting daemon version '6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)' ...
[Tue May 14 08:34:22.998 2024] [45585] listening on 127.0.0.1:9312 for sphinx and http(s)
[Tue May 14 08:34:22.998 2024] [45585] listening on 192.168.0.30:9312 for sphinx and http(s)
[Tue May 14 08:34:22.998 2024] [45585] listening on 192.168.0.30:9313 for mysql
[Tue May 14 08:34:23.025 2024] [45591] binlog: replaying log /data/manticore/binlog/binlog.001

--- crashed invalid query ---

--- request dump end ---
--- local index:
Manticore 6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 16.0.6
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bookworm -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bookworm) (cross-compiled)
Stack bottom = 0x7f9ee0168ed0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x20000)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x20000, stack=0x7f9ee0170000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x227)[0x5582f2e5aec7]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x364)[0x5582f2cd0e14]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f9f64a5b050]
/usr/bin/searchd(_ZN13LibcCIHash_fn4HashEPKhim+0x30)[0x5582f3094fd0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar13Packer_Hash_c6AddDocEPKhi+0x20)[0x7f9f641ec8a0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar9Builder_c7SetAttrEiPKhi+0x3e)[0x7f9f646dfc1e]
/usr/bin/searchd(_Z15SetColumnarAttri8ESphAttrPN8columnar9Builder_iERSt10unique_ptrINS0_10Iterator_iESt14default_deleteIS4_EEjRN3sph8Vector_TIlNS9_13DefaultCopy_TIlEENS9_14DefaultRelimitENS9_16DefaultStorage_TIlEEEE+0x145)[0x5582f30a4b95]
/usr/bin/searchd(_ZNK9RtIndex_c15WriteAttributesER21SaveDiskDataContext_tR10CSphString+0x7a7)[0x5582f2f3ec97]
/usr/bin/searchd(_ZNK9RtIndex_c12SaveDiskDataEPKcRK11VecTraits_TI17CSphRefcountedPtrIK11RtSegment_tEERK12ChunkStats_tR10CSphString+0x17f)[0x5582f2f410df]
/usr/bin/searchd(_ZN9RtIndex_c13SaveDiskChunkEbbb+0x5d5)[0x5582f2f3d095]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x5582f3e763bc]
/usr/bin/searchd(make_fcontext+0x2f)[0x5582f3eb886f]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007F9F64A5B050 in /lib/x86_64-linux-gnu/libc.so.6
 3# LibcCIHash_fn::Hash(unsigned char const*, int, unsigned long) in /usr/bin/searchd
 4# columnar::Packer_Hash_c::AddDoc(unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 5# columnar::Builder_c::SetAttr(int, unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 6# SetColumnarAttr(int, ESphAttr, columnar::Builder_i*, std::unique_ptr<columnar::Iterator_i, std::default_delete<columnar::Iterator_i> >&, unsigned int, sph::Vector_T<long, sph::DefaultCopy_T<long>, sph::DefaultRelimit, sph::DefaultStorage_T<long> >&) in /usr/bin/searchd
 7# RtIndex_c::WriteAttributes(SaveDiskDataContext_t&, CSphString&) const in /usr/bin/searchd
 8# RtIndex_c::SaveDiskData(char const*, VecTraits_T<CSphRefcountedPtr<RtSegment_t const> > const&, ChunkStats_t const&, CSphString&) const in /usr/bin/searchd
 9# RtIndex_c::SaveDiskChunk(bool, bool, bool) in /usr/bin/searchd
10# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
11# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
--- Totally 0 threads, and 0 client-working threads ---
------- CRASH DUMP END -------
[Tue May 14 08:34:28.813 2024] [45523] watchdog: main process 45585 crashed via CRASH_EXIT (exit code 2), will be restarted
[Tue May 14 08:34:28.813 2024] [45523] watchdog: main process 45617 forked ok
[Tue May 14 08:34:28.814 2024] [45617] Using local time zone '/etc/localtime'
[Tue May 14 08:34:28.818 2024] [45617] starting daemon version '6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)' ...
[Tue May 14 08:34:28.818 2024] [45617] listening on 127.0.0.1:9312 for sphinx and http(s)
[Tue May 14 08:34:28.818 2024] [45617] listening on 192.168.0.30:9312 for sphinx and http(s)
[Tue May 14 08:34:28.818 2024] [45617] listening on 192.168.0.30:9313 for mysql
[Tue May 14 08:34:28.844 2024] [45624] binlog: replaying log /data/manticore/binlog/binlog.001

--- crashed invalid query ---

--- request dump end ---
--- local index:
Manticore 6.2.13 53e7f8ad2@24051318 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 16.0.6
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bookworm -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bookworm) (cross-compiled)
Stack bottom = 0x7f9f204b6210, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x20000)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x20000, stack=0x7f9f204c0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x227)[0x5582f2e5aec7]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x364)[0x5582f2cd0e14]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f9f64a5b050]
/usr/bin/searchd(_ZN13LibcCIHash_fn4HashEPKhim+0x30)[0x5582f3094fd0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar13Packer_Hash_c6AddDocEPKhi+0x20)[0x7f9f641ec8a0]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar9Builder_c7SetAttrEiPKhi+0x3e)[0x7f9f646dfc1e]
/usr/bin/searchd(_Z15SetColumnarAttri8ESphAttrPN8columnar9Builder_iERSt10unique_ptrINS0_10Iterator_iESt14default_deleteIS4_EEjRN3sph8Vector_TIlNS9_13DefaultCopy_TIlEENS9_14DefaultRelimitENS9_16DefaultStorage_TIlEEEE+0x145)[0x5582f30a4b95]
/usr/bin/searchd(_ZNK9RtIndex_c15WriteAttributesER21SaveDiskDataContext_tR10CSphString+0x7a7)[0x5582f2f3ec97]
/usr/bin/searchd(_ZNK9RtIndex_c12SaveDiskDataEPKcRK11VecTraits_TI17CSphRefcountedPtrIK11RtSegment_tEERK12ChunkStats_tR10CSphString+0x17f)[0x5582f2f410df]
/usr/bin/searchd(_ZN9RtIndex_c13SaveDiskChunkEbbb+0x5d5)[0x5582f2f3d095]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x5582f3e763bc]
/usr/bin/searchd(make_fcontext+0x2f)[0x5582f3eb886f]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007F9F64A5B050 in /lib/x86_64-linux-gnu/libc.so.6
 3# LibcCIHash_fn::Hash(unsigned char const*, int, unsigned long) in /usr/bin/searchd
 4# columnar::Packer_Hash_c::AddDoc(unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 5# columnar::Builder_c::SetAttr(int, unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 6# SetColumnarAttr(int, ESphAttr, columnar::Builder_i*, std::unique_ptr<columnar::Iterator_i, std::default_delete<columnar::Iterator_i> >&, unsigned int, sph::Vector_T<long, sph::DefaultCopy_T<long>, sph::DefaultRelimit, sph::DefaultStorage_T<long> >&) in /usr/bin/searchd
 7# RtIndex_c::WriteAttributes(SaveDiskDataContext_t&, CSphString&) const in /usr/bin/searchd
 8# RtIndex_c::SaveDiskData(char const*, VecTraits_T<CSphRefcountedPtr<RtSegment_t const> > const&, ChunkStats_t const&, CSphString&) const in /usr/bin/searchd
 9# RtIndex_c::SaveDiskChunk(bool, bool, bool) in /usr/bin/searchd
10# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
11# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
[Tue May 14 08:34:34.841 2024] [45523] watchdog: main process 45617 killed dirtily with signal 11, core dumped, will be restarted
[Tue May 14 08:34:34.842 2024] [45523] watchdog: main process 45672 forked ok
[Tue May 14 08:34:34.842 2024] [45672] FATAL: failed to create pid file '/var/run/manticore/searchd.pid': No such file or directory
[Tue May 14 08:34:34.843 2024] [45523] watchdog: main process 45672 exited cleanly (exit code 1), shutting down

Query:

REPLACE INTO test (id, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13, d14, d16, d17, d18, d19) VALUES (1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a'),(1,'a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a','a');

After this, and after roughly 100k docs, the server restarts and gets stuck replaying the binlog in an infinite loop, generating a core dump on each attempt.
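
The restart loop shows up in searchd.log as repeated watchdog lines. Below is a minimal Python sketch for counting them, assuming the log line format seen in the crash dump above. The helper name `crashed_pids` and the sample text are illustrative only and are not part of Manticore.

```python
import re

# Matches watchdog lines like the ones in the log above, for example:
#   "watchdog: main process 45524 crashed via CRASH_EXIT (exit code 2), will be restarted"
#   "watchdog: main process 45617 killed dirtily with signal 11, core dumped, will be restarted"
WATCHDOG_CRASH = re.compile(
    r"watchdog: main process (\d+) "
    r"(?:crashed via CRASH_EXIT|killed dirtily with signal \d+)"
)

def crashed_pids(log_text):
    """Return the PIDs of main processes that the watchdog reported as crashed."""
    return [int(m.group(1)) for m in WATCHDOG_CRASH.finditer(log_text)]

# Sample lines copied from the log above.
sample = """\
[Tue May 14 08:34:22.993 2024] [45523] watchdog: main process 45524 crashed via CRASH_EXIT (exit code 2), will be restarted
[Tue May 14 08:34:22.993 2024] [45523] watchdog: main process 45585 forked ok
[Tue May 14 08:34:28.813 2024] [45523] watchdog: main process 45585 crashed via CRASH_EXIT (exit code 2), will be restarted
[Tue May 14 08:34:34.841 2024] [45523] watchdog: main process 45617 killed dirtily with signal 11, core dumped, will be restarted
"""

print(crashed_pids(sample))  # [45524, 45585, 45617]
```

Several crashed PIDs within a few seconds of each other, each followed by a binlog replay line, would be consistent with the loop described here.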

@2peter3 2peter3 added the bug label May 14, 2024
@2peter3
Author

2peter3 commented May 14, 2024

As an additional note, I can mention that simultaneous replacements with just one value work without any issues. Therefore, there are likely problems when multiple replacements are made at the same time, including with concurrent connections.

@2peter3
Author

2peter3 commented May 14, 2024

manticoresoftware/columnar#20

I think it is related to this.

@tomatolog
Contributor

Could you upload your index and binlog as described in the manual?

Then we could reproduce this crash locally and either fix it or confirm that it was already fixed in manticoresoftware/columnar#20.

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label May 15, 2024
@2peter3
Author

2peter3 commented May 15, 2024

@tomatolog I have uploaded the data to manticore/write-only/issue-2176/dataFolder. Also, @sanikolaev got a script via Slack with a small file dump to reproduce it.

@2peter3
Author

2peter3 commented May 15, 2024

FYI, the bug is also present in earlier versions:

Manticore 6.0.0 8de9df201@230206 (columnar 2.0.0 a7c703d@230130) (secondary 2.0.0 a7c703d@230130)

Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.4
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bookworm -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bookworm) (cross-compiled)
Stack bottom = 0x7f636c05dc80, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7f636c060000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x555e82ca27ea]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x555e82b64fc5]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f639dc5b050]
/usr/bin/searchd(_ZN13LibcCIHash_fn4HashEPKhim+0x30)[0x555e83089540]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar13Packer_Hash_c6AddDocEPKhi+0x20)[0x7f639d7f6d60]
/usr/share/manticore/modules/lib_manticore_columnar.so(_ZN8columnar9Builder_c7SetAttrEiPKhi+0x40)[0x7f639db94690]
/usr/bin/searchd(_Z15SetColumnarAttri8ESphAttrPN8columnar9Builder_iERSt10unique_ptrINS0_10Iterator_iESt14default_deleteIS4_EERN3sph8Vector_TIlNS9_13DefaultCopy_TIlEENS9_14DefaultRelimitENS9_16DefaultStorage_TIlEEEE+0x125)[0x555e83098d85]
/usr/bin/searchd(_ZNK9RtIndex_c15WriteAttributesER21SaveDiskDataContext_tR10CSphString+0x6b2)[0x555e82f4adc2]
/usr/bin/searchd(_ZNK9RtIndex_c12SaveDiskDataEPKcRK11VecTraits_TI17CSphRefcountedPtrIK11RtSegment_tEERK12ChunkStats_tR10CSphString+0x16e)[0x555e82f4ce8e]
/usr/bin/searchd(_ZN9RtIndex_c13SaveDiskChunkEbbb+0x5ba)[0x555e82f48f9a]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEE11VecTraits_TIhEENUlN5boost7context6detail10transfer_tEE_8__invokeES9_+0x1c)[0x555e8346e3dc]
/usr/bin/searchd(make_fcontext+0x2f)[0x555e8348d08f]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007F639DC5B050 in /lib/x86_64-linux-gnu/libc.so.6
 3# LibcCIHash_fn::Hash(unsigned char const*, int, unsigned long) in /usr/bin/searchd
 4# columnar::Packer_Hash_c::AddDoc(unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 5# columnar::Builder_c::SetAttr(int, unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 6# SetColumnarAttr(int, ESphAttr, columnar::Builder_i*, std::unique_ptr<columnar::Iterator_i, std::default_delete<columnar::Iterator_i> >&, sph::Vector_T<long, sph::DefaultCopy_T<long>, sph::DefaultRelimit, sph::DefaultStorage_T<long> >&) in /usr/bin/searchd
 7# RtIndex_c::WriteAttributes(SaveDiskDataContext_t&, CSphString&) const in /usr/bin/searchd
 8# RtIndex_c::SaveDiskData(char const*, VecTraits_T<CSphRefcountedPtr<RtSegment_t const> > const&, ChunkStats_t const&, CSphString&) const in /usr/bin/searchd
 9# RtIndex_c::SaveDiskChunk(bool, bool, bool) in /usr/bin/searchd
10# Threads::CoRoutine_c::CreateContext(std::function<void ()>, VecTraits_T<unsigned char>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
11# make_fcontext in /usr/bin/searchd

@2peter3
Author

2peter3 commented May 15, 2024

I've discovered something else: it seems that the error only occurs in connection with multiple values in REPLACE INTO t (...) VALUES (1), (2), (3...) when the data type is "string". If the values are also relatively long, it crashes the database. This happens regardless of whether it is done through MySQL or via a bulk JSON API. The only situation where multiple replacements work well is when the data type is "text". Thus, there seems to be an issue with "string" data types, "concurrency", and long values in columnar databases.

@sanikolaev
Collaborator

sanikolaev commented May 17, 2024

MRE

mysql -P9315 -h0 -e "drop table if exists t; CREATE TABLE t(ip string attribute) engine='columnar'";

curl -s -H 'Content-type: application/x-ndjson' 0:9316/bulk --data-binary '{"replace": {"index": "t", "id": 1, "doc": {"ip":"xx.xxx.xxx.xx"}}}
{"replace": {"index": "t", "id": 2, "doc": {"ip":"xx.xxx.xx.xx"}}}
{"replace": {"index": "t", "id": 1, "doc": {"ip":"xx.xxx.xxx.xx"}}}'

mysql -P9315 -h0 -e "flush ramchunk t"

Backtrace:

 0# sphBacktrace(int, bool) in searchd
 1# CrashLogger::HandleCrash(int) in searchd
 2# 0x00007F036F1BE520 in /lib/x86_64-linux-gnu/libc.so.6
 3# LibcCIHash_fn::Hash(unsigned char const*, int, unsigned long) in searchd
 4# columnar::Packer_Hash_c::AddDoc(unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 5# columnar::Builder_c::SetAttr(int, unsigned char const*, int) in /usr/share/manticore/modules/lib_manticore_columnar.so
 6# SetColumnarAttr(int, ESphAttr, columnar::Builder_i*, std::unique_ptr<columnar::Iterator_i, std::default_delete<columnar::Iterator_i> >&, unsigned int, sph::Vector_T<long, sph::DefaultCopy_T<long>, sph::DefaultRelimit, sph::DefaultStorage_T<long> >&) in searchd
 7# RtIndex_c::WriteAttributes(SaveDiskDataContext_t&, CSphString&) const in searchd
 8# RtIndex_c::SaveDiskData(char const*, VecTraits_T<CSphRefcountedPtr<RtSegment_t const> > const&, ChunkStats_t const&, CSphString&) const in searchd
 9# RtIndex_c::SaveDiskChunk(bool, bool, bool) in searchd
10# RtIndex_c::ForceDiskChunk() in searchd
11# HandleMysqlFlushRamchunk(RowBuffer_i&, SqlStmt_t const&) in searchd
12# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in searchd
13# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
14# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
15# 0x000055B1DFE40BAF in searchd
16# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
17# make_fcontext in searchd

Reproduced with:

Manticore 6.2.13 cc9294cd6@24051604 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)

Config:

searchd {
    listen = 9315:mysql
    listen = 9316
    buddy_path =
    log = searchd.log
    pid_file = searchd.pid
    data_dir = data
    binlog_path =
}

Notes:

  • Note that there are duplicate ids in the payload. It doesn't crash without them.
  • It doesn't crash without engine='columnar'
  • It doesn't crash if the same is done via mysql:
     drop table if exists t; 
     CREATE TABLE t(ip string attribute) engine='columnar';
     insert into t values(1, 'xx.xxx.xxx.xx'),(2,'xx.xxx.xx.xx'),(1, 'xx.xxx.xxx.xx');
     flush ramchunk t;
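
Since the first note ties the crash to duplicate ids within a single bulk payload, a client can check a payload for them before sending it. Below is a minimal Python sketch, not part of Manticore. The helper name `duplicate_ids` is hypothetical, and it assumes the one-action-per-line NDJSON layout used in the MRE above.

```python
import json
from collections import Counter

def duplicate_ids(ndjson_payload):
    """Return the ids that appear more than once in a /bulk NDJSON payload."""
    counts = Counter()
    for line in ndjson_payload.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        # Each line looks like {"<action>": {"index": ..., "id": ..., "doc": {...}}}
        for body in action.values():
            if "id" in body:
                counts[body["id"]] += 1
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)

# The same three lines as in the MRE payload above.
payload = "\n".join([
    '{"replace": {"index": "t", "id": 1, "doc": {"ip":"xx.xxx.xxx.xx"}}}',
    '{"replace": {"index": "t", "id": 2, "doc": {"ip":"xx.xxx.xx.xx"}}}',
    '{"replace": {"index": "t", "id": 1, "doc": {"ip":"xx.xxx.xxx.xx"}}}',
])

print(duplicate_ids(payload))  # [1]
```

This only detects duplicates. Whether removing them avoids the crash in every case is not confirmed beyond this MRE.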
    

@sanikolaev sanikolaev removed the waiting Waiting for the original poster (in most cases) or something else label May 17, 2024
@tomatolog
Contributor

I cannot reproduce the crash using the MRE from the comment with either

Manticore 6.2.13 cc9294cd6@24051604 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)

or the master head versions of the daemon and MCL, on a Windows box and on Linux on the dev2 box.

However, I can reproduce the crash by replaying the binlog the user provided, and I see that the crash persists in the master version of the daemon and MCL. I will investigate the crash further.

@sanikolaev
Collaborator

I can not reproduce the crash using MRE from the #2176 (comment) with either

Strange. I can easily do it - https://youtu.be/oWmM1JeM9wY

@tomatolog
Contributor

The crash was fixed in f6c433a.

To get the fix, you need to reindex the data from scratch using the daemon from the development version.

@tomatolog
Contributor

An issue still remains; it is tracked in #2209.

@sanikolaev sanikolaev added the rel::upcoming Upcoming release label May 21, 2024
@sanikolaev sanikolaev added rel::6.3.0 Released in 6.3.0 and removed rel::upcoming Upcoming release labels May 23, 2024