[DocDB] Node failed + core dump on load running against node #22347

Open
1 task done
pilshchikov opened this issue May 10, 2024 · 3 comments
Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)

Comments

@pilshchikov
Contributor

pilshchikov commented May 10, 2024

Jira Link: DB-11254

Description

Case:

  1. Run the SqlUpdate, SqlDataload, SqlSecondaryIndex, SqlSnapshotTxns, and SqlForeignKeyAndJoins workloads sequentially against a 3-node, RF=3 cluster (c6g.xlarge, 4 CPU, 8 GB RAM).
  2. After 1-2 minutes, one node fails and produces a core dump:
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2.23.0.0-b296-almalinux8-aarch64/postgres/bin/postgres" --core "/home/yugabyte/cores/core_31660_1715143194_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b296-almalinux8-aarch64!postgres!bin!postgres"
Core file '/home/yugabyte/cores/core_31660_1715143194_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b296-almalinux8-aarch64!postgres!bin!postgres' (aarch64) was loaded.
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV: address not mapped to object
  * frame #0: 0x0000ffff971bd5a4 libyb_pggate_webserver.so`std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, void*>*> std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>(this=<unavailable>, __k=<unavailable>) const at __hash_table:2168:31
    frame #1: 0x0000ffff971bc77c libyb_pggate_webserver.so`yb::Status yb::PrometheusWriter::WriteSingleEntryNonTable<unsigned long>(std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long const&) [inlined] std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find[abi:v170002](this=<unavailable>, __k="table_id") const at unordered_map:1534:69
    frame #2: 0x0000ffff971bc778 libyb_pggate_webserver.so`yb::Status yb::PrometheusWriter::WriteSingleEntryNonTable<unsigned long>(this=0x0000ffff8c6ad020, attr=<unavailable>, name="yb_ysqlserver_active_connection_total", value=0x0000ffff8c6ad2d8) at metrics_writer.h:44:20
    frame #3: 0x0000ffff971b9cc0 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) at pgsql_webserver_wrapper.cc:96:3
    frame #4: 0x0000ffff971b9b58 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(req=<unavailable>, resp=<unavailable>) at pgsql_webserver_wrapper.cc:529:3
    frame #5: 0x0000ffff971283c0 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::__function::__value_func<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator()[abi:v170002](this=0x00000688ff810900, __args=0x0000ffff8c6af540, __args=0x0000ffff8c6ad420) const at function.h:517:16
    frame #6: 0x0000ffff971283a4 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::function<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator()(this= Function = yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) , __arg=0x0000ffff8c6af540, __arg=0x0000ffff8c6ad540) const at function.h:1168:12
    frame #7: 0x0000ffff971283a4 libserver_process.so`yb::Webserver::Impl::RunPathHandler(this=0x00000688ffdfc500, handler=0x00000688ffdcf810, connection=0x00000688ff989000, request_info=<unavailable>) at webserver.cc:648:5
    frame #8: 0x0000ffff97127ca0 libserver_process.so`yb::Webserver::Impl::BeginRequestCallback(this=0x00000688ffdfc500, connection=<unavailable>, request_info=0x00000688ff989000) at webserver.cc:567:33
    frame #9: 0x0000ffff97133114 libserver_process.so`worker_thread + 5524
    frame #10: 0x0000ffffa97d78b8 libpthread.so.0`start_thread + 392
    frame #11: 0x0000ffffa9673afc libc.so.6`thread_start + 12
  thread #2, stop reason = signal 0
    frame #0: 0x0000ffffa97ddc58 libpthread.so.0`pthread_cond_wait@@GLIBC_2.17 + 528
    frame #1: 0x0000ffff97131b08 libserver_process.so`master_thread + 1416
    frame #2: 0x0000ffffa97d78b8 libpthread.so.0`start_thread + 392
    frame #3: 0x0000ffffa9673afc libc.so.6`thread_start + 12
  thread #3, stop reason = signal 0
    frame #0: 0x0000aaaaac3b8ffc postgres`__do_fini
    frame #1: 0x0000ffffaab74cd4 ld-linux-aarch64.so.1`_dl_fini at dl-fini.c:141:9
    frame #2: 0x0000ffffa968899c libc.so.6`__run_exit_handlers + 252
    frame #3: 0x0000ffffa9688b1c libc.so.6`exit + 28
    frame #4: 0x0000aaaaac8900f4 postgres`proc_exit(code=0) at ipc.c:157:2
    frame #5: 0x0000ffff96f93840 yb_pg_metrics.so`webserver_worker_main(unused=<unavailable>) at yb_pg_metrics.c:443:3
    frame #6: 0x0000aaaaac7e9204 postgres`StartBackgroundWorker at bgworker.c:849:2
    frame #7: 0x0000aaaaac802594 postgres`maybe_start_bgworkers [inlined] do_start_bgworker(rw=0x00000688ffd102c0) at postmaster.c:6100:4
    frame #8: 0x0000aaaaac802538 postgres`maybe_start_bgworkers at postmaster.c:6326:9
    frame #9: 0x0000aaaaac7feadc postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) at postmaster.c:1432:2
    frame #10: 0x0000aaaaac6f8544 postgres`PostgresServerProcessMain(argc=25, argv=0x00000688ffd120d0) at main.c:234:3
    frame #11: 0x0000aaaaac3b90b8 postgres`main + 36
    frame #12: 0x0000ffffa9674384 libc.so.6`__libc_start_main + 220
    frame #13: 0x0000aaaaac3b8f74 postgres`_start + 52

Version: 2.23.0.0-b296

Logs are attached to the Jira task.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pilshchikov added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on May 10, 2024
@yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on May 10, 2024
@yugabyte-ci removed the status/awaiting-triage (Issue awaiting triage) label on May 10, 2024
@rthallamko3
Contributor

@pilshchikov, do you think this is a recent regression? Can we narrow down which builds introduced it?

@pilshchikov
Contributor Author

@rthallamko3 it started happening somewhere between 2.23.0.0-b247 and 2.23.0.0-b265 on the master branch, and between 2024.1.0.0-b104 and 2024.1.0.0-b122 on the 2024.1 branch.

@yusong-yan
Contributor

Duplicate of #17847

@yusong-yan marked this as a duplicate of #17847 on May 10, 2024