Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug(message_ex::copy): In some case, the message_ex::copy function will trigger an assertion #1937

Closed
Sunflower876 opened this issue Mar 8, 2024 · 7 comments
Labels
good-first-issue type/bug This issue reports a bug.

Comments

@Sunflower876
Copy link
Contributor

Sunflower876 commented Mar 8, 2024

Bug Report

There are two formats for message_ex, one is that both the header and body are saved in the buffers field, and the header and buffers [0] are the same. the other format is that the header is not saved in the buffers field and exists separately. In the message_ex::copy function, total_length represents the sum of the lengths of the header and body. However, in the second case, i only counts the length of the body, so checking that i and total_length are equal will fail.

20240308-151800

@Sunflower876 Sunflower876 added the type/bug This issue reports a bug. label Mar 8, 2024
@Sunflower876
Copy link
Contributor Author

@acelyc111

@acelyc111
Copy link
Member

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

@Sunflower876
Copy link
Contributor Author

Sunflower876 commented Mar 11, 2024

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

#0  0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) #0  0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6
#1  0x00007fe05f16ae00 in dsn::tools::simple_logger::dsn_logv (
    this=0x2b25030, file=<optimized out>, function=<optimized out>,
    line=<optimized out>, log_level=LOG_LEVEL_FATAL,
    fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=<optimized out>)
    at /home/work/pegasus/rdsn/src/utils/simple_logger.cpp:206
#2  0x00007fe05f15e04c in dsn_logv (file=0x7fe0604f25b7 "compiler_depend.ts",
    function=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::__FUNCTION__> "copy", line=248, log_level=LOG_LEVEL_FATAL,
    fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s",
    args=args@entry=0x7fe01b3ce598)
    at /home/work/pegasus/rdsn/src/utils/logging.cpp:129
#3  0x00007fe05f15e0cf in dsn_logf (
    file=file@entry=0x7fe0604f25b7 "compiler_depend.ts",
    function=function@entry=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::__FUNCTION__> "copy", line=line@entry=248,
    log_level=log_level@entry=LOG_LEVEL_FATAL,
    fmt=fmt@entry=0x7fe06058ab7d "%d VS %d, rpc_name = %s")
    at /home/work/pegasus/rdsn/src/utils/logging.cpp:141
#4  0x00007fe060452348 in dsn::message_ex::copy (this=<optimized out>,
    clone_content=clone_content@entry=true,
    copy_for_receive=copy_for_receive@entry=true)
    at /home/work/pegasus/rdsn/src/runtime/rpc/rpc_message.cpp:247
#5  0x000000000058d6da in pegasus::server::pegasus_server_impl::on_refresh_ttl
    (this=0x41a4100, rpc=...)
    at /home/work/pegasus/src/server/pegasus_server_impl.cpp:972
#6  0x0000000000560a8b in pegasus::server::pegasus_read_service::on_refresh_ttl (svc=<optimized out>, rpc=...)
    at /home/work/pegasus/src/server/pegasus_read_service.h:100
#7  0x0000000000568513 in operator() (request=<optimized out>, p=0x41a4100,
    __closure=<optimized out>)
    at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:41
#8  std::_Function_handler<void (pegasus::server::pegasus_read_service*, dsn::message_ex*), bool dsn::replication::storage_serverlet<pegasus::server::pegasus_read_service>::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response> >(dsn::task_code, char const*, void (*)(pegasus::server::pegasus_read_service*, dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response>))::{lambda(pegasus::server::pegasus_read_service*, dsn::message_ex*)#1}>::_M_invoke(std::_Any_data const&, pegasus::server::pegasus_read_service*&&, dsn::message_ex*&&) (
    __functor=..., __args#0=<optimized out>, __args#1=<optimized out>)
    at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#9  0x000000000059f71b in operator() (__args#1=<optimized out>,
    __args#0=<optimized out>, this=<optimized out>)
    at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#10 handle_request (request=0x218218c0, this=0x41a4100)
    at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:92
#11 pegasus::server::pegasus_read_service::on_request (this=0x41a4100,
    request=0x218218c0)
    at /home/work/pegasus/src/server/pegasus_read_service.h:48
#12 0x00007fe06024a2b7 in dsn::replication::replica::on_client_read (
    this=0x3472800, request=request@entry=0x218218c0,
    ignore_throttling=ignore_throttling@entry=false)
    at /home/work/pegasus/rdsn/src/replica/replica.cpp:252
#13 0x00007fe0602adb01 in dsn::replication::replica_stub::on_client_read (
    this=0x2b48300, id=..., request=0x218218c0)
    at /home/work/pegasus/rdsn/src/replica/replica_stub.cpp:956
#14 0x00007fe060453c22 in operator() (__args#0=<optimized out>,
    this=0x2d4bd0d0)
    at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#15 dsn::rpc_request_task::exec (this=0x2d4bd000)
    at /home/work/pegasus/rdsn/include/dsn/tool-api/task.h:435
#16 0x00007fe0604549d1 in dsn::task::exec_internal (
    this=this@entry=0x2d4bd000)
    at /home/work/pegasus/rdsn/src/runtime/task/task.cpp:176
#17 0x00007fe060469d9a in dsn::task_worker::loop (this=0x337f550)
    at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:211
#18 0x00007fe060469fc0 in dsn::task_worker::run_internal (this=0x337f550)
    at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:191
#19 0x00007fe0604f225f in execute_native_thread_routine ()
   from /home/work/app/pegasus/c3srv-algosample-other/replica/package/bin/[libdsn_replica_server.so](http://libdsn_replica_server.so/)
#20 0x00007fe05d03fdc5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fe05b55273d in clone () from /lib64/libc.so.6

@Sunflower876
Copy link
Contributor Author

@Sunflower876 Thanks!
Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

#0 0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64 (gdb) #0 0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6 #1 0x00007fe05f16ae00 in dsn::tools::simple_logger::dsn_logv ( this=0x2b25030, file=, function=, line=, log_level=LOG_LEVEL_FATAL, fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=) at /home/work/pegasus/rdsn/src/utils/simple_logger.cpp:206 #2 0x00007fe05f15e04c in dsn_logv (file=0x7fe0604f25b7 "compiler_depend.ts", function=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::FUNCTION> "copy", line=248, log_level=LOG_LEVEL_FATAL, fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=args@entry=0x7fe01b3ce598) at /home/work/pegasus/rdsn/src/utils/logging.cpp:129 #3 0x00007fe05f15e0cf in dsn_logf ( file=file@entry=0x7fe0604f25b7 "compiler_depend.ts", function=function@entry=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::FUNCTION> "copy", line=line@entry=248, log_level=log_level@entry=LOG_LEVEL_FATAL, fmt=fmt@entry=0x7fe06058ab7d "%d VS %d, rpc_name = %s") at /home/work/pegasus/rdsn/src/utils/logging.cpp:141 #4 0x00007fe060452348 in dsn::message_ex::copy (this=, clone_content=clone_content@entry=true, copy_for_receive=copy_for_receive@entry=true) at /home/work/pegasus/rdsn/src/runtime/rpc/rpc_message.cpp:247 #5 0x000000000058d6da in pegasus::server::pegasus_server_impl::on_refresh_ttl (this=0x41a4100, rpc=...) at /home/work/pegasus/src/server/pegasus_server_impl.cpp:972 #6 0x0000000000560a8b in pegasus::server::pegasus_read_service::on_refresh_ttl (svc=, rpc=...) at /home/work/pegasus/src/server/pegasus_read_service.h:100 #7 0x0000000000568513 in operator() (request=, p=0x41a4100, _closure=) at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:41 #8 std::Function_handler<void (pegasus::server::pegasus_read_service*, dsn::message_ex*), bool dsn::replication::storage_serverletpegasus::server::pegasus_read_service::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response> >(dsn::task_code, char const*, void ()(pegasus::server::pegasus_read_service, dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response>))::{lambda(pegasus::server::pegasus_read_service*, dsn::message_ex*)#1}>::_M_invoke(std::_Any_data const&, pegasus::server::pegasus_read_service*&&, dsn::message_ex*&&) ( __functor=..., __args#0=, __args#1=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316 #9 0x000000000059f71b in operator() (__args#1=, __args#0=, this=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706 #10 handle_request (request=0x218218c0, this=0x41a4100) at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:92 #11 pegasus::server::pegasus_read_service::on_request (this=0x41a4100, request=0x218218c0) at /home/work/pegasus/src/server/pegasus_read_service.h:48 #12 0x00007fe06024a2b7 in dsn::replication::replica::on_client_read ( this=0x3472800, request=request@entry=0x218218c0, ignore_throttling=ignore_throttling@entry=false) at /home/work/pegasus/rdsn/src/replica/replica.cpp:252 #13 0x00007fe0602adb01 in dsn::replication::replica_stub::on_client_read ( this=0x2b48300, id=..., request=0x218218c0) at /home/work/pegasus/rdsn/src/replica/replica_stub.cpp:956 #14 0x00007fe060453c22 in operator() (__args#0=, this=0x2d4bd0d0) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706 #15 dsn::rpc_request_task::exec (this=0x2d4bd000) at /home/work/pegasus/rdsn/include/dsn/tool-api/task.h:435 #16 0x00007fe0604549d1 in dsn::task::exec_internal ( this=this@entry=0x2d4bd000) at /home/work/pegasus/rdsn/src/runtime/task/task.cpp:176 #17 0x00007fe060469d9a in dsn::task_worker::loop (this=0x337f550) at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:211 #18 0x00007fe060469fc0 in dsn::task_worker::run_internal (this=0x337f550) at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:191 #19 0x00007fe0604f225f in execute_native_thread_routine () from /home/work/app/pegasus/c3srv-algosample-other/replica/package/bin/libdsn_replica_server.so #20 0x00007fe05d03fdc5 in start_thread () from /lib64/libpthread.so.0 #21 0x00007fe05b55273d in clone () from /lib64/libc.so.6

image

@Sunflower876
Copy link
Contributor Author

Sunflower876 commented Apr 2, 2024

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

The first format is the Thrift message format, used for communication between client and server. The second format is the RDSN message format, used for communication between servers. The second format calling message_ex:: copy will definitely crash.

7d85bfd6-62d8-4e40-b57d-c43a757c03b1

image

@acelyc111
Copy link
Member

I saw the crash is caused by request on_refresh_ttl, is it a internal API in your environment?

@Sunflower876
Copy link
Contributor Author

I saw the crash is caused by request on_refresh_ttl, is it a internal API in your environment?

Yes. But it is because there is a bug in the implementation of message_ex:: copy that any message call between servers will lead to crash. You can try the case below. This will definitely crash.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good-first-issue type/bug This issue reports a bug.
Projects
None yet
Development

No branches or pull requests

2 participants