seastar allocator causes infinite recursive call if seastar is compiled as a shared library #2247
tchaikov
added a commit
to tchaikov/seastar
that referenced
this issue
May 17, 2024
instead of using ubuntu:jammy and the setup-cpp action for preparing the build toolchain, use a fedora:40 container for building and testing.

after switching to the github workflow based CI, we've been seeing test failures due to networking issues:

```
Failed to install llvm via system package manager
Error: Command failed with exit code 35: curl -LJO https://apt.llvm.org/llvm.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:19 --:--:--     0
curl: (35) OpenSSL SSL_connect: Broken pipe in connection to apt.llvm.org:443
```

since fedora 40 comes with all the dependencies we need, let's build and test in a container with the fedora:40 image. with, hopefully, the better CDN of docker, more reliable mirrors of the fedora repositories, and the package retrieval machinery built into fedora's package management tools, we should have a more resilient CI.

please note, in this change, we also

* install git before checking out the repo. the reason is that, unlike the github-hosted runner, the fedora:40 image does not have `git` installed, so we have to install it manually before using the "actions/checkout" action.
* install clang-tools-extra when building with C++ modules enabled, because cmake and clang depend on clang-scan-deps to analyze the dependencies between C++20 modules.
* use the static library in "dev" build mode. this is to work around the issue where the seastar allocator causes an infinite recursive call if seastar is compiled as a shared library. this only happens when the tree is compiled with a newer glibc. see also scylladb#2247

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
tchaikov
added a commit
to tchaikov/seastar
that referenced
this issue
May 20, 2024
quote from https://patchwork.ozlabs.org/project/glibc/patch/8734v1ieke.fsf@oldenburg.str.redhat.com/

> On the glibc side, we should recommend that intercepting mallocs and its
> dependencies use initial-exec TLS because that kind of TLS does not use
> malloc. If intercepting mallocs using dynamic TLS work at all, that's
> totally by accident, and was in the past helped by glibc bug 19924.

so instead of allocating TLS variables using malloc, let's allocate them using initial-exec TLS model.

another approach is to single out the static TLS variables in the code path of malloc/free and apply `__attribute__ ((tls_model("initial-exec")))` to them, and optionally only do this when we are building shared library. but this could be overkill as

1. we build static library in the release build
2. the total size of the static TLS variables is presumably small, so the application linking against the seastar shared library should be able to afford this.

see also https://patchwork.ozlabs.org/project/glibc/patch/8734v1ieke.fsf@oldenburg.str.redhat.com/ and https://sourceware.org/bugzilla/show_bug.cgi?id=19924

Fixes scylladb#2247

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
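
The per-variable alternative mentioned in this commit message can be sketched as below. This is only an illustration, not code from seastar: `tls_alloc_count`, `note_allocation()` and `note_free()` are hypothetical names standing in for whatever thread-local state sits on the malloc/free path. With the initial-exec model the variable lives in the static TLS block and is addressed at a fixed offset from the thread pointer, so touching it should not go through the dynamic TLS lookup that can itself allocate. GCC and Clang also accept `-ftls-model=initial-exec` to set the default model for a whole translation unit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread counter of live allocations, the kind of state an
// interposed malloc()/free() might update on every call.
// The attribute requests the initial-exec TLS model for this one variable.
thread_local std::size_t tls_alloc_count
    __attribute__((tls_model("initial-exec"))) = 0;

// Hypothetical hook called from an allocation path; returns the new count.
std::size_t note_allocation() noexcept {
    return ++tls_alloc_count;
}

// Hypothetical hook called from a deallocation path; returns the new count.
std::size_t note_free() noexcept {
    assert(tls_alloc_count > 0);
    return --tls_alloc_count;
}
```

The trade-off is the one the message already points out: initial-exec variables consume static TLS space, which is limited for libraries loaded with `dlopen()`, so this only works comfortably while the total size of such variables stays small.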
avikivity
pushed a commit
that referenced
this issue
May 20, 2024
quote from https://patchwork.ozlabs.org/project/glibc/patch/8734v1ieke.fsf@oldenburg.str.redhat.com/

> On the glibc side, we should recommend that intercepting mallocs and its
> dependencies use initial-exec TLS because that kind of TLS does not use
> malloc. If intercepting mallocs using dynamic TLS work at all, that's
> totally by accident, and was in the past helped by glibc bug 19924.

so instead of allocating TLS variables using malloc, let's allocate them using initial-exec TLS model.

another approach is to single out the static TLS variables in the code path of malloc/free and apply `__attribute__ ((tls_model("initial-exec")))` to them, and optionally only do this when we are building shared library. but this could be overkill as

1. we build static library in the release build
2. the total size of the static TLS variables is presumably small, so the application linking against the seastar shared library should be able to afford this.

see also https://patchwork.ozlabs.org/project/glibc/patch/8734v1ieke.fsf@oldenburg.str.redhat.com/ and https://sourceware.org/bugzilla/show_bug.cgi?id=19924

Fixes #2247

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this can be reproduced with:
on fedora 40.
the backtrace looks like:
it looks like an issue in glibc's TLS support, and there seems to be a patch trying to address it, see https://patchwork.ozlabs.org/project/glibc/patch/8734v1ieke.fsf@oldenburg.str.redhat.com/ . but at the time of writing, the patch has not landed on upstream's master branch (https://sourceware.org/git/?p=glibc.git).