HTTP/3 not working on OpenBSD -current arm64 #2569
Comments
Is it possible to test on OpenBSD + OpenSSL? It would help identify whether LibreSSL is under suspicion or not. |
It is related to LibreSSL but I currently have no idea how. This setup with haproxy on OpenBSD/arm64 compiled against quictls works as expected. With LibreSSL the quic handshake appears to hang at the finished stage, with haproxy waiting for Interestingly, quic with nginx works just fine on the same arm64 machine. Here's the diff I used for compiling the haproxy port against quictls:
Index: Makefile
===================================================================
RCS file: /cvs/ports/net/haproxy/Makefile,v
diff -u -p -r1.113 Makefile
--- Makefile 5 May 2024 17:09:06 -0000 1.113
+++ Makefile 16 May 2024 23:07:40 -0000
@@ -8,7 +8,7 @@ MAINTAINER = Daniel Jakots <obsd@chown.m
# GPLv2
PERMIT_PACKAGE = Yes
-WANTLIB += c crypto pcre2-8 pcre2-posix pthread ssl z
+WANTLIB += c lib/qopenssl31/crypto pcre2-8 pcre2-posix pthread lib/qopenssl31/ssl z
DEBUG_PACKAGES = ${BUILD_PACKAGES}
@@ -26,13 +26,17 @@ MAKE_FLAGS += CPU_CFLAGS="${CFLAGS}" LDF
MAKE_FLAGS += CC="${CC}" LD="${CC}" TARGET="openbsd"
MAKE_FLAGS += USE_OPENSSL=1 USE_PCRE2=1 USE_QUIC=1 USE_ZLIB=1 V=1
MAKE_FLAGS += USE_LIBATOMIC=
+MAKE_FLAGS += SSL_INC=/usr/local/include/qopenssl31/
+MAKE_FLAGS += SSL_LIB=/usr/local/lib/qopenssl31/
+LDFLAGS+= "-Wl,-rpath,/usr/local/lib/qopenssl31/"
FAKE_FLAGS += DOCDIR="${PREFIX}/share/doc/haproxy"
FAKE_FLAGS += MANDIR="${PREFIX}/man"
COMPILER = base-clang ports-gcc
-LIB_DEPENDS = devel/pcre2
+LIB_DEPENDS = devel/pcre2 \
+ security/openssl/quictls
# Fix undefined reference to __atomic_*
.if ${MACHINE_ARCH} == "hppa" |
It could be useful to have haproxy traces to have more information. The simplest solution is to run haproxy binary with the extra arg |
I could reproduce it on latest master on openbsd-mips64. The connection establishes and nothing happens once the request is sent. I will retry with the traces. I try to avoid compiling too much on this machine, it's super slow, so I'll try to focus on specific tests :-) |
Here comes the trace for a "curl --http3 https://ip:port/" sent from a second machine. My bind line has an explicit address. |
BTW I'm on OpenBSD 7.5, and I noticed that when running curl from the local machine on 127.0.0.1, it quickly spits some "connection refused" despite the QUIC traces showing some communication. I'm not seeing this from another curl running on a different machine (and built on top of quictls-1.1.1). |
The requested traces for amd64 (LibreSSL only) and arm64 (LibreSSL and OpenSSL) (thanks for the diff Theo). This is with 3.0-dev11 btw. https://gist.github.com/lgv5/a778b4ca0b98582d036e52f781689a17 |
Haproxy did not manage to decipher the client's handshake-level packets, but I guess the header protection was correctly removed, because the packet numbers start from 0. Even if that is not mandatory, when header protection removal does not work, there is a good chance the packet number is not zero. The client was able to decipher haproxy's handshake-level packets; otherwise it would not send handshake-level packets itself. I cannot say more without inspecting a capture (with a keylog to retrieve the secrets). If wireshark has the same problem deciphering the client handshake, the issue is very likely on the client side.
|
I do have this option on curl but with the values above it gives the same result. Do you have any suggestion about which cipher to use ? Or what info I could provide you with ? Otherwise never mind, we can see this on tuesday, you can even have access to the machine to experiment with any idea. |
And to be complete, testing any of these ciphers individually doesn't seem to change anything. |
Any of them, except the last one in the list if I remember well (too weak), which is often disabled by the TLS stack. |
One thing to try is to decipher the ciphered packets into a debug buffer, to check that haproxy can decipher the packets it has itself ciphered on this platform with this patch. |
@haproxyFred running with your patch on top of 3.0-dev11
I did a
|
@lgv5 thank you! This is interesting! In fact the patch shows that haproxy cannot decipher the packets it has ciphered. So, the issue is on the haproxy+TLS stack side. This reminds me that there are AEAD cryptographic tests provided with the libressl sources. I do not know if you would be inclined to run these tests. If yes, have a look at tests/aeadtest.sh in the libressl sources directory. It needs tests/aeadtest.c to be compiled. |
from tests directory you can run the AEAD tests as follows:
|
Run the tests off from the src tree, not LibreSSL portable.
|
FWIW I've rebuilt on linux against libressl-3.9.2 and there it does work fine. In parallel I'm building on openbsd with openssl-1.1.1 to compare. Maybe we'll find that lib is not relevant to the issue and that only the OS is (e.g. a different behavior of a syscall, etc). |
botovq said there is no issue with OpenBSD+quictls. |
It would be interesting to add this section at the head of aes_256_gcm_tests.txt file and to test it again as follows:
from tests libressl sources directory.
|
Should have mentioned that this macro constant must be increased to 2048 in aeadtest.c:
|
It passes on my m1 once I bump BUF_MAX
|
@haproxyFred I tried with
prints
meaning that both AES-{128,256}-GCM suites work. In particular,
cc @botovq . |
OMG! So, we have used the wrong curl option :-s. |
I'm embarrassed. My mips64 build with openssl 1.1.1 completed and works fine. Perfect. I retried with the binary built last week with libressl, and it now works equally... I don't understand anything anymore about this, so I'll stop polluting the issue until I get more exploitable info. |
It's worse, the issue is |
I have managed to reproduce the same issue with the pkt_decipher.txt patch, which makes haproxy BUG_ON() as soon as it cannot decipher a packet it has ciphered. But only with libressl as the TLS stack. |
and only with TLS_CHACHA20_POLY1305_SHA256 (on linux). |
We have shown that haproxy+libressl cannot decipher its own chacha20-poly1305 packets. This is also the case on linux. |
That said, the issue occurs on the first packet. We do not reuse any context in this case... 🤔 |
on arm64 linux ? |
OK I was worried we were facing two distinct issues, but now I agree they are the same, as I could redo the test proposed above with each
That's with libressl-3.9. With openssl the first 3 work, and TLS_AES_128_CCM_SHA256 fails yelling this on stderr:
I seem to remember you once told us that in compat mode, one algo was not implemented, so I guess it might be that one. |
on amd64 too |
For me on linux, everything works fine with libressl-3.9.2, I cannot reproduce the issue. |
on arm64 linux ?
arm64 is a red herring.
In presence of AES-NI, LibreSSL prefers TLS_AES_256_GCM_SHA384 over
TLS_CHACHA20_POLY1305_SHA256, otherwise it prefers the latter.
That's why on non-ancient amd64 hardware the bug isn't visible whereas
it is visible on arm64 and mips64, where libressl has no native support
for AES, so it chooses chacha20-poly1305 as preferred cipher suite.
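
The preference rule described above can be sketched as follows. This is only an illustration, not LibreSSL's code: the function names `preferred_suites` and `negotiate` and the `has_aes_hw` flag are made up here, and the sketch assumes that the order of the side doing the preferring wins. Only the two suite names and their relative order come from the comment.

```python
from typing import List, Optional

AES_GCM = "TLS_AES_256_GCM_SHA384"
CHACHA = "TLS_CHACHA20_POLY1305_SHA256"


def preferred_suites(has_aes_hw: bool) -> List[str]:
    """Order the two suites as described above (hypothetical helper)."""
    # With hardware AES (e.g. AES-NI) AES-GCM comes first, otherwise ChaCha20.
    return [AES_GCM, CHACHA] if has_aes_hw else [CHACHA, AES_GCM]


def negotiate(preference: List[str], peer_offer: List[str]) -> Optional[str]:
    """Pick the first preferred suite that the peer also supports."""
    for suite in preference:
        if suite in peer_offer:
            return suite
    return None


peer = [AES_GCM, CHACHA]
amd64_choice = negotiate(preferred_suites(True), peer)    # AES-GCM: bug stays hidden
arm64_choice = negotiate(preferred_suites(False), peer)   # ChaCha20: bug is exposed
```

Under these assumptions the same configuration selects a different suite depending only on the hardware, which is why the architecture looked like the cause.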
|
argh, I have only a 3.9.0 libressl version... :s |
Ah wait a minute, the curl version and/or lib counts as well! On linux I'm using curl-7.88.1 built against QuicTLS 1.1.1t. On the OpenBSD client, I'm using curl-8.6.0 built with LibreSSL-3.9.0. What I can say now is the following:
So the matrix is a bit curious as it involves both the client's TLS lib, the OS and the server's TLS lib. |
In case that helps, the linux gdb trace for the CCM crash is more exploitable. qc->ael is NULL:
|
@wtarreau Ok thank you for the backtrace. That said, I reproduce the same crash with haproxy+libressl.3.9.0 (and without libressl client). So with aes_128_ccm the TLS stacks usually emit a TLS alert and SSL_do_handshake() returns an error. This is not the case with libressl-3.9.0, which only emits an alert. Working on a patch to prevent the crash. There is a chance that openbsd works well with libressl-3.9.2. |
@wtarreau |
I don't think it will. I am pretty sure this is a bug in libressl's quic support: The quic alert sending mechanism returns I've notified @4a6f656c (who is quite busy these days). |
Is The same program works in Alpine Linux 3.19 with both ciphers under |
@lgv5 well, that seems to explain it... Thanks! |
@lgv5 this diff seems to fix the issue for me in some light testing:
Index: evp/e_chacha20poly1305.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/evp/e_chacha20poly1305.c,v
diff -u -p -r1.35 e_chacha20poly1305.c
--- evp/e_chacha20poly1305.c 9 Apr 2024 13:52:41 -0000 1.35
+++ evp/e_chacha20poly1305.c 22 May 2024 10:56:46 -0000
@@ -496,14 +496,19 @@ chacha20_poly1305_cipher(EVP_CIPHER_CTX
if (out == NULL) {
cpx->ad_len += len;
cpx->in_ad = 1;
- } else {
+ CRYPTO_poly1305_update(&cpx->poly1305, in, len);
+
+ return len;
+ }
+ if (ctx->encrypt) {
ChaCha(&cpx->chacha, out, in, len);
cpx->in_len += len;
- }
- if (ctx->encrypt && out != NULL)
CRYPTO_poly1305_update(&cpx->poly1305, out, len);
- else
+ } else {
CRYPTO_poly1305_update(&cpx->poly1305, in, len);
+ ChaCha(&cpx->chacha, out, in, len);
+ cpx->in_len += len;
+ }
return len;
} |
@botovq I could see that the issue was with Poly1305 (the decryption before |
@lgv5 thanks for testing. Yes, the only change is that it reverses the order of the mac and cipher in the decryption case for precisely the reason you state. I'm not sure how often haproxy or curl trigger a rekey and 5G is slightly below the integrity limit. There was at least one key update at the end of the handshake. |
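
Why the order matters for in-place decryption can be shown with a toy model. This is a sketch under loud assumptions: the cipher below is a SHA-256 based XOR keystream and the MAC is HMAC-SHA256, not ChaCha20 or Poly1305, one key is reused for both purely for brevity, and the names `seal`, `open_buggy` and `open_fixed` are invented for this example. Only the ordering of the two steps mirrors the diff above.

```python
import hashlib
import hmac


def keystream_xor(key: bytes, buf: bytearray) -> None:
    """Toy stream cipher: XOR buf in place with a SHA-256 derived keystream."""
    for offset in range(0, len(buf), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        for i in range(offset, min(offset + 32, len(buf))):
            buf[i] ^= block[i - offset]


def tag_of(key: bytes, data: bytearray) -> bytes:
    """Toy MAC over the given bytes."""
    return hmac.new(key, bytes(data), hashlib.sha256).digest()


def seal(key: bytes, plaintext: bytes):
    """Encrypt, then MAC the ciphertext."""
    buf = bytearray(plaintext)
    keystream_xor(key, buf)
    return bytes(buf), tag_of(key, buf)


def open_buggy(key: bytes, buf: bytearray, tag: bytes) -> bool:
    """Old order: decrypt in place first, so the MAC sees plaintext."""
    keystream_xor(key, buf)  # the ciphertext in buf is now overwritten
    return hmac.compare_digest(tag_of(key, buf), tag)


def open_fixed(key: bytes, buf: bytearray, tag: bytes) -> bool:
    """New order: MAC the ciphertext before it is overwritten."""
    expected = tag_of(key, buf)
    keystream_xor(key, buf)
    return hmac.compare_digest(expected, tag)


key = b"k" * 32
ciphertext, tag = seal(key, b"GET / HTTP/3")
buggy_ok = open_buggy(key, bytearray(ciphertext), tag)
packet = bytearray(ciphertext)
fixed_ok = open_fixed(key, packet, tag)
```

With separate input and output buffers both orders would verify, which is consistent with the bug only showing up for callers that decrypt in place.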
It's great that you've found it! Theo, as we're releasing 3.0 in 1 week, I'd like to temporarily make connections using CHACHA fail on LibreSSL-3.9.x so that the clients fall back to TCP. We'll later refine this to exactly the affected range once a new release containing your fix is out. I'm well aware that it's not the prettiest (I even thought about refusing to build with QUIC on that range), but we definitely don't want to leave users with a half-working config that leaves their visitors in front of a site that does not respond. We could confirm here that this fix addresses it:
and that's likely what I'll merge, referencing this issue. However, if you have any better idea about a way (or even a trick) to forcefully disable CHACHA on the affected version range so that the code automatically falls back to the other algos, I'm obviously interested, as it will be more graceful to users! Thanks! |
At least the 3.9.0 version of the libressl TLS stack does not behave like other stacks such as quictls, which make SSL_do_handshake() return an error when no cipher could be negotiated, in addition to emitting a TLS alert (0x28). This is the case when TLS_AES_128_CCM_SHA256 is forced as the TLS1.3 cipher from the client side. This makes haproxy enter a code path which leads to a crash as follows:
    [Switching to Thread 0x7ffff76b9640 (LWP 23902)]
    0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
    910 struct quic_kp_trace kp_trace = {
    (gdb) list
    905 {
    906 struct quic_tls_ctx *tls_ctx = &qc->ael->tls_ctx;
    907 struct quic_tls_secrets *rx = &tls_ctx->rx;
    908 struct quic_tls_secrets *tx = &tls_ctx->tx;
    909 /* Used only for the traces */
    910 struct quic_kp_trace kp_trace = {
    911 .rx_sec = rx->secret,
    912 .rx_seclen = rx->secretlen,
    913 .tx_sec = tx->secret,
    914 .tx_seclen = tx->secretlen,
    (gdb) p qc
    $1 = (struct quic_conn *) 0x7ffff00371f0
    (gdb) p qc->ael
    $2 = (struct quic_enc_level *) 0x0
    (gdb) bt
    #0 0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
    #1 0x000000000049bca9 in qc_ssl_provide_quic_data (len=268, data=<optimized out>, ctx=0x7ffff0047f80, level=<optimized out>, ncbuf=<optimized out>) at src/quic_ssl.c:617
    #2 qc_ssl_provide_all_quic_data (qc=qc@entry=0x7ffff00371f0, ctx=0x7ffff0047f80) at src/quic_ssl.c:688
    #3 0x00000000004683a7 in quic_conn_io_cb (t=0x7ffff0047f30, context=0x7ffff00371f0, state=<optimized out>) at src/quic_conn.c:760
    #4 0x000000000063cd9c in run_tasks_from_lists (budgets=budgets@entry=0x7ffff76961f0) at src/task.c:596
    #5 0x000000000063d934 in process_runnable_tasks () at src/task.c:876
    #6 0x0000000000600508 in run_poll_loop () at src/haproxy.c:3073
    #7 0x0000000000600b67 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3287
    #8 0x00007ffff7f6ae45 in start_thread () from /lib64/libpthread.so.0
    #9 0x00007ffff78254af in clone () from /lib64/libc.so.6
When a TLS alert is emitted, haproxy calls quic_set_connection_close(), which sets the QUIC_FL_CONN_IMMEDIATE_CLOSE connection flag. This flag is what this patch tests to make the handshake fail even if SSL_do_handshake() does not return an error. This test is specific to libressl and is never run with other TLS stacks.
Thank you to @lgv5 and @botovq for having reported this issue in GH #2569.
Must be backported as far as 2.6.
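
The decision described in this commit message can be modelled roughly as below. This is not haproxy's code: the bit value given to `QUIC_FL_CONN_IMMEDIATE_CLOSE`, the function name `handshake_failed` and its parameters are all hypothetical, and only the rule itself (treat the flag as a failure on libressl even when the handshake call reported no error) is taken from the text.

```python
QUIC_FL_CONN_IMMEDIATE_CLOSE = 1 << 0  # hypothetical bit position, for illustration


def handshake_failed(ssl_reported_error: bool, conn_flags: int, is_libressl: bool) -> bool:
    """Decide whether the QUIC handshake must be treated as failed."""
    if ssl_reported_error:
        # Usual path: the TLS stack signals the failure itself.
        return True
    if is_libressl and (conn_flags & QUIC_FL_CONN_IMMEDIATE_CLOSE):
        # An alert was emitted but no error was returned: fail anyway.
        return True
    return False


alert_only = handshake_failed(False, QUIC_FL_CONN_IMMEDIATE_CLOSE, is_libressl=True)
clean = handshake_failed(False, 0, is_libressl=True)
```

The extra check only changes the outcome in the alert-without-error case, which matches the statement that the test never runs with other TLS stacks.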
@wtarreau Thanks! This problem isn't new. It's been present ever since we added EVP support for chacha20-poly1305 in libressl 3.6.0. I think it was hidden until 97c344d landed almost exactly a year ago. Or did haproxy switch to in-place encryption only recently? I don't have a better idea than disabling it. I would recommend disabling it for all LibreSSL versions until the next release, though. I'll make sure that the OpenBSD port will keep CHACHA enabled in -current (I'll land the fix soon) and will also disable it in -stable. That is, I would do this:
-#if !defined(OPENSSL_IS_AWSLC)
+#if !defined(OPENSSL_IS_AWSLC) && (!defined(LIBRESSL_VERSION_NUMBER) || LIBRESSL_VERSION_NUMBER >= 0x4000000fL) |
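
To make the version gate easier to read, here is a small sketch that decodes the constant and evaluates the same condition. It assumes the usual OpenSSL-style MNNFFPPS layout for `LIBRESSL_VERSION_NUMBER`; the helper names `libressl_version` and `guard_enabled` are invented for this example, and what exactly the `#if` guards in haproxy is not shown in the thread (presumably the CHACHA20 support).

```python
from typing import Optional, Tuple

FIXED_SINCE = 0x4000000F  # value used in the proposed #if


def libressl_version(number: int) -> Tuple[int, int, int]:
    """Split a version number into (major, minor, fix), assuming MNNFFPPS."""
    major = (number >> 28) & 0xF
    minor = (number >> 20) & 0xFF
    fix = (number >> 12) & 0xFF
    return major, minor, fix


def guard_enabled(is_awslc: bool, libressl_number: Optional[int]) -> bool:
    """Mirror of the proposed condition; None means 'not LibreSSL'."""
    return (not is_awslc) and (libressl_number is None or libressl_number >= FIXED_SINCE)


threshold = libressl_version(FIXED_SINCE)
enabled_on_3_9_2 = guard_enabled(False, 0x3090200F)  # 3.9.2 under the assumed layout
```

Under that assumption the threshold reads as 4.0.0, so every 3.x release has the guarded code compiled out while non-LibreSSL stacks other than AWS-LC keep it.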
Initially I thought that version 3.6 did work, but I only tested it on x86 and given your comment about aes-ni forcing AES-GCM first it makes sense that it did work by default and has only hidden the problem! I'm perfectly fine with your proposed adjustment, I trust you that next version will be fixed. I'll reference this issue there so that we can recheck in case of doubt. Many thanks for your help! |
For the packet deciphering, haproxy has always used the in-place method (quic_tls_decrypt()). quic_tls_decrypt2() does the same thing as quic_tls_decrypt() except that it does not decipher in place. It is only used to check a Retry token. |
…0_POLY1305
As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's CHACHA20 in-place implementation that makes haproxy discard incoming QUIC packets encrypted with it. It's not very easy to observe the issue because:
- QUIC recommends that CHACHA20 is used in priority
- on x86 with AES-NI, LibreSSL prefers AES-GCM for performance reasons, so the problem is only observed there if a client explicitly forces TLS_CHACHA20_POLY1305_SHA256 only.
- discarded packets cause retransmits showing some apparent activity, and the handshake succeeds so it's not easy to analyze from the client which thinks that the server is slow to respond.
Thus in practice, on non-x86 machines running LibreSSL, requests made over QUIC freeze for a long time, unless the client explicitly forces algos excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by default on modern OpenBSD systems, and was reported in the issue above for an arm64 machine running OpenBSD -current, and was also observed on a mips64 one running OpenBSD 7.5.
There is no simple solution to this problem due to some of the protocol's constraints without digging too low into the stack (and risking breaking more). Here we're taking a pragmatic approach consisting in making the connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected, regardless of the availability of other ciphers. This means that every time a connection would have hung, instead it will fail fast, allowing the client to retry over TLS/TCP.
Theo Buehler recommends that we limit this protection to all LibreSSL versions before 4.0 since it's where the fix will be implemented. Older stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled, which should be sufficient to make QUIC work there again as well.
The following config is sufficient to reproduce the issue (on a non-x86 machine, both arm64 & mips64 were confirmed to reproduce it):
    global
        limited-quic
    frontend stats
        mode http
        #bind :8181
        #bind :8443 ssl crt rsa+dh2048.pem
        bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
        timeout client 5s
        stats uri /
And the following commands will trigger the problem on affected LibreSSL versions:
    curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/
    curl -v --http3 -k https://127.0.0.1:8443/
while these ones must work:
    curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/
    curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/
Normally all of them will work with LibreSSL 4, and only the first one should fail with stable LibreSSL versions higher than 3.9.2. An haproxy version without this workaround will show an unresponsive command after the GET is sent, while a version with the workaround will close the connection on error. On a version with this workaround, if TCP listeners are uncommented, curl will automatically fall back to TCP and attempt the request again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode (hence the limited-quic option above) all of them must work.
Many thanks to github user @lgv5 for the detailed report, tests, and for spotting the issue, and to @botovq (Theo Buehler) for the quick analysis, patch and help on this workaround.
This needs to be backported to versions 2.6 and above.
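
The four curl invocations above can be driven from a small script when the test needs repeating across builds. This is a convenience sketch only: the helper names `curl_cmd` and `run_matrix` and the 10 second timeout are assumptions made here, while the curl options and the URL are the ones quoted in the commit message.

```python
import subprocess
from typing import Dict, List, Optional

URL = "https://127.0.0.1:8443/"
SUITES = [
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
]


def curl_cmd(suite: Optional[str] = None, url: str = URL) -> List[str]:
    """Build one of the curl command lines quoted above."""
    cmd = ["curl"]
    if suite is not None:
        cmd += ["--tls13-ciphers", suite]
    return cmd + ["-v", "--http3", "-k", url]


def run_matrix(timeout: float = 10.0) -> Dict[str, Optional[int]]:
    """Run curl with the default suites and with each forced suite.

    Maps a label to curl's exit code, or to None when the command did not
    finish within the (arbitrary) timeout, which is how a hang shows up.
    """
    results: Dict[str, Optional[int]] = {}
    for suite in [None] + SUITES:
        try:
            proc = subprocess.run(curl_cmd(suite), capture_output=True, timeout=timeout)
            results[suite or "default"] = proc.returncode
        except subprocess.TimeoutExpired:
            results[suite or "default"] = None
    return results


chacha_cmd = curl_cmd("TLS_CHACHA20_POLY1305_SHA256")
```

Calling `run_matrix()` on a machine with an HTTP/3-capable curl and the frontend above running should then distinguish a hang (no exit code) from a fast failure or a success.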
Many thanks again for your help, guys, we've pushed the two patches so we're good regarding the forthcoming release. We'll backport them to stable versions as well. |
Take the MAC before clobbering the input value on decryption. Fixes hangs during the QUIC handshake with HAProxy using TLS_CHACHA20_POLY1305_SHA256. Found, issue pinpointed, and initial fix tested by Lucas Gabriel Vuotto: Let me take this opportunity to thank the HAProxy team for going out of their way to keep supporting LibreSSL. It's much appreciated. See haproxy/haproxy#2569 tweak/ok jsing
Thank you all for the help with tracking this down. I committed the fix for chacha to openbsd-current, disabled chacha20-poly1305 in the haproxy port in stable and we will also land a fix for the alert issue hopefully soon. |
From my side, the issue is addressed and resolved. I see the rest of the open issues have tons of tags and different status tracking information, so I feel reluctant to just close it. Thanks everyone! |
Means the HAProxy team will keep it open until they backported and released versions where it should go, so your reluctance is correct 👍 |
Detailed Description of the Problem
As the title reads, I can't make HTTP/3 work on OpenBSD -current arm64. From the client side, curl eventually returns
The same configuration works in an equally up-to-date OpenBSD -current amd64. Information on how to further debug this is more than welcome.
Expected Behavior
HTTP/3 support works the same in arm64 and amd64.
Steps to Reproduce the Behavior
darkhttpd . --port 18080. Also happens with OpenBSD's httpd and with Python's http.server module.
/tmp/haproxy/certs and set permissions go-rwx for the key. For the record, the issue is present even with Buypass issued certs.
haproxy -d -f haproxy.cfg
curl -ik --http3-only -H "Host: haproxy.invalid" https://127.0.0.1:10443/
Do you have any idea what may have caused this?
No.
Do you have an idea how to solve the issue?
No.
What is your configuration?
Output of haproxy -vv
Last Outputs and Backtraces
Additional Information
I detected it not working on arm64 at least as far back as mid-January; see https://marc.info/?l=openbsd-ports&m=170535379226660&w=2 for the details from then. I think the port was at 2.8.3 at that time. I tried a bunch of other 2.8.x and 3.0-dev without luck.
Given I'll also be sharing this with OpenBSD folks, dmesgs are in