HTTP/3 not working on OpenBSD -current arm64 #2569

Open
lgv5 opened this issue May 16, 2024 · 59 comments
Labels
status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug.

Comments

@lgv5

lgv5 commented May 16, 2024

Detailed Description of the Problem

As the title reads, I can't make HTTP/3 work on OpenBSD -current arm64. From the client side, curl eventually returns

curl: (55) ngtcp2_conn_writev_stream returned error: ERR_DRAINING

The same configuration works on an equally up-to-date OpenBSD -current amd64. Information on how to debug this further is more than welcome.

Expected Behavior

HTTP/3 support works the same in arm64 and amd64.

Steps to Reproduce the Behavior

  1. run darkhttpd . --port 18080. Also happens with OpenBSD's httpd and with Python's http.server module.
  2. if certs are required, save the one at the bottom of this message to /tmp/haproxy/certs and set permissions go-rwx for the key. For the record, the issue is present even with Buypass-issued certs.
  3. with the provided config, run haproxy -d -f haproxy.cfg
  4. run curl -ik --http3-only -H "Host: haproxy.invalid" https://127.0.0.1:10443/
  5. amd64 replies, arm64 doesn't
# haproxy.invalid.crt
-----BEGIN CERTIFICATE-----
MIIBYDCB5wIJAL9gFPC99rzvMAoGCCqGSM49BAMCMBoxGDAWBgNVBAMMD2hhcHJv
eHkuaW52YWxpZDAeFw0yNDA1MTYxOTU5NTFaFw0yNDA2MTUxOTU5NTFaMBoxGDAW
BgNVBAMMD2hhcHJveHkuaW52YWxpZDB2MBAGByqGSM49AgEGBSuBBAAiA2IABM8F
kWNmy37yDsoZJ8OBwxGdApUJp7MQgiH9WHD541M2tqrtMCSqwZpXHQ5U4UtBlgzB
aTWz43GZ0fPoteySTR0aFIxU1zRc1DsmUQuQhsVard4/AkmR7PYOm031ewCTzjAK
BggqhkjOPQQDAgNoADBlAjAUDaDUjZaBKuEM5WNjuJitiiItJ1u0rXBU1iQsyZSN
p5ZeILR1hoxLhkXwCIOK5iMCMQCfCj9t8Pya9uzzVv4kXhbxNHiAWQy8OBxuq17q
SYIIH5uBBxtv7z37bLuZkWyYwbA=
-----END CERTIFICATE-----
# haproxy.invalid.key
-----BEGIN EC PARAMETERS-----
BgUrgQQAIg==
-----END EC PARAMETERS-----
-----BEGIN EC PRIVATE KEY-----
MIGkAgEBBDDXwXwhRRm6le78CPtcdlbBocpNycLUkvZ0+pUWFGDDhFUA9Uq5TIwK
4EoSjLG85+qgBwYFK4EEACKhZANiAATPBZFjZst+8g7KGSfDgcMRnQKVCaezEIIh
/Vhw+eNTNraq7TAkqsGaVx0OVOFLQZYMwWk1s+NxmdHz6LXskk0dGhSMVNc0XNQ7
JlELkIbFWq3ePwJJkez2DptN9XsAk84=
-----END EC PRIVATE KEY-----

Do you have any idea what may have caused this?

No.

Do you have an idea how to solve the issue?

No.

What is your configuration?

global
        ssl-load-extra-del-ext

defaults
        log global
        mode http
        option httplog
        option dontlognull
        option redispatch
        retries 3
        maxconn 2000
        timeout connect 5s
        timeout client 65s
        timeout server 5s

frontend haproxy
        bind ipv4@:10443,ipv6@:10443 ssl crt /tmp/haproxy/certs/
        bind quic4@:10443,quic6@:10443 ssl crt /tmp/haproxy/certs/
        default_backend darkhttpd

backend darkhttpd
        server s1 127.0.0.1:18080 check

Output of haproxy -vv

#
# OpenBSD arm64
#

HAProxy version 2.8.9-1842fd0 2024/04/05 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.9.html
Running on: OpenBSD 7.5 GENERIC.MP#31 arm64
Build options :
  TARGET  = openbsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -pipe -g -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_ZLIB=1 USE_LIBATOMIC= USE_QUIC=1 USE_PCRE2=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 -BACKTRACE +CLOSEFROM -CPU_AFFINITY -CRYPT_H -DEVICEATLAS -DL -ENGINE -EPOLL -EVPORTS +GETADDRINFO +KQUEUE -LIBATOMIC +LIBCRYPT -LINUX_CAP -LINUX_SPLICE -LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING -NETFILTER -NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 -PCRE2_JIT -PCRE_JIT +POLL -PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT -RT -SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD -TFO +THREAD -THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=1).
Built with OpenSSL version : LibreSSL 3.9.0
Running on OpenSSL version : LibreSSL 3.9.0
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with zlib version : 1.3.1.1-motley
Running on zlib version : 1.3.1.1-motley
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: SO_BINDANY
Built with PCRE2 version : 10.37 2021-05-26
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with clang compiler version 16.0.6

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : none

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace


#
# OpenBSD amd64
#

HAProxy version 2.8.9-1842fd0 2024/04/05 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.9.html
Running on: OpenBSD 7.5 GENERIC.MP#59 amd64
Build options :
  TARGET  = openbsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -pipe -g -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_ZLIB=1 USE_LIBATOMIC= USE_QUIC=1 USE_PCRE2=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 -BACKTRACE +CLOSEFROM -CPU_AFFINITY -CRYPT_H -DEVICEATLAS -DL -ENGINE -EPOLL -EVPORTS +GETADDRINFO +KQUEUE -LIBATOMIC +LIBCRYPT -LINUX_CAP -LINUX_SPLICE -LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING -NETFILTER -NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 -PCRE2_JIT -PCRE_JIT +POLL -PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT -RT -SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD -TFO +THREAD -THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=1).
Built with OpenSSL version : LibreSSL 3.9.0
Running on OpenSSL version : LibreSSL 3.9.0
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with zlib version : 1.3.1.1-motley
Running on zlib version : 1.3.1.1-motley
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: SO_BINDANY
Built with PCRE2 version : 10.37 2021-05-26
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with clang compiler version 16.0.6

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : none

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Last Outputs and Backtraces

haproxy produces no output in this case, sadly.

Additional Information

I noticed it not working on arm64 at least as far back as mid-January; see https://marc.info/?l=openbsd-ports&m=170535379226660&w=2 for the details from then. I think the port was at 2.8.3 at that time. I have tried a number of other 2.8.x releases and 3.0-dev without luck.

Given I'll also be sharing this with OpenBSD folks, dmesgs are in

@lgv5 lgv5 added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels May 16, 2024
@chipitsine
Member

Is it possible to test on OpenBSD + OpenSSL? It can help to identify whether LibreSSL is under suspicion or not.

@botovq

botovq commented May 17, 2024

It is related to LibreSSL, but I currently have no idea how. This setup with haproxy on OpenBSD/arm64 compiled against quictls works as expected. With LibreSSL, the QUIC handshake appears to hang at the Finished stage, with haproxy waiting for curl's Finished message to arrive and curl eventually giving up with the ERR_DRAINING message from ngtcp2.

Interestingly, quic with nginx works just fine on the same arm64 machine.

Here's the diff I used for compiling the haproxy port against quictls:

Index: Makefile
===================================================================
RCS file: /cvs/ports/net/haproxy/Makefile,v
diff -u -p -r1.113 Makefile
--- Makefile	5 May 2024 17:09:06 -0000	1.113
+++ Makefile	16 May 2024 23:07:40 -0000
@@ -8,7 +8,7 @@ MAINTAINER =	Daniel Jakots <obsd@chown.m
 # GPLv2
 PERMIT_PACKAGE =		Yes
 
-WANTLIB +=	c crypto pcre2-8 pcre2-posix pthread ssl z
+WANTLIB +=	c lib/qopenssl31/crypto pcre2-8 pcre2-posix pthread lib/qopenssl31/ssl z
 
 DEBUG_PACKAGES = ${BUILD_PACKAGES}
 
@@ -26,13 +26,17 @@ MAKE_FLAGS +=	CPU_CFLAGS="${CFLAGS}" LDF
 MAKE_FLAGS +=	CC="${CC}" LD="${CC}" TARGET="openbsd"
 MAKE_FLAGS +=	USE_OPENSSL=1 USE_PCRE2=1 USE_QUIC=1 USE_ZLIB=1 V=1
 MAKE_FLAGS +=	USE_LIBATOMIC=
+MAKE_FLAGS +=	SSL_INC=/usr/local/include/qopenssl31/
+MAKE_FLAGS +=	SSL_LIB=/usr/local/lib/qopenssl31/
+LDFLAGS+=	"-Wl,-rpath,/usr/local/lib/qopenssl31/"
 
 FAKE_FLAGS +=	DOCDIR="${PREFIX}/share/doc/haproxy"
 FAKE_FLAGS +=	MANDIR="${PREFIX}/man"
 
 COMPILER =	base-clang ports-gcc
 
-LIB_DEPENDS =	devel/pcre2
+LIB_DEPENDS =	devel/pcre2 \
+		security/openssl/quictls
 
 # Fix undefined reference to __atomic_*
 .if ${MACHINE_ARCH} == "hppa"

@a-denoyelle
Contributor

It would be useful to have haproxy traces for more information. The simplest way is to run the haproxy binary with the extra argument -dt quic:developer:clean.

@wtarreau
Member

I could reproduce it on latest master on openbsd-mips64. The connection establishes and nothing happens once the request is sent.

I will retry with the traces. I try to avoid compiling too much on this machine, it's super slow, so I'll try to focus on specific tests :-)

@wtarreau
Member

Here comes the trace for a "curl --http3 https://ip:port/" sent from a second machine. My bind line has an explicit address.
quic-trace.txt

@wtarreau
Member

BTW I'm on OpenBSD 7.5, and I noticed that when running curl from the local machine on 127.0.0.1, it quickly spits some "connection refused" despite the QUIC traces showing some communication. I'm not seeing this from another curl running on a different machine (and built on top of quictls-1.1.1).

@lgv5
Author

lgv5 commented May 17, 2024

The requested traces for amd64 (LibreSSL only) and arm64 (LibreSSL and OpenSSL) (thanks for the diff Theo). This is with 3.0-dev11 btw.

https://gist.github.com/lgv5/a778b4ca0b98582d036e52f781689a17

@haproxyFred
Contributor

Haproxy did not manage to decipher the client's Handshake-level packets, but I guess header protection was removed correctly, because the packet numbers start from 0. Although this is not guaranteed, when header protection removal does not work, chances are high that the packet number is not zero. The client was able to decipher haproxy's Handshake-level packets; otherwise it would not have sent Handshake-level packets of its own.
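
For readers less familiar with the mechanism, here is a minimal Python sketch of how a header protection mask is applied and removed, following RFC 9001 section 5.4.1. It assumes the 5-byte mask has already been derived from the packet sample (that step is cipher-specific and is left out), and the function names are illustrative, not haproxy's.

```python
from typing import Tuple


def apply_header_protection(packet: bytes, pn_offset: int, mask: bytes) -> bytes:
    """Mask the first byte and the packet number field of a QUIC packet."""
    buf = bytearray(packet)
    # The packet number length lives in the two low bits of the
    # unprotected first byte, so read it before masking.
    pn_len = (buf[0] & 0x03) + 1
    # Long headers (top bit set) protect 4 low bits, short headers 5.
    buf[0] ^= mask[0] & (0x0F if buf[0] & 0x80 else 0x1F)
    for i in range(pn_len):
        buf[pn_offset + i] ^= mask[1 + i]
    return bytes(buf)


def remove_header_protection(packet: bytes, pn_offset: int, mask: bytes) -> Tuple[bytes, int]:
    """Undo the masking and return (clear packet, truncated packet number)."""
    buf = bytearray(packet)
    # Unmask the first byte first: only then is the packet number length known.
    buf[0] ^= mask[0] & (0x0F if buf[0] & 0x80 else 0x1F)
    pn_len = (buf[0] & 0x03) + 1
    for i in range(pn_len):
        buf[pn_offset + i] ^= mask[1 + i]
    pn = int.from_bytes(buf[pn_offset:pn_offset + pn_len], "big")
    return bytes(buf), pn
```

With a wrong mask, the recovered packet number length and value are effectively random, which is why a first packet number of 0 is a reasonable (though not conclusive) hint that header protection itself worked.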

I cannot say more without inspecting a capture (with a keylog to retrieve the secrets). If wireshark has the same problem deciphering the client handshake, the issue is most likely on the client side.
What can be tested is changing the cipher suite. I don't think this is possible with curl. With ngtcp2 it is possible with this option:

  --ciphers=<CIPHERS>
              Specify the cipher suite list to enable.
              Default: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_CCM_SHA256

@wtarreau
Member

I do have this option in curl, but with the values above it gives the same result. Do you have any suggestion about which cipher to use ? Or what info I could provide you with ? Otherwise never mind, we can look at this on Tuesday; you can even have access to the machine to experiment with any idea.

@wtarreau
Member

And to be complete, testing any of these ciphers individually doesn't seem to change anything.

@haproxyFred
Contributor

haproxyFred commented May 17, 2024

Do you have any suggestion about which cipher to use ?

Any of them, except the last one in the list, if I remember correctly (too weak), which is often disabled by the TLS stack.

@haproxyFred
Contributor

One thing to try is to decipher the ciphered packets into a debug buffer, to check that haproxy can decipher the packets it has itself ciphered on this platform. This patch does that.

@haproxyFred
Contributor

pkt_decipher.txt

@lgv5
Author

lgv5 commented May 17, 2024

@haproxyFred running with your patch on top of 3.0-dev11

long? 1 pn_off=37
enc mask:96b9affcc0
dec mask:96b9affcc0 pnlen=1
pn: 0 vs 0
long? 1 pn_off=38
enc mask:50d6b8f0ee
dec mask:50d6b8f0ee pnlen=1
pn: 1 vs 1
long? 1 pn_off=37
enc mask:8561f834b3
dec mask:8561f834b3 pnlen=1
pn: 0 vs 0

FATAL: bug condition "!quic_tls_decrypt(byte0 + aad_len, pkt->len - aad_len, byte0, aad_len, tls_ctx->tx.ctx, tls_ctx->tx.aead, tls_ctx->tx.key, iv)" matched at src/quic_tx.c:2190
Illegal instruction (core dumped) 

I ran t a a bt full on the core dump in case it helps:

GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-openbsd7.5".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/ports/pobj/haproxy-3.0pre0/fake-aarch64/usr/local/sbin/haproxy...
[New process 175927]
Core was generated by `haproxy'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0000000a1b81d024 in qc_build_pkt (end=0xe2a0c54ee "", qel=0xec79ae540, tls_ctx=0xec79ae570, frms=0xe70574310, qc=0xec7998760, ver=0xa1ba2ddf8 <quic_versions+104>,
    dglen=<optimized out>, must_ack=<optimized out>, padding=1, probe=<optimized out>, cc=0, pos=<optimized out>, pkt_type=<optimized out>, err=<optimized out>)
    at src/quic_tx.c:2189
2189                    BUG_ON(!quic_tls_decrypt(byte0 + aad_len, pkt->len - aad_len,
(gdb) t a a bt full

Thread 1 (process 175927):
#0  0x0000000a1b81d024 in qc_build_pkt (end=0xe2a0c54ee "", qel=0xec79ae540, tls_ctx=0xec79ae570, frms=0xe70574310, qc=0xec7998760, ver=0xa1ba2ddf8 <quic_versions+104>, dglen=<optimized out>, must_ack=<optimized out>, padding=1, probe=<optimized out>, cc=0, pos=<optimized out>, pkt_type=<optimized out>, err=<optimized out>) at src/quic_tx.c:2189
        msg = <optimized out>
        buf = "\340\000\000\000\001\024\252'I\216\373>\364܋\210\a\374Wȿ\230\023E_\211\b<\261\347\033+\217\370GC\325\000\006\000B\203\b\000\000{\000y\000\071\000l\000\024GΝ\226\211\202\351\364\241$\255_\234\336\000П4O\216\002\020\326.\t\203H\224\356=\016jb\035m\017\b<\261\347\033+\217\370G\001\004\200\000u0\003\002H\000\004\004\200\031\276d\005\002\177\374\006\002\177\374\a\002\177\374\b\002@d\t\001\003\f\000\016\001\b\021\020\000\000\000\001\377\000\000\035\000\000\000\001k3C\317\000\020\000\005\000\003\002h3\v\000\001m\000\000\001i\000\001d0\202\001`0\201\347\002\t\000\277`\024\360\275\366\274\357\060\n\006\b*\206H\316"...
        pnb = <optimized out>
        sample = <optimized out>
        mask = "\205a\370\064\263"
        iv = "^\234\377\333\060\f\223\234", <incomplete sequence \365\215>
        tx_iv = 0xe70595a20 "^\234\377\333\060\f\223\234ƌ\365\215", '\337' <repeats 12 times>, "@\260^\326\016"
        byte0 = <optimized out>
        packet_number = <optimized out>
        tx_iv_sz = 12
        truncated_pn = <optimized out>
        i = <optimized out>
        ret_pkt = 0xec79796e0
        encrypt_failure = <optimized out>
        pkt = <optimized out>
        buf_pn = <optimized out>
        pn_len = <optimized out>
        first_byte = 0xe2a0c50c0 <incomplete sequence \345>
        pn = <optimized out>
        payload = 0x67c18ddf06 "\006"
        last_byte = <optimized out>
        aad_len = 38
        payload_len = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        msg = <optimized out>
        msg = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
        buf = <optimized out>
        mask = <optimized out>
        iv = <optimized out>
        tx_iv_sz = <optimized out>
        tx_iv = <optimized out>
        truncated_pn = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--c
        byte0 = <optimized out>
        pnb = <optimized out>
        sample = <optimized out>
        i = <optimized out>
        packet_number = <optimized out>
        pnlen = <optimized out>
        msg = <optimized out>
        msg = <optimized out>
        msg = <optimized out>
        __x = <optimized out>
        __x = <optimized out>
#1  qc_prep_pkts (qc=0xec7998760, buf=<optimized out>, qels=<optimized out>) at src/quic_tx.c:602
        err = <optimized out>
        pkt_type = <optimized out>
        cur_pkt = <optimized out>
        probe = <optimized out>
        must_ack = <optimized out>
        frms = 0xe70574310
        next_qel = 0x0
        ver = 0xa1ba2ddf8 <quic_versions+104>
        tls_ctx = 0xec79ae570
        next_frms = 0x0
        ret = <error reading variable ret (Cannot access memory at address 0xffffffffffffffff)>
        cc = 0
        padding = <optimized out>
        prv_pkt = 0xec7979140
        first_pkt = 0xec7979140
        pos = <optimized out>
        end = 0xe2a0c54ee ""
        dglen = <optimized out>
        total = <optimized out>
        qel = <optimized out>
        tmp_qel = <optimized out>
#2  0x0000000a1b81922c [PAC] in qc_send (qc=0xec7998760, old_data=0, send_list=0x67c18de6a0) at src/quic_tx.c:724
        status = <error reading variable status (Cannot access memory at address 0x0)>
        buf = 0xec7998c18
        ret = <optimized out>
        tmp_qel = <optimized out>
        qel = <optimized out>
#3  0x0000000a1b7da7c8 [PAC] in quic_conn_io_cb (t=0xe7057dc40, context=0xec7998760, state=<optimized out>) at src/quic_conn.c:809
        send_list = {n = 0xec79ae550, p = 0xec79ae550}
        qc = <optimized out>
        st = <optimized out>
        qel = <optimized out>
        tl = <optimized out>
#4  0x0000000a1b9859ac [PAC] in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:596
        _ = {func = 0xa1b728d05 "run_tasks_from_lists", file = 0xa1b7359b2 "src/task.c", line = 657, what = 6 '\006', arg8 = 0 '\000', arg32 = 0}
        queue = <optimized out>
        done = 0
        profile_entry = 0x0
        budget_mask = 15 '\017'
        tl_queues = 0xa1bc8c350 <ha_thread_ctx+208>
        ctx = 0xec7998760
        process = 0xa1b7d9db4 <quic_conn_io_cb>
        t = 0xe7057dc40
        state = 181
#5  0x0000000a1b986498 [PAC] in process_runnable_tasks () at src/task.c:876
        max = {0, 0, 0, 0}
        heavy_queued = <error reading variable heavy_queued (Cannot access memory at address 0x1)>
        default_weights = <error reading variable default_weights (Cannot access memory at address 0x40)>
        tt = 0xa1bc8c280 <ha_thread_ctx>
        max_processed = <optimized out>
        max_total = <optimized out>
        queue = <error reading variable queue (Cannot access memory at address 0x4)>
        budget = 0
        grq = <optimized out>
        lrq = <optimized out>
        gpicked = <optimized out>
        lpicked = <optimized out>
        t = <optimized out>
        tmp_list = <optimized out>
#6  0x0000000a1b93d230 [PAC] in run_poll_loop () at src/haproxy.c:3073
        _ = {func = 0xa1b709848 "run_poll_loop", file = 0xa1b7288cb "src/haproxy.c", line = 3104, what = 1 '\001', arg8 = 0 '\000', arg32 = 0}
        wake = <optimized out>
        next = <optimized out>
#7  0x0000000a1b941638 [PAC] in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3287
        init_left = 0
        init_mutex = 0xe70595de0
        init_cond = 0x0
        warn_fail = 0
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
#8  0x0000000a1b9408b4 [PAC] in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3989
        limit = {rlim_cur = 1024, rlim_max = 1024}
        pidfd = <optimized out>
        retry = <optimized out>
        err = <optimized out>
        intovf = <optimized out>

@haproxyFred
Contributor

@lgv5 thank you! This is interesting! The patch shows that haproxy cannot decipher the packets it has itself ciphered. So the issue is on the haproxy + TLS stack side.
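
Since the self-check deciphers with the same key and IV that were used to cipher the packet, the inputs worth comparing when it fails are the key, the AAD (the packet header) and the per-packet nonce. As a reminder of how that nonce is formed, here is a small Python sketch of the construction from RFC 9001 section 5.3: the packet number is left-padded with zeros to the IV size and XORed with the IV. The function name is illustrative and is not a haproxy function.

```python
def quic_aead_nonce(iv: bytes, packet_number: int) -> bytes:
    """Return the AEAD nonce for one packet: IV XOR zero-padded packet number."""
    # Left-pad the packet number (network byte order) to the IV length.
    padded_pn = packet_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, padded_pn))
```

For packet number 0 the nonce is simply the IV itself, so a failure on the very first packet of an encryption level cannot be explained by a wrong packet number expansion.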

This reminds me that the libressl sources provide AEAD cryptographic tests. I do not know if you would be inclined to run them. If so, have a look at tests/aeadtest.sh in the libressl sources directory. It needs tests/aeadtest.c to be compiled.

@haproxyFred
Contributor

from tests directory you can run the AEAD tests as follows:

$ srcdir=. ./aeadtest.sh aeadtests.txt 
Completed 9 test cases
PASS
Completed 74 test cases
PASS
Completed 6 test cases
PASS
Completed 66 test cases
PASS
Completed 81 test cases
PASS
Completed 52 test cases
PASS

@lgv5
Author

lgv5 commented May 18, 2024

I ran the tests from the src tree, not from LibreSSL portable.

cc -O2 -pipe  -DLIBRESSL_INTERNAL -Werror -Wall -Wpointer-arith -Wuninitialized -Wstrict-prototypes -Wmissing-prototypes -Wunused -Wsign-compare -Wshadow  -MD -MP  -c aeadtest.c
cc   -o aeadtest aeadtest.o -Wl,-Bstatic -lcrypto -Wl,-Bdynamic
==== regress-aeadtest ====
./aeadtest aead /home/lucas/libcrypto/aead/aeadtests.txt
Completed 9 test cases
PASS
./aeadtest aes-128-gcm /home/lucas/libcrypto/aead/aes_128_gcm_tests.txt
Completed 74 test cases
PASS
./aeadtest aes-192-gcm /home/lucas/libcrypto/aead/aes_192_gcm_tests.txt
Completed 6 test cases
PASS
./aeadtest aes-256-gcm /home/lucas/libcrypto/aead/aes_256_gcm_tests.txt
Completed 66 test cases
PASS
./aeadtest chacha20-poly1305 /home/lucas/libcrypto/aead/chacha20_poly1305_tests.txt
Completed 81 test cases
PASS
./aeadtest xchacha20-poly1305 /home/lucas/libcrypto/aead/xchacha20_poly1305_tests.txt
Completed 52 test cases
PASS

@wtarreau
Member

FWIW I've rebuilt on linux against libressl-3.9.2 and there it does work fine. In parallel I'm building on openbsd with openssl-1.1.1 to compare. Maybe we'll find that lib is not relevant to the issue and that only the OS is (e.g. a different behavior of a syscall, etc).

@haproxyFred
Contributor

botovq said there is no issue with OpenBSD+quictls.

@haproxyFred
Contributor

@lgv5

It would be interesting to add this section at the top of the aes_256_gcm_tests.txt file and to run the test again as follows:

./aeadtest aes-256-gcm ./aes_256_gcm_tests.txt

from the tests directory of the libressl sources.

KEY: 8d28b9e04575fad59cb729a585d3a6f807cabc403c4656a7606d307b3322b90f
NONCE: 5237888730fbafa0347d912d
CT: 79a52f91516ee6d6ff388d9120e10df8d8c8dbd661c94a35e648f36e4565c993a3304c91dbb2735d1334be59b78edf29f76b7b6b19ff8781e83a55546d1f73d67bdae1b85074a870c13c7cb992f4776377ccd08ee1960c0f804cc5f587435d3b246f4e282bc5c7ba03aae0b5a1e3ac3e5e4dd3727c4312479f159da641354412598a7385250e38a3e20d75fbf8fc711bcb74c01dee5c3343db9b6a95695146394426cff9a3ca6add74ec182c74f1f5571ac9d1270bb758644e41f8ad34a0bbf2ba4e258578c322b336629d43c98f0e48e449d6e2da3317cc47785ff6515450eb654b4a8b38416c5c1b3773bb90edf63e4f4b273985799e4f7edf02d972a4ddcf3737f7ea2b626586f45a251ed636824a8d87f8a0d526e63028550ff5e80173363b82e6388220ebb875a355ae0ae2029cdb4cba238d024dc153e2831843e3c3aa2e75972f762fdac5342c79833433677639083f5c40ade6f90ab3a867d2ce466eca9d4cb4d8a3cf06ea3255fbcdb084411561493d475cc9358f8e9826441ff8faeb2bd1c2303c4bba7bb438b739915bef3e50595d9779391eaf33130875ec32451f9010ef46c1c7890a75e7e412d1b8f539c69e968a6575a5edc24c0e584f21a2f2af3c54310849e4a3728607df5e16686977593e5843cb23888197a358733e291296e2c17a37a4f8ad242bc512ffe3e39045f0d7b52d344b361e8138c4a59a7c864eecf8a421a08ace01bbe04838f23939f9a7dbd2c90838d92864c3d56d8ceeb7e7d7230d6ca10e0e06ace25c4ae8b897b8e378a7678d4a50a66512b3b512f95446c8639b708703a8efad31433e2876cb8c8f8c6f200f42f67ddf19ea783fb4a124901d31740918b9b04a71c40581d14fc3c2acf7110506dd59878454771c80a4493fd92ae7ec38b9686aca79176db3e0f14b5dc338a6cd041e0c3c0c2e32adef7b82d83476ac2130074d13b31e617b7dd3abc971def0a101e10bc8a693651816b827b1452bff61edc1076082f276cf4a6a5bbc1d82d826381629a3e7ea64e98ed3ca985d8581b2567535b33e9da05505ce333023e9c1cea8928a48843942b780ad264a5c24d9ecae8d8fa5e57debd4a9a91e5898054740417dea3c3d52259ea6b1f32ef1f5b9fa8dd3a65ff8582fc4011dccc8d755ba1664698055953c4b596d6e1c50e46df18392aa7a0e824b14f5789a38b2275fd05d3ef0d2b4a93bfe807caa3198fa4146213ba42ea1e362e49dc5a6b90ba7d3c67914e212a2fe0771a449014a53b3ad040385817f4990c4979ce5190bef716fd0cd120e7fcd743c4c70643ecc3ae702f82970bb223e5631d5fd630b34ec77b2fa712fe72cdb99cc9ce1f6ba250385a9cfbfd3fd5f9c3b7c3c83d9382473ccdcd9d690b1c5c93280971f70e80910709e
e72ffb4e083316f7effc8cb42d0bd66c9330b5aed9269d229356cd7bd668a7fec774f0aa00c8baaad9331b3c1489a3fd5aec109b797eea8481
AD: e000000001119a7dac5bc4bd94da1d9a1add32505e1b1a08e57ab8f49f823f5d443000
IN: 0600441b08000083008100000000001000050003026833003900700012dbeab915ff5977a9b66bbe7ca6669e6688720210ecd077816f1f95f45f6dcc99997ba7fb0f08e57ab8f49f823f5d01048000753003024800040480337e64050480007ffc060480007ffc070480007ffc080240640901030c000e0108111000000001ff00001d000000016b3343cf0b0005810000057d000578308205743082035ca003020102021479d27db3468e5fcd5468209bda8600d57144a84b300d06092a864886f70d01010b05003062310b30090603550406130246523112301006035504080c094e6f726d616e646965310e300c06035504070c05526f75656e31173015060355040a0c0e6164656e6f79656c6c652e6e65743116301406035504030c0d6164656e6f79656c6c655f4341301e170d3233303132343133323333335a170d3234303132343133323333335a30653119301706035504030c10686f7374322e6164656e6f79656c6c65310b30090603550406130246523112301006035504080c094e6f726d616e646965310e300c06035504070c05526f75656e31173015060355040a0c0e6164656e6f79656c6c652e6e657430820222300d06092a864886f70d01010105000382020f003082020a0282020100c12763e8bbb2fbbbb19e3653de51782158fbdfa29fe971c647869e6b22812f43895d2eb4616df91847c803120c68e69b8d2f29d60301f7fa746b8e712dfea7f91e433c066fe1945ea52d6a879bf2d7d6b5da28d1c16fb8d8ce4ef3553f89dc25ded7c40030bae6f295e3a688a28fd693b2d27f56b13a6cb0d1b2ce5ab6fc6f717eccb0317ba9bf39c283d58d2d9ce847f333f77d9173a3152a2a67d5292c848072ce79be985123a482c62f1814e9fa0932f197c77054326fe38911928e4433cb4690c6b6b13ed65d2de57f41f12e0438cca713e9e776adbd2116a5063f5a2a5818c24dc7df75ec3d7055661305a573eea90bb2884d28998e9ec82f07f7326f9273510cfede0a2f55516b5bc74860165d47f7c7bbc68ecacf69f10d39683c3888cfe788c9ee9280f23fa1353279344f05ead90bc45e1ce9362b3c2ec0f931255710c499cd5baad5590a33069c9f4edb69739709f0d5e6fefa721acf1e6b99ff60c01ec3901781c9302c97ffc6e329252cb28f3ce9bae489a93a6a23b9301c1ff8762ca04d9e1299854bc9668560a6745c2d0b978fbd0dc3a2cd19329ff59f6a54788a519e3cbf76126a1b2674cfdbc0f8e5e8346ba450300416d02b220bb0e10bf0901c0a9bcb8194a40f4b58853012ece0801254e6a9e7d2fdd9e91507472ad03efc50fd4d0a67536d55dbca62d07fd419c1e51890a6caacd705b0cb94d657010203010001a31f301d301b0603551d110414
30128210686f7374322e6164656e6f79656c6c65300d06092a864886f70d01010b05000382020100ad60bf0fa65966df9ca21e7829f6c07f05
TAG: 0d0b1a74e97db5731762d3690a292da2

@haproxyFred
Contributor

Should have mentioned that this macro constant must be increased to 2048 in aeadtest.c:

#define BUF_MAX 2048
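
To see how large each field of a vector is before running the test, something like the following Python sketch can be used. It assumes the vector file uses "NAME: hex" lines as in the section above, and that a line without a "NAME:" prefix continues the previous field (which is how the long CT and IN values appear wrapped here). It also assumes, without having checked aeadtest.c in detail, that BUF_MAX bounds the size of a single parsed field. The helper names are made up for this example.

```python
def parse_aead_vector(text: str) -> dict:
    """Parse one 'NAME: hex' test vector into a dict of bytes values."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
        elif current is not None:
            # Assumption: a bare hex line continues the previous field.
            fields[current] += line
    return {name: bytes.fromhex(value) for name, value in fields.items()}


def largest_field(vector: dict) -> int:
    """Return the size in bytes of the largest field in the vector."""
    return max(len(value) for value in vector.values())
```

If largest_field() returns more than the BUF_MAX value compiled into aeadtest, the vector would presumably be rejected or truncated, hence the bump to 2048.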

@botovq

botovq commented May 21, 2024 via email

@lgv5
Author

lgv5 commented May 21, 2024

@haproxyFred I tried with the --tls13-ciphers option instead of --ciphers. The result is interesting:

for c in TLS_AES_128_GCM_SHA256 TLS_AES_256_GCM_SHA384 TLS_CHACHA20_POLY1305_SHA256 TLS_AES_128_CCM_SHA256; do
    curl --tls13-ciphers "$c" -m 2 --http3-only -k https://127.0.0.1:10443 || echo "FAIL! $c" >&2
done

prints

curl: (28) Operation timed out after 2018 milliseconds with 0 bytes received
FAIL! TLS_CHACHA20_POLY1305_SHA256
curl: (28) Connection timed out after 2013 milliseconds
FAIL! TLS_AES_128_CCM_SHA256

meaning that both AES-{128,256}-GCM suites work. In particular,

$ curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3-only -k https://127.0.0.1:10443/ 
*   Trying 127.0.0.1:10443...
* QUIC cipher selection: TLS_AES_256_GCM_SHA384
* Skipped certificate verification
* Connected to 127.0.0.1 (127.0.0.1) port 10443
* using HTTP/3
* [HTTP/3] [0] OPENED stream for https://127.0.0.1:10443/
* [HTTP/3] [0] [:method: GET]
* [HTTP/3] [0] [:scheme: https]
* [HTTP/3] [0] [:authority: 127.0.0.1:10443]
* [HTTP/3] [0] [:path: /]
* [HTTP/3] [0] [user-agent: curl/8.7.1]
* [HTTP/3] [0] [accept: */*]
> GET / HTTP/3
> Host: 127.0.0.1:10443
> User-Agent: curl/8.7.1
> Accept: */*
> 
* Request completely sent off
< HTTP/3 200 
< date: Tue, 21 May 2024 13:43:08 GMT
< server: darkhttpd/1.16
< accept-ranges: bytes
< content-length: 0
< content-type: text/html
< last-modified: Tue, 21 May 2024 13:35:43 GMT
< 
* Connection #0 to host 127.0.0.1 left intact

cc @botovq .

@haproxyFred
Contributor

OMG! So, we have used the wrong curl option :-s.
So I guess the cipher negotiated by default is either TLS_CHACHA20_POLY1305_SHA256 or TLS_AES_128_CCM_SHA256. I would say TLS_CHACHA20_POLY1305_SHA256, because the other one is disabled by the TLS stack.

@wtarreau
Member

I'm embarrassed. My mips64 build with openssl 1.1.1 completed and works fine. Perfect. I retried with the binary built last week with libressl, and it now works equally... I don't understand anything anymore about this, so I'll stop polluting the issue until I get more exploitable info.

@lgv5
Author

lgv5 commented May 21, 2024

It's worse, the issue is TLS_CHACHA20_POLY1305_SHA256, which now fails in both amd64 and arm64 (wild guess: AES-GCM is preferred in amd64 because of AES-NI, making the issue not noticeable there until now.)

@haproxyFred
Contributor

I have managed to reproduce the same issue with the pkt_decipher.txt patch, which makes haproxy hit a BUG_ON() as soon as it cannot decipher a packet it has itself ciphered. But only with libressl as the TLS stack.

@haproxyFred
Contributor

and only with TLS_CHACHA20_POLY1305_SHA256 (on linux).

@haproxyFred
Contributor

haproxyFred commented May 21, 2024

just curious, what if curl is running on another computer ?

We have shown that haproxy+libressl cannot decipher its own chacha20-poly1305 packets. This is also the case on linux.

@haproxyFred
Contributor

That said, the issue occurs on the first packet. We do not reuse any context in this case... 🤔

@chipitsine
Member

just curious, what if curl is running on another computer ?

We have shown that haproxy+libressl cannot decipher its own chacha20-poly1305 packets. This is also the case on linux.

on arm64 linux ?

@wtarreau
Member

OK I was worried we were facing two distinct issues, but now I agree they are the same, as I could redo the test proposed above with each --tls13-ciphers value and I can confirm that the first two work (TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384), that TLS_CHACHA20_POLY1305_SHA256 fails, and that TLS_AES_128_CCM_SHA256 even crashes here:

Program received signal SIGSEGV, Segmentation fault.
quic_tls_key_update (qc=0xf669b53a0) at src/quic_tls.c:911
911                     .rx_sec = rx->secret,
Current language:  auto; currently minimal
(gdb) p rx

That's with libressl-3.9. With openssl the first 3 work, and TLS_AES_128_CCM_SHA256 fails yelling this on stderr:

conn. @0x0 OpenSSL error[0x142080b7] ssl_cache_cipherlist: no ciphers specified
conn. @0x0 OpenSSL error[0x142080b7] ssl_cache_cipherlist: no ciphers specified

I seem to remember you once told us that in compat mode, one algo was not implemented, so I guess it might be that one.

@haproxyFred
Contributor

haproxyFred commented May 21, 2024

on arm64 linux ?

on amd64 too

@wtarreau
Member

For me on linux, everything works fine with libressl-3.9.2, I cannot reproduce the issue.

@botovq

botovq commented May 21, 2024 via email

@haproxyFred
Contributor

For me on linux, everything works fine with libressl-3.9.2, I cannot reproduce the issue.

argh, I have only a 3.9.0 libressl version... :s

@wtarreau
Member

Ah wait a minute, the curl version and/or lib counts as well! On linux I'm using curl-7.88.1 built against QuicTLS 1.1.1t. On the OpenBSD client, I'm using curl-8.6.0 built with LibreSSL-3.9.0.

What I can say now is the following:

  • openbsd curl to openbsd haproxy+libressl => fails when using TLS_CHACHA20_POLY1305_SHA256
  • openbsd curl to openbsd haproxy+libressl => works when using TLS_AES_128_GCM_SHA256 or TLS_AES_256_GCM_SHA384
  • openbsd curl to openbsd haproxy+libressl => crashes when using TLS_AES_128_CCM_SHA256
  • openbsd curl to openbsd haproxy+openssl => works with the first 3 algo
  • openbsd curl to openbsd haproxy+openssl => fails with TLS_AES_128_CCM_SHA256
  • openbsd curl to linux haproxy+libressl => fails when using TLS_CHACHA20_POLY1305_SHA256
  • openbsd curl to linux haproxy+libressl => works when using TLS_AES_128_GCM_SHA256 or TLS_AES_256_GCM_SHA384
  • openbsd curl to linux haproxy+libressl => crashes when using TLS_AES_128_CCM_SHA256
  • linux curl to openbsd haproxy+libressl => fails using any algo (again)
  • linux curl to linux haproxy+libressl => all works (even CCM) !

So the matrix is a bit curious, as it involves the client's TLS lib, the OS, and the server's TLS lib.

@wtarreau
Member

In case that helps, the linux gdb trace for the CCM crash is more exploitable. qc->ael is NULL:

[Switching to Thread 0x7ffff76b9640 (LWP 23902)]
0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
910             struct quic_kp_trace kp_trace = {
(gdb) list
905     {
906             struct quic_tls_ctx *tls_ctx = &qc->ael->tls_ctx;
907             struct quic_tls_secrets *rx = &tls_ctx->rx;
908             struct quic_tls_secrets *tx = &tls_ctx->tx;
909             /* Used only for the traces */
910             struct quic_kp_trace kp_trace = {
911                     .rx_sec = rx->secret,
912                     .rx_seclen = rx->secretlen,
913                     .tx_sec = tx->secret,
914                     .tx_seclen = tx->secretlen,
(gdb) p qc
$1 = (struct quic_conn *) 0x7ffff00371f0
(gdb) p qc->ael
$2 = (struct quic_enc_level *) 0x0
(gdb) bt
#0  0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
#1  0x000000000049bca9 in qc_ssl_provide_quic_data (len=268, data=<optimized out>, ctx=0x7ffff0047f80, level=<optimized out>, ncbuf=<optimized out>) at src/quic_ssl.c:617
#2  qc_ssl_provide_all_quic_data (qc=qc@entry=0x7ffff00371f0, ctx=0x7ffff0047f80) at src/quic_ssl.c:688
#3  0x00000000004683a7 in quic_conn_io_cb (t=0x7ffff0047f30, context=0x7ffff00371f0, state=<optimized out>) at src/quic_conn.c:760
#4  0x000000000063cd9c in run_tasks_from_lists (budgets=budgets@entry=0x7ffff76961f0) at src/task.c:596
#5  0x000000000063d934 in process_runnable_tasks () at src/task.c:876
#6  0x0000000000600508 in run_poll_loop () at src/haproxy.c:3073
#7  0x0000000000600b67 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3287
#8  0x00007ffff7f6ae45 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffff78254af in clone () from /lib64/libc.so.6

@haproxyFred
Contributor

@wtarreau Ok thank you for the backtrace.

That said, I can reproduce the same crash with haproxy+libressl-3.9.0 (and without a libressl client). With aes_128_ccm, TLS stacks usually emit a TLS alert and make SSL_do_handshake() return an error. This is not the case with libressl-3.9.0, which only emits an alert.

Working on a patch to prevent the crash.

There is a chance that openbsd works well with libressl-3.9.2.

@haproxyFred
Contributor

@wtarreau
I have just pushed a patch to fix this crash here: haproxytech/quic-dev@ff95d87

@botovq

botovq commented May 22, 2024

There is a chance that openbsd works well with libressl-3.9.2.

I don't think it will. I am pretty sure this is a bug in libressl's quic support:

https://github.com/openbsd/src/blob/69c9593d13407b20c0145bf08a4b9f8e8aae9319/lib/libssl/tls13_quic.c#L130-L142

The quic alert sending mechanism returns TLS13_IO_SUCCESS whereas the ordinary TLSv1.3 alert sending would usually return TLS13_IO_ALERT, which would then cause a handshake failure a few layers up:

https://github.com/openbsd/src/blob/69c9593d13407b20c0145bf08a4b9f8e8aae9319/lib/libssl/tls13_record_layer.c#L361

I've notified @4a6f656c (who is quite busy these days).

@lgv5
Author

lgv5 commented May 22, 2024

Is quic_tls_decrypt in src/quic_tls.c the only decryption entrypoint? The inplace decryption doesn't work with LibreSSL and ChaCha20-Poly1305. https://gist.github.com/lgv5/1746093a53938a7c0c6151a445425cf3 shows the behaviour, replicating quic_tls_{en,de}crypt. In particular, replacing https://gist.github.com/lgv5/1746093a53938a7c0c6151a445425cf3#file-inplace-bug-c-L79 with EVP_aes_256_gcm() makes the program terminate correctly.

The same program works in Alpine Linux 3.19 with both ciphers under OpenSSL 3.1.5 30 Jan 2024 (Library: OpenSSL 3.1.5 30 Jan 2024).

@botovq

botovq commented May 22, 2024

@lgv5 well, that seems to explain it... Thanks!

@botovq

botovq commented May 22, 2024

@lgv5 this diff seems to fix the issue for me in some light testing:

Index: evp/e_chacha20poly1305.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/evp/e_chacha20poly1305.c,v
diff -u -p -r1.35 e_chacha20poly1305.c
--- evp/e_chacha20poly1305.c	9 Apr 2024 13:52:41 -0000	1.35
+++ evp/e_chacha20poly1305.c	22 May 2024 10:56:46 -0000
@@ -496,14 +496,19 @@ chacha20_poly1305_cipher(EVP_CIPHER_CTX 
 		if (out == NULL) {
 			cpx->ad_len += len;
 			cpx->in_ad = 1;
-		} else {
+			CRYPTO_poly1305_update(&cpx->poly1305, in, len);
+
+			return len;
+		}
+		if (ctx->encrypt) {
 			ChaCha(&cpx->chacha, out, in, len);
 			cpx->in_len += len;
-		}
-		if (ctx->encrypt && out != NULL)
 			CRYPTO_poly1305_update(&cpx->poly1305, out, len);
-		else
+		} else {
 			CRYPTO_poly1305_update(&cpx->poly1305, in, len);
+			ChaCha(&cpx->chacha, out, in, len);
+			cpx->in_len += len;
+		}
 
 		return len;
 	}

@lgv5
Author

lgv5 commented May 22, 2024

@botovq I could see that the issue was with Poly1305 (the decryption before EVP_DecryptFinal_ex is alright). The end result of the patch is that, for the out != NULL && !ctx->encrypt case, the Poly1305 MAC is updated before decryption instead of after, which makes sense because in gets clobbered by the decryption when it is done in place. A 5GB (should be enough to even trigger a rekey, right?) download of zeros with haproxy and ChaCha20-Poly1305 agrees with it.

@botovq

botovq commented May 22, 2024

@lgv5 thanks for testing. Yes, the only change is that it reverses the order of the mac and cipher in the decryption case for precisely the reason you state. I'm not sure how often haproxy or curl trigger a rekey and 5G is slightly below the integrity limit. There was at least one key update at the end of the handshake.
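
The ordering problem discussed above can be illustrated with a minimal, self-contained sketch. It does not use LibreSSL at all: `toy_cipher` (a fixed XOR) and `toy_mac` (a rolling checksum) are hypothetical stand-ins for ChaCha20 and Poly1305, chosen only so the example compiles with nothing but the C standard headers. What it shows is the aliasing effect: when `out == in`, computing the MAC after decrypting feeds plaintext to the MAC instead of ciphertext.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a stream cipher (NOT ChaCha20): XOR with a fixed byte.
 * It is safe to call with out == in, i.e. in place. */
static void toy_cipher(uint8_t *out, const uint8_t *in, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ 0xa5;
}

/* Toy stand-in for a MAC (NOT Poly1305): a rolling checksum. */
static uint32_t toy_mac(const uint8_t *data, size_t len)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < len; i++)
        acc = acc * 31 + data[i];
    return acc;
}

/* Encryption: cipher first, then MAC over the ciphertext in `out`. */
static uint32_t toy_seal(uint8_t *out, const uint8_t *in, size_t len)
{
    toy_cipher(out, in, len);
    return toy_mac(out, len);
}

/* Buggy order: decrypt, then MAC `in`. If out == in, `in` already holds
 * plaintext at that point, so the tag comparison fails. Returns 1 on match. */
static int toy_open_buggy(uint8_t *out, const uint8_t *in, size_t len,
                          uint32_t tag)
{
    toy_cipher(out, in, len);
    return toy_mac(in, len) == tag;
}

/* Fixed order: MAC the ciphertext first, then decrypt. Returns 1 on match. */
static int toy_open_fixed(uint8_t *out, const uint8_t *in, size_t len,
                          uint32_t tag)
{
    uint32_t mac = toy_mac(in, len);

    toy_cipher(out, in, len);
    return mac == tag;
}
```

With separate input and output buffers both variants accept a valid tag, which is consistent with the bug only showing up for callers that decrypt in place.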

@wtarreau
Member

It's great that you've found it! Theo, as we're releasing 3.0 in 1 week, I'd like to temporarily make connections using CHACHA fail on LibreSSL-3.9.x so that the clients fall back to TCP. We'll later refine this to exactly the affected range once a new release containing your fix is emitted. I'm well aware that it's not the prettiest (I even thought about refusing to build with QUIC on that range), but we definitely don't want to leave users with a half-working config that puts their visitors in front of a site that does not respond. We could confirm here that this fix addresses it:

diff --git a/include/haproxy/quic_tls.h b/include/haproxy/quic_tls.h
index 86b8c1ee32..a4b790075e 100644
--- a/include/haproxy/quic_tls.h
+++ b/include/haproxy/quic_tls.h
@@ -140,7 +140,7 @@ static inline const EVP_CIPHER *tls_aead(const SSL_CIPHER *cipher)
                return EVP_aes_128_gcm();
        case TLS1_3_CK_AES_256_GCM_SHA384:
                return EVP_aes_256_gcm();
-#if !defined(OPENSSL_IS_AWSLC)
+#if !defined(OPENSSL_IS_AWSLC)// && (!defined(LIBRESSL_VERSION_NUMBER) || LIBRESSL_VERSION_NUMBER < 0x3090000fL)
        case TLS1_3_CK_CHACHA20_POLY1305_SHA256:
                return EVP_chacha20_poly1305();
 #endif

and that's likely what I'll merge, referencing this issue. However, if you have any better idea about a way (or even a trick) to forcefully disable CHACHA on the affected version range so that the code automatically falls back to the other algos, I'm obviously interested, as it will be more graceful to users!

Thanks!

haproxy-mirror pushed a commit that referenced this issue May 22, 2024
At least version 3.9.0 of the libressl TLS stack does not behave like other stacks
such as quictls, which make SSL_do_handshake() return an error when no cipher could
be negotiated, in addition to emitting a TLS alert (0x28). This is the case when
TLS_AES_128_CCM_SHA256 is forced as the TLS1.3 cipher from the client side. This makes
haproxy enter a code path which leads to a crash as follows:

[Switching to Thread 0x7ffff76b9640 (LWP 23902)]
0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
910             struct quic_kp_trace kp_trace = {
(gdb) list
905     {
906             struct quic_tls_ctx *tls_ctx = &qc->ael->tls_ctx;
907             struct quic_tls_secrets *rx = &tls_ctx->rx;
908             struct quic_tls_secrets *tx = &tls_ctx->tx;
909             /* Used only for the traces */
910             struct quic_kp_trace kp_trace = {
911                     .rx_sec = rx->secret,
912                     .rx_seclen = rx->secretlen,
913                     .tx_sec = tx->secret,
914                     .tx_seclen = tx->secretlen,
(gdb) p qc
$1 = (struct quic_conn *) 0x7ffff00371f0
(gdb) p qc->ael
$2 = (struct quic_enc_level *) 0x0
(gdb) bt
 #0  0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
 #1  0x000000000049bca9 in qc_ssl_provide_quic_data (len=268, data=<optimized out>, ctx=0x7ffff0047f80, level=<optimized out>, ncbuf=<optimized out>) at src/quic_ssl.c:617
 #2  qc_ssl_provide_all_quic_data (qc=qc@entry=0x7ffff00371f0, ctx=0x7ffff0047f80) at src/quic_ssl.c:688
 #3  0x00000000004683a7 in quic_conn_io_cb (t=0x7ffff0047f30, context=0x7ffff00371f0, state=<optimized out>) at src/quic_conn.c:760
 #4  0x000000000063cd9c in run_tasks_from_lists (budgets=budgets@entry=0x7ffff76961f0) at src/task.c:596
 #5  0x000000000063d934 in process_runnable_tasks () at src/task.c:876
 #6  0x0000000000600508 in run_poll_loop () at src/haproxy.c:3073
 #7  0x0000000000600b67 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3287
 #8  0x00007ffff7f6ae45 in start_thread () from /lib64/libpthread.so.0
 #9  0x00007ffff78254af in clone () from /lib64/libc.so.6

When a TLS alert is emitted, haproxy calls quic_set_connection_close() which sets
the QUIC_FL_CONN_IMMEDIATE_CLOSE connection flag. It is this flag that is tested
by this patch to make the handshake fail even if SSL_do_handshake() does not
return an error. This test is specific to libressl and never runs with
other TLS stacks.

Thank you to @lgv5 and @botovq for having reported this issue in GH #2569.

Must be backported as far as 2.6.
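
The check described in this commit message can be sketched roughly as follows. This is not the haproxy code: `struct toy_conn`, `TOY_FL_CONN_IMMEDIATE_CLOSE` and `toy_handshake_failed()` are hypothetical names invented for illustration, and `stack_error` simply stands for "the TLS stack reported a handshake error", without modelling the real return values of SSL_do_handshake().

```c
#include <stdint.h>

/* Hypothetical flag standing in for QUIC_FL_CONN_IMMEDIATE_CLOSE. */
#define TOY_FL_CONN_IMMEDIATE_CLOSE (1U << 0)

/* Hypothetical, reduced connection object: only the flags matter here. */
struct toy_conn {
    uint32_t flags;
};

/* Returns 1 if the handshake must be treated as failed.
 * stack_error is non-zero when the TLS stack itself reported an error.
 * The second test covers the case where the stack only emitted an alert:
 * the alert path is assumed to have set the immediate-close flag. */
static int toy_handshake_failed(const struct toy_conn *qc, int stack_error)
{
    if (stack_error)
        return 1;
    return (qc->flags & TOY_FL_CONN_IMMEDIATE_CLOSE) != 0;
}
```

The point of testing the flag as well is that the caller no longer continues into key-update code for a connection that is already being closed.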
@botovq

botovq commented May 22, 2024

@wtarreau Thanks!

This problem isn't new. It's been present ever since we added EVP support for chacha20-poly1305 in libressl 3.6.0. I think it was hidden until 97c344d landed almost exactly a year ago. Or did haproxy switch to in-place encryption only recently?

I don't have a better idea than disabling it. I would recommend disabling it for all LibreSSL versions until the next release, though. I'll make sure that the OpenBSD port will keep CHACHA enabled in -current (I'll land the fix soon) and will also disable it in -stable.

That is, I would do this:

-#if !defined(OPENSSL_IS_AWSLC)
+#if !defined(OPENSSL_IS_AWSLC) && (!defined(LIBRESSL_VERSION_NUMBER) || LIBRESSL_VERSION_NUMBER >= 0x4000000fL)
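
To make the effect of that condition easier to check, here is a small sketch that restates it as a runtime predicate. The function `chacha20_poly1305_enabled()` is hypothetical and exists only for illustration; the real gate is the preprocessor line above. It assumes the usual LibreSSL encoding in which 3.9.0 is 0x3090000fL and 4.0.0 is 0x4000000fL (the values used in this thread), and uses 0 to mean "not LibreSSL".

```c
/* Hypothetical runtime mirror of:
 *   !defined(OPENSSL_IS_AWSLC) &&
 *   (!defined(LIBRESSL_VERSION_NUMBER) || LIBRESSL_VERSION_NUMBER >= 0x4000000fL)
 *
 * is_awslc:          non-zero when building against AWS-LC
 * libressl_version:  LIBRESSL_VERSION_NUMBER, or 0 if the stack is not LibreSSL
 */
static int chacha20_poly1305_enabled(int is_awslc, unsigned long libressl_version)
{
    if (is_awslc)
        return 0;               /* already excluded before this change */
    if (libressl_version == 0)
        return 1;               /* other stacks are unaffected */
    return libressl_version >= 0x4000000fUL;
}
```

Under these assumptions every LibreSSL release before 4.0.0 loses the cipher, while other TLS stacks keep it.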

@wtarreau
Member

Initially I thought that version 3.6 did work, but I only tested it on x86 and given your comment about aes-ni forcing AES-GCM first it makes sense that it did work by default and has only hidden the problem!

I'm perfectly fine with your proposed adjustment; I trust you that the next version will be fixed. I'll reference this issue there so that we can recheck in case of doubt. Many thanks for your help!

@haproxyFred
Contributor

haproxyFred commented May 22, 2024

Or did haproxy switch to in-place encryption only recently?

For packet deciphering, haproxy has always used the in-place method (quic_tls_decrypt()). quic_tls_decrypt2() does the same thing as quic_tls_decrypt() except that it does not decipher in place, but it is only used to check a Retry token.

haproxy-mirror pushed a commit that referenced this issue May 22, 2024
…0_POLY1305

As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's
CHACHA20 in-place implementation that makes haproxy discard incoming QUIC
packets encrypted with it. It's not very easy to observe the issue because:
  - QUIC recommends that CHACHA20 is used in priority
  - on x86 with AES-NI, LibreSSL prefers AES-GCM for performance
    reasons, so the problem is only observed there if a client
    explicitly forces TLS_CHACHA20_POLY1305_SHA256 only.
  - discarded packets cause retransmits showing some apparent activity,
    and the handshake succeeds so it's not easy to analyze from the
    client which thinks that the server is slow to respond.

Thus in practice, on non-x86 machines running LibreSSL, requests made over
QUIC freeze for a long time, unless the client explicitly forces algos
excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by
default on modern OpenBSD systems, and was reported in the issue above
for an arm64 machine running OpenBSD -current, and was also observed on a
mips64 one running OpenBSD 7.5.

There is no simple solution to this problem due to some of the protocol's
constraints without digging too low into the stack (and risking to break
more). Here we're taking a pragmatic approach consisting in making the
connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected,
regardless of the availability of other ciphers. This means that every
time a connection would have hung, instead it will fail fast, allowing
the client to retry over TLS/TCP.

Theo Buehler recommends that we limit this protection to all LibreSSL
versions before 4.0 since it's where the fix will be implemented. Older
stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled,
which should be sufficient to make QUIC work there again as well.

The following config is sufficient to reproduce the issue (on a non-x86
machine, both arm64 & mips64 were confirmed to reproduce it):

    global
        limited-quic

    frontend stats
        mode http
        #bind :8181
        #bind :8443 ssl crt rsa+dh2048.pem
        bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
        timeout client 5s
        stats uri /

And the following commands will trigger the problem on affected LibreSSL
versions:
  curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl -v --http3 -k https://127.0.0.1:8443/

while these ones must work:
  curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/

Normally all of them will work with LibreSSL 4, and only the first one
should fail with stable LibreSSL versions higher than 3.9.2. An haproxy
version without this workaround will show an unresponsive command after
the GET is sent, while a version with the workaround will close the
connection on error. On a version with this workaround, if TCP listeners
are uncommented, curl will automatically fall back to TCP and attempt
the request again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode
(hence the limited-quic option above) all of them must work.

Many thanks to github user @lgv5 for the detailed report, tests, and
for spotting the issue, and to @botovq (Theo Buehler) for the quick
analysis, patch and help on this workaround.

This needs to be backported to versions 2.6 and above.
@wtarreau
Member

Many thanks again for your help, guys, we've pushed the two patches so we're good regarding the forthcoming release. We'll backport them to stable versions as well.

botovq pushed a commit to libressl/openbsd that referenced this issue May 22, 2024
Take the MAC before clobbering the input value on decryption. Fixes hangs
during the QUIC handshake with HAProxy using TLS_CHACHA20_POLY1305_SHA256.

Found, issue pinpointed, and initial fix tested by Lucas Gabriel Vuotto.
Let me take this opportunity to thank the HAProxy team for going out of
their way to keep supporting LibreSSL. It's much appreciated.

See haproxy/haproxy#2569

tweak/ok jsing
@botovq

botovq commented May 22, 2024

Thank you all for the help with tracking this down. I committed the fix for chacha to openbsd-current, disabled chacha20-poly1305 in the haproxy port in stable and we will also land a fix for the alert issue hopefully soon.

bob-beck pushed a commit to openbsd/src that referenced this issue May 22, 2024
Take the MAC before clobbering the input value on decryption. Fixes hangs
during the QUIC handshake with HAProxy using TLS_CHACHA20_POLY1305_SHA256.

Found, issue pinpointed, and initial fix tested by Lucas Gabriel Vuotto.
Let me take this opportunity to thank the HAProxy team for going out of
their way to keep supporting LibreSSL. It's much appreciated.

See haproxy/haproxy#2569

tweak/ok jsing
@lgv5
Author

lgv5 commented May 22, 2024

From my side, the issue is addressed and resolved. I see the rest of the open issues have tons of tags and different status tracking information, so I feel reluctant to just close it.

Thanks everyone!

@Tristan971
Member

Tristan971 commented May 22, 2024

This needs to be backported to versions 2.6 and above.

That means the HAProxy team will keep it open until they have backported the fix and released the versions where it should go, so your reluctance is correct 👍
