Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken pipes on large data writes #490

Closed
howardjohn opened this issue Apr 4, 2023 · 9 comments
Closed

Broken pipes on large data writes #490

howardjohn opened this issue Apr 4, 2023 · 9 comments

Comments

@howardjohn
Copy link
Member

fortio load -uniform -c 1 -qps 100 -payload-size=40000 -t 2s -abort-on=-1 http://localhost:808

2023-04-04T15:59:14.949606Z ERROR inbound{id=7e8fd964ca3210a27d985b3cd60cd1ed peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.109007ms
2023-04-04T15:59:14.949666Z  INFO outbound{id=ef1a7bfdfe264051a818926c3088c42d}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.949959Z  INFO outbound{id=7e8fd964ca3210a27d985b3cd60cd1ed}: ztunnel::proxy::outbound: complete dur=13.086084ms
2023-04-04T15:59:14.960976Z  INFO inbound{id=ef1a7bfdfe264051a818926c3088c42d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.964237Z  INFO outbound{id=07367cd252975b4528e07779300e6653}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.964321Z ERROR inbound{id=ef1a7bfdfe264051a818926c3088c42d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=3.082466ms
2023-04-04T15:59:14.964705Z  INFO outbound{id=ef1a7bfdfe264051a818926c3088c42d}: ztunnel::proxy::outbound: complete dur=15.202051ms
2023-04-04T15:59:14.974146Z  INFO inbound{id=07367cd252975b4528e07779300e6653 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.976628Z  INFO outbound{id=2cc21d318db51b851dcee2cab2b9dc57}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.976645Z ERROR inbound{id=07367cd252975b4528e07779300e6653 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.352267ms
2023-04-04T15:59:14.977073Z  INFO outbound{id=07367cd252975b4528e07779300e6653}: ztunnel::proxy::outbound: complete dur=13.092794ms
2023-04-04T15:59:14.988100Z  INFO inbound{id=2cc21d318db51b851dcee2cab2b9dc57 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.990575Z  INFO outbound{id=2cc21d318db51b851dcee2cab2b9dc57}: ztunnel::proxy::outbound: complete dur=14.144643ms
2023-04-04T15:59:14.990614Z ERROR inbound{id=2cc21d318db51b851dcee2cab2b9dc57 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.338017ms
2023-04-04T15:59:14.991361Z  INFO outbound{id=da05eb9e3bc61a97a48020a7ff76a4ac}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.002705Z  INFO inbound{id=da05eb9e3bc61a97a48020a7ff76a4ac peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.005544Z ERROR inbound{id=da05eb9e3bc61a97a48020a7ff76a4ac peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.669356ms
2023-04-04T15:59:15.005681Z  INFO outbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.005919Z  INFO outbound{id=da05eb9e3bc61a97a48020a7ff76a4ac}: ztunnel::proxy::outbound: complete dur=14.936482ms
2023-04-04T15:59:15.016801Z  INFO inbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.018461Z ERROR inbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=1.508339ms
2023-04-04T15:59:15.018995Z  WARN outbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883}: ztunnel::proxy::outbound: failed dur=13.539023ms err=io error: failed to write the buffered data
2023-04-04T15:59:15.019402Z  INFO outbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.027529Z  INFO inbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.029223Z ERROR inbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=1.495578ms
2023-04-04T15:59:15.030291Z  WARN outbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d}: ztunnel::proxy::outbound: failed dur=11.073287ms err=io error: broken pipe

We keep seeing the data closed

@hzxuzhonghu
Copy link
Member

is this because a mismatch between sender and receiver

@ilrudie
Copy link
Contributor

ilrudie commented Apr 10, 2023

@howardjohn, Any chance it's something like hitting a limit on the number of open files or file watches? I think each pipe is going to consume 2 file descriptors (one referring to each end of the pipe).

@howardjohn
Copy link
Member Author

I am seeing this even with a single connection that sends a large amount of data. so it may be some limit but I don't think it would be num files?

@hzxuzhonghu
Copy link
Member

https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer could be related to pipe overflow

@Stevenjin8
Copy link
Contributor

I'm pretty sure this has to do with the TCP connection. I can get a similar message if I run iperf3. When I strace ztunnel and run iperf3 through it, I see this

sendto(25, ...) = -1 EPIPE (Broken Pipe)
...
sendto(22, ...) = -1 EPIPE (Broken Pipe)

From the tcp(7), we have

EPIPE The other end closed the socket unexpectedly or a read is executed on a shut down socket.

There were no close or shutdown calls on socket 25/22, so it seems like the cause is the former. Also, RST packets are generated. Socket 25 is the downstream connection to the iperf3 server and 22 is an upstream connection on the server side.

@Stevenjin8
Copy link
Contributor

However, I can't seem to replicate the issue with fortio (in new or old builds)

@Stevenjin8
Copy link
Contributor

@RobCannon
Copy link

I am seeing the same error. I have installed 1.20 on an EKS cluster. I do not see this on version 1.18.3. I was also having problems with 1.19.x versions.

2023-11-15T15:57:06.771954Z  INFO outbound{id=a686cd938db594dff1aabf0c83400037}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:06.772282Z  WARN outbound{id=9ad7269982104f13b534382feee2cd3a}: ztunnel::proxy::outbound: failed dur=10.000982079s err=io error: deadline has elapsed
2023-11-15T15:57:07.138060Z ERROR inbound{id=cddc5afb1fbde92ddbb639703a53ee14 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.000681626s
2023-11-15T15:57:07.138299Z ERROR inbound{id=b4a606115b4490a89dcfdf95d4adec1c peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.000920014s
2023-11-15T15:57:07.197383Z  INFO outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::outbound: proxying to 10.4.8.36:6379 using node local fast path
2023-11-15T15:57:07.197805Z  INFO outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::outbound: complete dur=537.259µs
2023-11-15T15:57:07.249104Z  WARN outbound{id=c28df9a6431936f63fb439789baaaae6}: ztunnel::proxy::outbound: failed dur=10.001664136s err=io error: deadline has elapsed
2023-11-15T15:57:07.432422Z  INFO outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:07.432779Z  INFO outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::outbound: complete dur=454.854µs
2023-11-15T15:57:07.433600Z ERROR outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.143658ms
2023-11-15T15:57:07.592575Z  INFO outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:07.593005Z  INFO outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::outbound: complete dur=529.237µs
2023-11-15T15:57:07.652315Z  INFO inbound{id=651db3badabba3ee03e21bc2b0bc2197 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:07.656382Z ERROR inbound{id=651db3badabba3ee03e21bc2b0bc2197 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: write zero dur=3.974498ms
2023-11-15T15:57:08.141010Z  INFO inbound{id=548f55ff6ca0fa9c3585ed43ee4f7d80 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:6379
2023-11-15T15:57:08.141548Z  INFO inbound{id=1b6a29a8285a26993d8e3aabe3b63aa7 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:08.210774Z  INFO outbound{id=58c61ef9162fe3f76734700dcc0a2ad6}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:08.229147Z ERROR inbound{id=68ef6c18eca0affd1348985b18b854fb peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.001717433s
2023-11-15T15:57:08.251237Z  INFO outbound{id=c6ac9c9bd515f48f3fd72f84055e15b0}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:08.298242Z  WARN outbound{id=e02b999a32bbc5ab40089df990aa79e3}: ztunnel::proxy::outbound: failed dur=10.00988518s err=io error: deadline has elapsed
2023-11-15T15:57:08.361197Z  INFO outbound{id=01f5e08a7fdf847157aa9c262e69576a}: ztunnel::proxy::outbound: proxy to 10.4.10.124:6379 using HBONE via 10.4.10.124:15008 type Direct
2023-11-15T15:57:08.435542Z  INFO outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:08.435830Z  INFO outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::outbound: complete dur=385.572µs
2023-11-15T15:57:08.436771Z ERROR outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.134786ms
2023-11-15T15:57:08.649526Z ERROR outbound{id=2f571f3ea350050b16672a50965ef6c5}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.001693962s
2023-11-15T15:57:08.659663Z  INFO inbound{id=603e27f6215ee959d33cd2c9128a0587 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:08.663786Z ERROR inbound{id=603e27f6215ee959d33cd2c9128a0587 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=4.029934ms
2023-11-15T15:57:08.946979Z  INFO outbound{id=31b78d7d87b730d10d6b2ffd94081768}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947090Z  INFO outbound{id=67c4990f50077fd541dd23910ecd53b1}: ztunnel::proxy::outbound: proxy to 10.4.143.21:6379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947174Z  INFO outbound{id=fed2a2aefb4c9209532b62af02399255}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947244Z  INFO outbound{id=b9acc3a2915c265139f2b8cef820e30e}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:09.087674Z  INFO outbound{id=c01c8f15bcbf564a09497386d387f726}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.088325Z  INFO outbound{id=e47b709140c52d866fc4c501228f5824}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.089319Z  INFO outbound{id=81c93d5831f85c956dae263df11736da}: ztunnel::proxy::outbound: proxy to 10.4.142.49:6379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.089558Z  INFO outbound{id=c96bfd363105401ec913597a59d9b4f8}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.198323Z ERROR outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.000720122s
2023-11-15T15:57:09.229774Z  INFO inbound{id=c2e823d9a8e2e421c8292fd56eaceda5 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:09.438892Z  INFO outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:09.439217Z  INFO outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::outbound: complete dur=419.656µs
2023-11-15T15:57:09.440046Z ERROR outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.049655ms
2023-11-15T15:57:09.597345Z ERROR outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.004515267s
2023-11-15T15:57:09.615364Z  INFO xds{id=13}: ztunnel::xds::client: received response type_url="type.googleapis.com/istio.workload.Address" size=1

@howardjohn
Copy link
Member Author

fixed by #1067

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants