Reading from /dev/stdin is very slow #685
Comments
Cool bug, thanks for the report! If someone picks this up before me, I'd try to attach a profiler to see what's taking so long.
We've detected the same issue with mtail 3.0.0-rc50 (shipped with Debian bookworm) and with the latest available version (3.0.0-rc55).
perf top report looks like this:
In our setup, mtail 3.0.0~rc43 (shipped with Debian bullseye) was working as expected.
latest
Wild guess that the read buffer is too small on pipes, so we're doing a lot of syscalls.
Increasing the read buffer from 4096 to 65536 had this effect:
Total time drops and CPU usage increases, so less time is spent waiting on syscalls (again guessing, but it seems obvious -- I could hook up a trace and get a profile of time spent blocked on syscalls to confirm).
A 128K buffer seems pretty good. I can imagine an improvement where
Fewer syscalls reading from large stdins means we spend more time processing and less time waiting for the kernel. Issue: #685
Pipes use
Sockets use
Datagrams use whatever you set
Regular files don't seem to have a read buffer (because the entire file is available to the VFS, I presume), and anecdotal data on the net says maybe 4GiB.

Dealing with buffer sizes on Linux seems tractable but tedious; I looked into Windows support and couldn't get a satisfying answer. Increasing buffers per stream is also going to increase RAM usage, so I don't want to go wild here without better reasoning.

Finally, the stdin read rate is still 10x slower than direct file reads. I presume this is because the pipe buffer requires copying into kernel memory and back out again, while direct file I/O has fewer (maybe zero) copies.
Processing this 7.5M line file normally takes ~2-3 minutes on my laptop:
But when it is piped on standard input like this:
it takes forever, reading ~2.5k lines/minute:
Tested on the latest release downloaded from GitHub and the simplest line counter program: