Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the input block size for bgzip. #1768

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

jkbonfield
Copy link
Contributor

Commit e495718 changed bgzip from unix raw POSIX read() calls to hread(). Unfortunately hread gets its buffer size from stat of the input file descriptor, which can be 4kb for a pipe. We're reading 0xff00 bytes, so this ends up being split over two reads mostly, with one or both involving additional memcpys. This makes the buffered I/O worse performing than non-buffered. In the most extreme cases (cat data | bgzip -l0 > /dev/null) this is a two fold slow down.

The easy solution is just to increase the buffer size to something sensible. It's a little messy as we have to use hfile_internal.h to get hfile_set_blksize, but it works. I'm not sure why we didn't elect to make that API more public. Probably simply out of caution.

Fixes #1767

Commit e495718 changed bgzip from unix raw POSIX read() calls to
hread().  Unfortunately hread gets its buffer size from stat of the
input file descriptor, which can be 4kb for a pipe.  We're reading
0xff00 bytes, so this ends up being split over two reads mostly, with
one or both involving additional memcpys.  This makes the buffered I/O
worse performing than non-buffered.  In the most extreme cases (cat
data | bgzip -l0 > /dev/null) this is a two fold slow down.

The easy solution is just to increase the buffer size to something
sensible.  It's a little messy as we have to use hfile_internal.h to
get hfile_set_blksize, but it works.  I'm not sure why we didn't elect
to make that API more public.  Probably simply out of caution.

Fixes samtools#1767
@jkbonfield
Copy link
Contributor Author

Edited the buffer down to 256Kb instead of 1Mb as it seems to be sufficient (tested on tmpfs, fast NFS and lustre).

@jkbonfield
Copy link
Contributor Author

jkbonfield commented Apr 10, 2024

Hmm, it's still variable! The effect of a bigger block size is more memcpy as we have fewer direct reads (as @daviesrob points out a readv can partially solve that as we can read direct to caller buff + remainder to look-ahead cache, but it doesn't fit with the backend semantics), but it also reduces the impact of many small reads due to small pipe size.

Hence some machines it's slower to have a bigger block size, and it can have weird interactions with CPU load too which I cannot explain. Eg it's a win at -l0, but a penalty at -l1. It needs more head scratching probably.

@daviesrob daviesrob self-assigned this May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bgzip performance drop from v1.16
2 participants