ReadString function is inefficient #289

Open
MS-Renan opened this issue May 11, 2022 · 0 comments

MS-Renan commented May 11, 2022

Is your feature request related to a problem? Please describe.
When any Helper::DiskIO ReadString function is invoked, it may overestimate the buffer size, because it keeps doubling the buffer until the data fits. The function also scans every char for the delimiter and breaks when it finds one.

Describe the solution you'd like
Instead of having ReadString do the resizing and delimiter parsing, ReadString should only be responsible for reading into a buffer of the expected size. The expected size should come from the file size, as this is the exact amount that must be read. The parsing shouldn't be done at all; instead, WriteString should always write the terminating point (the delimiter always equals \n and is then replaced by \0, so why not just let WriteString set the end point?).
ReadString and WriteString would then essentially boil down to ReadBinary and WriteBinary.
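
To illustrate the idea, here is a minimal sketch of an exact-size read using only the standard library. `FileSize` and `ReadExact` are hypothetical names for illustration, not existing Helper::DiskIO functions, and the `offset`/`length` parameters are an assumption about how a per-thread slice might be passed in.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Helper::DiskIO): size of a file in bytes.
std::uint64_t FileSize(const std::string& path)
{
    // Open positioned at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);
    return static_cast<std::uint64_t>(in.tellg());
}

// Hypothetical helper (not part of Helper::DiskIO): read bytes
// [offset, offset + length) into a buffer allocated once at exactly
// the required size. No doubling and no per-char delimiter scan.
std::vector<char> ReadExact(const std::string& path,
                            std::uint64_t offset,
                            std::uint64_t length)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<char> buffer(length);  // single allocation of the exact size
    in.seekg(static_cast<std::streamoff>(offset));
    in.read(buffer.data(), static_cast<std::streamsize>(length));
    if (static_cast<std::uint64_t>(in.gcount()) != length)
        throw std::runtime_error("short read from " + path);
    return buffer;
}
```

A whole-file read would be `ReadExact(path, 0, FileSize(path))`; a multi-threaded reader could give each thread its own offset and length instead.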

Additional context
ReadString is used for reading config files (e.g. ini) and metadata files (e.g. tsv).

By default the read buffer size always starts at 2^16 = 65,536 bytes.

The biggest inefficiency comes from metadata files, as they are large. For example, with a 100GB file the read is divided among the threads (32), so each thread will eventually resize its buffer to ~4GB (the exact size would be ~3GB, so about 1GB over), which means the buffer size is overestimated by ~32GB in total.
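
The estimate can be checked with a small sketch. It assumes the buffer starts at 2^16 bytes and exactly doubles until the data fits, and that each thread reads an equal share of the file; `DoubledCapacity` and `OverAllocation` are hypothetical names, not functions from the codebase.

```cpp
#include <cstdint>

// Capacity reached by a buffer that starts at `initial` bytes and
// doubles until it can hold `needed` bytes (assumed growth policy).
inline std::uint64_t DoubledCapacity(std::uint64_t needed,
                                     std::uint64_t initial = 1ull << 16)
{
    std::uint64_t capacity = initial;
    while (capacity < needed) capacity *= 2;
    return capacity;
}

// Total bytes allocated beyond what is needed when `fileSize` bytes
// are split evenly across `threads` readers (assumed even split).
inline std::uint64_t OverAllocation(std::uint64_t fileSize,
                                    std::uint64_t threads)
{
    const std::uint64_t perThread = fileSize / threads;
    return threads * (DoubledCapacity(perThread) - perThread);
}
```

Taking the sizes in binary units, `OverAllocation(100ull << 30, 32)` gives 28 GiB: each thread needs 3.125 GiB but ends up with a 4 GiB buffer. That is in line with the rough ~32GB figure above.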
