Large database truncate problem #37

Open
Kleissner opened this issue Jan 16, 2021 · 3 comments

@Kleissner

Kleissner commented Jan 16, 2021

We have recently run into a problem that prevents the database from growing. Every call to db.Put returns this error message:

truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation

The whole database folder is 255 GB. The file main.pix is 37.3 GB in size. We are running on Windows Server 2019 as admin, and the disk has plenty of storage (4 TB total).

Any idea of the root cause and how to fix it?

I suppose the error message originates from here?

pogreb/file.go

Lines 79 to 86 in e182fb0

func (f *file) extend(size uint32) (int64, error) {
	off := f.size
	if err := f.Truncate(off + int64(size)); err != nil {
		return 0, err
	}
	f.size += int64(size)
	return off, f.Mmap(f.size)
}

Edit: Unrelated to this problem, but truncate, which is used by recoveryIterator.next, takes a uint32 size. Could that lead to problems down the road for large segment files?

pogreb/file.go

Lines 97 to 107 in e182fb0

func (f *file) truncate(size uint32) error {
	// Truncating memory-mapped file will fail on Windows. Unmap it first.
	if err := f.Mmap(0); err != nil {
		return err
	}
	if err := f.Truncate(int64(size)); err != nil {
		return err
	}
	f.size = int64(size)
	return f.Mmap(f.size)
}
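
To make the uint32 concern concrete, here is a minimal standalone sketch (not pogreb code; asUint32 is a hypothetical helper) of what happens when a 64-bit size is narrowed to uint32. Sizes below 4 GiB survive the conversion, while anything at or above 4 GiB silently wraps around.

```go
package main

import "fmt"

// asUint32 is a hypothetical helper that mimics handing a 64-bit file size
// to a function whose parameter is uint32. Go keeps only the low 32 bits.
func asUint32(size int64) uint32 {
	return uint32(size)
}

func main() {
	const gib = int64(1) << 30
	sizes := []int64{3 * gib, 4*gib - 1, 4 * gib, 5 * gib}
	for _, size := range sizes {
		fmt.Printf("int64 %d -> uint32 %d\n", size, asUint32(size))
	}
}
```

In this sketch a 5 GiB size comes out as 1 GiB, so the narrowing is only safe for files that are guaranteed to stay below 4 GiB.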

@Kleissner
Author

Could it be file fragmentation?

Googling this message finds this: https://support.assurestor.com/support/solutions/articles/16000104076-the-requested-operation-could-not-be-completed-due-to-a-file-system-limitation

  1. Compressed files are more likely to reach the limit because of the way the files are stored on disk. Compressed files require more extents to describe their layout. Also, decompressing and compressing a file increases fragmentation significantly.
  2. The limit can be reached when write operations occur to an already compressed chunk location. The limit can also be reached by a sparse file. This size limit is usually between 40 gigabytes (GB) and 90 GB for a very fragmented file.
  3. A heavily fragmented file in an NTFS file system volume may not grow beyond a certain size caused by an implementation limit in structures that are used to describe the allocations.

@akrylysov
Owner

Thanks for the bug report.

Unrelated to this problem, but truncate, which is used by recoveryIterator.next, takes a uint32 size. Could that lead to problems down the road for large segment files?

Segment files can't exceed 4 GiB (see https://github.com/akrylysov/pogreb/blob/master/options.go#L40). The maximum segment size is currently not configurable and is always set to 4 GiB.

main.pix is the main index file. Index files use 64-bit offsets:

func bucketOffset(idx uint32) int64 {

Windows support could definitely use more testing. I develop Pogreb on macOS and deploy it to Linux.

I'll try to reproduce the issue. Wondering if it's related to mmap? I'm working on adding an option to disable mmap.

@Kleissner
Author

Kleissner commented Jan 17, 2021

I can reproduce the error: every call to db.Put fails. I added debugging code and confirmed that the referenced extend function fails on this line in file.go:

if err := f.Truncate(off + int64(size)); err != nil {

I've added logging:

		fmt.Printf("Error offset %d size %d from f.Trunacte: %s\n", off, size, err.Error())

And the output is always:

Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation

The offset matches the file size (40,108,773,376 bytes, which Windows reports as 37.3 GB). I tried the Windows defragmentation tool without success (I assume it didn't actually defragment because the disk is an SSD).

Then I tried another trick: copying main.pix to a new file, deleting the old one, and renaming the new one to the original name. It worked! 🎉

So it looks like the underlying cause is that when the file is extended 512 bytes at a time, NTFS allocates it as millions of separate chunks (instead of contiguous data), and at some point it hits an internal OS limit. I will monitor the situation and check whether it fails again in 40 GB (which might take weeks).
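
As a rough check on the "millions of chunks" estimate, the numbers from the log output can be run through a small sketch. It assumes, purely for illustration, that the file grew only through 512-byte extend calls; the real number of NTFS extents is not known from this thread and may well be lower.

```go
package main

import "fmt"

const (
	failingOffset = int64(40108773376) // offset reported in the log output
	extendSize    = int64(512)         // size reported in the log output
)

// extendCalls returns how many fixed-size extensions are needed to reach
// offset, assuming every extension adds exactly step bytes.
func extendCalls(offset, step int64) int64 {
	return offset / step
}

// gib converts a byte count to binary gigabytes (GiB).
func gib(bytes int64) float64 {
	return float64(bytes) / (1 << 30)
}

func main() {
	fmt.Printf("file size: %.2f GiB\n", gib(failingOffset))
	fmt.Printf("extensions of %d bytes: %d\n", extendSize, extendCalls(failingOffset, extendSize))
}
```

Under that assumption the file would have been extended about 78 million times, and the failing offset is an exact multiple of 512.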

I guess an ugly fix would be to catch that error, temporarily close the file, and do what I did manually: copy, delete old, rename, open.

Microsoft documented the problem here: https://support.microsoft.com/en-in/help/967351/a-heavily-fragmented-file-in-an-ntfs-volume-may-not-grow-beyond-a-cert
