This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Segfault on Windows when out of disk space #706

Open
lukechampine opened this issue Jul 18, 2017 · 1 comment

@lukechampine
Contributor

Filling the disk with a bolt database causes a segfault on Windows. This script reproduces the bug (tested on Windows 10).

stack trace:

unexpected fault address 0x7fff1040
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0x7fff1040 pc=0x45b9e8]
 
goroutine 1 [running]:
runtime.throw(0x4ecf03, 0x5)
        C:/Go/src/runtime/panic.go:566 +0x9c fp=0xc0420d5c38 sp=0xc0420d5c18
runtime.sigpanic()
        C:/Go/src/runtime/signal_windows.go:164 +0x10b fp=0xc0420d5c68 sp=0xc0420d5c38
github.com/boltdb/bolt.(*DB).meta(0xc04207e000, 0x1ec)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/db.go:811 +0x38 fp=0xc0420d5cc0 sp=0xc0420d5c68
github.com/boltdb/bolt.(*Tx).rollback(0xc0420841c0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/tx.go:255 +0x79 fp=0xc0420d5ce8 sp=0xc0420d5cc0
github.com/boltdb/bolt.(*Tx).Commit(0xc0420841c0, 0x0, 0x0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/tx.go:164 +0x8b2 fp=0xc0420d5e38 sp=0xc0420d5ce8
github.com/boltdb/bolt.(*DB).Update(0xc04207e000, 0xc0420d5ec0, 0x0, 0x0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/db.go:605 +0x114 fp=0xc0420d5e88 sp=0xc0420d5e38

Inside tx.Commit, tx.root.spill() is returning an error, causing tx.rollback() to be called. tx.rollback() calls db.meta(), which dereferences the DB's meta pages and triggers the segfault. Investigating further, the error returned by tx.root.spill() can be traced back to a call to mmap. The specific error is: "truncate: truncate test.db: There is not enough space on the disk".

My guess is that the failed mmap invalidates db.meta0 and/or db.meta1. The existing data is unmapped before mmap is called, so this seems likely. However, I don't have a good explanation for why this only causes a segfault on Windows. Linux correctly returns an error from the db.Update call, and I believe OS X does as well.

I'm not sure what the best way to handle this is. One idea would be to set a special flag in db if the mmap fails. This is a critical failure state, so there's some justification for handling it specially. The flag would cause tx.rollback() to skip some of its cleanup steps, in particular this call:

tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))

In addition, it would be helpful for client programs if the db.Update call returned a special error value (or type) so that they could detect it and decide whether to panic. Even if bolt itself panicked, that would be miles better than a segfault, since a normal panic can at least be caught and converted to a more user-friendly error message.

Let me know if I can test any other potential fixes. In the meantime I'll probably go ahead and implement the fix described above.

@lukechampine
Contributor Author

Update: I've confirmed that this patch prevents the segfault:

diff --git a/tx.go b/tx.go
index 6700308..a1527dc 100644
--- a/tx.go
+++ b/tx.go
@@ -161,7 +161,7 @@ func (tx *Tx) Commit() error {
        // spill data onto dirty pages.
        startTime = time.Now()
        if err := tx.root.spill(); err != nil {
-               tx.rollback()
+               tx.close()
                return err
        }
        tx.stats.SpillTime += time.Since(startTime)

However, I'm not sure if this behavior is safe. Skipping tx.rollback means we don't call freelist.rollback or freelist.reload. Could this cause trouble if the caller repeatedly tries to call Update?

A bigger issue is that, even though a nice error message is returned now, the db file is still unmapped. I haven't tested it, but I assume that in this state it would segfault even if you called db.View. As such, we should probably proceed with the original plan to add a flag to the db type. That way, we could return an error immediately from calls that would otherwise segfault.
