Segfault on Windows when out of disk space #706

lukechampine · 2017-07-18T16:45:23Z

Filling the disk with a bolt database causes a segfault on Windows. This script reproduces the bug (tested on Windows 10).

stack trace:

unexpected fault address 0x7fff1040
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0x7fff1040 pc=0x45b9e8]
 
goroutine 1 [running]:
runtime.throw(0x4ecf03, 0x5)
        C:/Go/src/runtime/panic.go:566 +0x9c fp=0xc0420d5c38 sp=0xc0420d5c18
runtime.sigpanic()
        C:/Go/src/runtime/signal_windows.go:164 +0x10b fp=0xc0420d5c68 sp=0xc0420d5c38
github.com/boltdb/bolt.(*DB).meta(0xc04207e000, 0x1ec)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/db.go:811 +0x38 fp=0xc0420d5cc0 sp=0xc0420d5c68
github.com/boltdb/bolt.(*Tx).rollback(0xc0420841c0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/tx.go:255 +0x79 fp=0xc0420d5ce8 sp=0xc0420d5cc0
github.com/boltdb/bolt.(*Tx).Commit(0xc0420841c0, 0x0, 0x0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/tx.go:164 +0x8b2 fp=0xc0420d5e38 sp=0xc0420d5ce8
github.com/boltdb/bolt.(*DB).Update(0xc04207e000, 0xc0420d5ec0, 0x0, 0x0)
        C:/Users/nebul/go/src/github.com/boltdb/bolt/db.go:605 +0x114 fp=0xc0420d5e88 sp=0xc0420d5e38

Inside tx.Commit, tx.root.spill() returns an error, which causes tx.rollback() to be called. tx.rollback() calls db.meta(), which dereferences the DB's meta pages, and that dereference is what segfaults. Investigating further, the error returned by tx.root.spill() can be traced back to a call to mmap. The specific error is: "truncate: truncate test.db: There is not enough space on the disk".

My guess is that the failed mmap invalidates db.meta0 and/or db.meta1. The existing data is unmapped before mmap is called, so this seems likely. However, I don't have a good explanation for why this only causes a segfault on Windows. Linux correctly returns an error from the db.Update call, and I believe OS X does as well.

I'm not sure what the best way to handle this is. One idea would be to set a special flag in db if the mmap fails. This is a critical failure state, so there's some justification for handling it specially. The flag would cause tx.rollback() to skip some of its cleanup steps, in particular this call:

tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))
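A minimal sketch of the flag idea, using a toy type rather than bolt's real DB. The names toyDB, mmapFailed, remap, and errNoSpace are all hypothetical and do not exist in bolt; the reloads counter merely stands in for the freelist reload shown above.

```go
package main

import "errors"

// errNoSpace simulates the error seen when the disk is full (illustrative only).
var errNoSpace = errors.New("There is not enough space on the disk")

// toyDB is a hypothetical stand-in for bolt.DB; it is not bolt's real type.
type toyDB struct {
	mmapFailed bool // set when a remap fails and the old mapping is gone
	reloads    int  // counts simulated freelist reloads
}

// remap runs the supplied mmap function and records whether it failed.
func (d *toyDB) remap(mmap func() error) error {
	if err := mmap(); err != nil {
		d.mmapFailed = true
		return err
	}
	d.mmapFailed = false
	return nil
}

// rollback skips the cleanup step that would touch mapped memory
// when the flag says the mapping is no longer valid.
func (d *toyDB) rollback() {
	if d.mmapFailed {
		return
	}
	d.reloads++ // stands in for the freelist reload
}
```

With this shape, a rollback after a failed remap becomes a no-op instead of a read from unmapped memory.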

In addition, it would be helpful for client programs if the db.Update call returned a special error value (or type) so that they could detect it and decide whether to panic. Even if bolt itself panicked, that would be miles better than a segfault, since a normal panic can at least be caught and converted to a more user-friendly error message.

Let me know if I can test any other potential fixes. In the meantime I'll probably go ahead and implement the fix described above.

lukechampine · 2017-07-18T17:58:47Z

Update: I've confirmed that this patch prevents the segfault:

diff --git a/tx.go b/tx.go
index 6700308..a1527dc 100644
--- a/tx.go
+++ b/tx.go
@@ -161,7 +161,7 @@ func (tx *Tx) Commit() error {
        // spill data onto dirty pages.
        startTime = time.Now()
        if err := tx.root.spill(); err != nil {
-               tx.rollback()
+               tx.close()
                return err
        }
        tx.stats.SpillTime += time.Since(startTime)

However, I'm not sure if this behavior is safe. Skipping tx.rollback means we don't call freelist.rollback or freelist.reload. Could this cause trouble if the caller repeatedly tries to call Update?

A bigger issue is that, even though a nice error message is returned now, the db file is still unmapped. I haven't tested it, but I assume that in this state it would segfault even if you called db.View. As such, we should probably proceed with the original plan to add a flag to the db type. That way, we could return an error immediately from calls that would otherwise segfault.
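Roughly, the entry-point check might look like the sketch below. guardedDB, its unmapped field, and errUnmapped are hypothetical and only model the idea; the real View and Update take a func(*Tx) error, which is simplified here to a callback over a byte slice.

```go
package main

import "errors"

// errUnmapped is a hypothetical error for a DB whose file is no longer mapped.
var errUnmapped = errors.New("database file is not mapped")

// guardedDB is a toy model, not bolt's DB type.
type guardedDB struct {
	unmapped bool   // set after a failed mmap
	data     []byte // stands in for the memory-mapped file
}

// View refuses to run the callback once the mapping is known to be invalid,
// so the caller gets an error instead of a fault.
func (d *guardedDB) View(fn func(data []byte) error) error {
	if d.unmapped {
		return errUnmapped
	}
	return fn(d.data)
}
```

The same guard would go at the top of every call that can reach the mapped pages.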

lukechampine mentioned this issue Jul 18, 2017

prevent segfault after mmap failure #707

Closed

lukechampine mentioned this issue Sep 29, 2017

Prevent segfault after mmap failure etcd-io/bbolt#58

Closed
