BoltDB.Open crash when opening a partial file, require integrity check function #698

xxr3376 · 2017-06-16T03:14:13Z

I am writing a storage agent based on BoltDB. If the agent is killed during network recovery, it can never restart, because at startup it tries to open a partial file to check its version.

I migrate a BoltDB database between two hosts with the code below. If the migration fails partway through, it can leave a partial db file on disk, and BoltDB crashes when it tries to open that file.

You can reproduce this by truncating a BoltDB file in the middle and then opening it. The file has a valid magic header but incomplete contents.

Various errors can occur depending on where the file is truncated, so pasting any single error here would not be useful. Is there any way to check the integrity of a file?

Sender:

func (b *boltDB) ExportTo(acceptType []string, meta MetaWriter, writer io.Writer) error {
    export := false
    for _, t := range acceptType {
        if t == boltType {
            export = true
        }
    }
    if !export {
        return errUnknownFormat
    }
    return b.db.View(func(tx *bolt.Tx) error {
        if meta != nil {
            if err := meta(boltType, tx.Size()); err != nil {
                return err
            }
        }
        _, err := tx.WriteTo(writer)
        return err
    })
}

Receiver:

func ImportBoltDB(filename string, contentType string, reader io.Reader) (KV, error) {
    if contentType != boltType {
        return nil, errUnknownFormat
    }
    file, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0600)
    if err != nil {
        return nil, err
    }
    _, err = io.Copy(file, reader)
    if cerr := file.Close(); err == nil {
        err = cerr // surface a failed close when the copy itself succeeded
    }
    if err != nil {
        // XXX An incomplete file should be deleted to prevent a boltDB crash.
        // This is best effort: the partial file is removed on error, but nothing can be done on SIGKILL.
        os.Remove(filename)
        return nil, err
    }
    return NewBoltDB(filename)
}

glycerine · 2017-07-15T17:19:02Z

blake2b features tree-based (updatable/incremental cryptographic) hashes that were designed for checksumming entire filesystems, so you could use it here to develop a solution. See

https://blake2.net/

and Go libs are available:

https://github.com/glycerine/blake2b-simd

https://github.com/dchest/blake2b

(update: specifically, see section 2.10 of https://blake2.net/blake2_20130129.pdf)

xxr3376 · 2017-07-17T02:12:59Z

I will compare checksums to verify integrity during transmission.

I would still like to know: is there any way to avoid a SEGFAULT when opening a partial file?

glycerine · 2017-07-17T03:31:06Z

Use defer and recover.

xxr3376 · 2017-07-17T03:49:17Z

No, you can't recover from a SEGFAULT, no matter which language you use.

glycerine · 2017-07-17T04:23:06Z

Don't be ridiculous. Only SIGKILL and SIGSTOP cannot be caught. recover works fine for segfaults:

package main

import "fmt"

type s struct {
    a int
}

func main() {

    var p *s

    defer func() {
        if caught := recover(); caught != nil {
            fmt.Printf("recovered from segfault")
        }
    }()

    p.a = 10
}

xxr3376 · 2017-07-17T05:57:16Z

Sorry for saying that a SEGFAULT can't be handled in any language; we can definitely recover by handling the signal.

It is hard to recover from the SEGFAULT in the following code, though. You can give it a try.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"math/rand"
	"os"

	"github.com/boltdb/bolt"
)

func main() {
	// Remove previous data
	os.Remove("/tmp/test1.db")
	os.Remove("/tmp/test2.db")

	b, err := bolt.Open("/tmp/test1.db", 0600, nil)
	if err != nil {
		log.Panic("can't open source db: ", err)
	}
	log.Println("Writing data.")
	err = b.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("haha"))
		if err != nil {
			return err
		}
		d := make([]byte, 128)

		for i := 0; i < 10000; i += 1 {
			n, err := rand.Read(d)
			if n != 128 {
				panic("bad len")
			}
			if err != nil {
				return err
			}
			err = b.Put(d, d)
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic("Inserting.")
	}
	err = b.Close()
	if err != nil {
		panic("can't close file")
	}

	log.Println("Testing")
	data, err := ioutil.ReadFile("/tmp/test1.db")
	if err != nil {
		log.Println(err)
		panic("can't read source db")
	}
	err = ioutil.WriteFile("/tmp/test2.db", data[:len(data)/2], 0600)
	if err != nil {
		panic("can't write source db")
	}
	testDB("/tmp/test2.db")
}

func testDB(fn string) {
	defer func() {
		if r := recover(); r != nil {
			log.Println("Recovered in testDB", r)
		}
	}()
	b, err := bolt.Open(fn, 0600, nil)
	if err != nil {
		return
	}
	_ = b
	return
}

(update):
I don't want to handle low-level signals in my main function; recovering in place that way is really hard.
Your code works because the Go runtime recognizes the nil pointer dereference and turns it into a panic. If the fault comes from the Linux kernel (e.g. from mmap'd memory), I believe we can't simply recover by calling recover.
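
For faults at non-nil addresses, which is the usual case with memory-mapped files, Go's runtime/debug package has SetPanicOnFault. It asks the runtime to raise a panic on the current goroutine instead of crashing the program. The sketch below wraps an arbitrary function that way. `guard` is a hypothetical helper, and the demo triggers a nil dereference only so that it runs without bolt. Whether this catches every failure of bolt.Open on a truncated file is untested here, and the setting applies only to the goroutine that calls it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// guard (hypothetical helper) runs fn with SetPanicOnFault enabled and
// converts any panic raised on this goroutine into an ordinary error.
func guard(fn func() error) (err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old) // restore the previous setting
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return fn()
}

type s struct{ a int }

func main() {
	var p *s
	err := guard(func() error {
		p.a = 10 // nil dereference, stands in for a faulting read
		return nil
	})
	fmt.Println("faulting call:", err)
	fmt.Println("normal call:", guard(func() error { return nil }))
}
```

With bolt, the closure passed to guard would call bolt.Open(fn, 0600, nil) and return its error, so that a fault during Open on the same goroutine surfaces as an error rather than killing the process.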

tmm1 mentioned this issue Jun 27, 2018

Crash when trying to open corrupted database etcd-io/bbolt#105

Open
