This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

BoltDB.Open crash when opening a partial file, require integrity check function #698

Open
xxr3376 opened this issue Jun 16, 2017 · 6 comments

xxr3376 commented Jun 16, 2017

I'm writing a storage agent based on BoltDB. If the agent is killed during network recovery, it can never restart, because on startup it tries to open a partial file and check its version.

I found that if I migrate a BoltDB database between two hosts using the following code and the migration fails partway through, it can leave a partial db file on disk, and BoltDB crashes when it tries to open that file.

You can reproduce this error by truncating a BoltDB file in the middle and then opening it. The file will contain a valid magic header but incomplete content.

Various errors can occur depending on where the file is truncated, so pasting any particular error here is not useful. Is there any way to check the integrity of a file?
Sender:

func (b *boltDB) ExportTo(acceptType []string, meta MetaWriter, writer io.Writer) error {
    export := false
    for _, t := range acceptType {
        if t == boltType {
            export = true
        }
    }
    if !export {
        return errUnknownFormat
    }
    return b.db.View(func(tx *bolt.Tx) error {
        if meta != nil {
            if err := meta(boltType, tx.Size()); err != nil {
                return err
            }
        }
        _, err := tx.WriteTo(writer)
        return err
    })
}

Receiver:

func ImportBoltDB(filename string, contentType string, reader io.Reader) (KV, error) {
    if contentType != boltType {
        return nil, errUnknownFormat
    }
    file, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0600)
    if err != nil {
        return nil, err
    }
    _, err = io.Copy(file, reader)
    file.Close()
    if err != nil {
        // XXX An incomplete file should be deleted to prevent a BoltDB crash.
        // This function tries its best to remove the partial file, but it can't do anything when it receives SIGKILL.
        os.Remove(filename)
        return nil, err
    }
    return NewBoltDB(filename)
}

glycerine commented Jul 15, 2017

BLAKE2b features tree-based (updatable/incremental) cryptographic hashing that was designed for checksumming entire filesystems, so you could use it here to develop a solution. See

https://blake2.net/

and Go libs are available:

https://github.com/glycerine/blake2b-simd

https://github.com/dchest/blake2b

(update: specifically, see section 2.10 of https://blake2.net/blake2_20130129.pdf)

xxr3376 commented Jul 17, 2017

I will compare checksums to verify integrity during transmission.

I'd still like to know: is there any way to avoid a SEGFAULT when opening a partial file?

@glycerine

Use defer and recover.

xxr3376 commented Jul 17, 2017

No, you can't recover from a SEGFAULT, no matter the language.

glycerine commented Jul 17, 2017

Don't be ridiculous. Only SIGKILL and SIGSTOP cannot be caught. recover works fine for segfaults:

package main

import "fmt"

type s struct {
    a int
}

func main() {
    var p *s

    defer func() {
        if caught := recover(); caught != nil {
            fmt.Printf("recovered from segfault")
        }
    }()

    p.a = 10
}
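
The same pattern can be wrapped so that a caller receives an ordinary error instead of a panic. This is only a sketch: `try` is a hypothetical helper, and it covers panics raised by the Go runtime (such as a nil pointer dereference), not faults that the runtime treats as fatal.

```go
package main

import (
	"fmt"
	"runtime"
)

type s struct{ a int }

// try runs fn and converts a panic into an error (hypothetical helper).
func try(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Panics raised by the runtime implement runtime.Error.
			if re, ok := r.(runtime.Error); ok {
				err = fmt.Errorf("runtime panic: %w", re)
				return
			}
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	var p *s
	fmt.Println(try(func() { p.a = 10 }))
}
```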

xxr3376 commented Jul 17, 2017

Sorry for saying that a SEGFAULT can't be handled in any language; we can definitely recover by handling the signal.

It's hard to recover from the SEGFAULT in the following code. You can give it a try.

package main

import (
	"io/ioutil"
	"log"
	"math/rand"
	"os"

	"github.com/boltdb/bolt"
)

func main() {
	// Remove previous data
	os.Remove("/tmp/test1.db")
	os.Remove("/tmp/test2.db")

	b, err := bolt.Open("/tmp/test1.db", 0600, nil)
	if err != nil {
		log.Panic("can't open source db: ", err)
	}
	log.Println("Writing data.")
	err = b.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("haha"))
		if err != nil {
			return err
		}
		d := make([]byte, 128)

		for i := 0; i < 10000; i += 1 {
			n, err := rand.Read(d)
			if n != 128 {
				panic("bad len")
			}
			if err != nil {
				return err
			}
			err = b.Put(d, d)
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic("Inserting.")
	}
	err = b.Close()
	if err != nil {
		panic("can't close file")
	}

	log.Println("Testing")
	data, err := ioutil.ReadFile("/tmp/test1.db")
	if err != nil {
		log.Println(err)
		panic("can't read source db")
	}
	err = ioutil.WriteFile("/tmp/test2.db", data[:len(data)/2], 0600)
	if err != nil {
		panic("can't write truncated db")
	}
	testDB("/tmp/test2.db")
}

func testDB(fn string) {
	defer func() {
		if r := recover(); r != nil {
			log.Println("Recovered in testDB", r)
		}
	}()
	b, err := bolt.Open(fn, 0600, nil)
	if err != nil {
		return
	}
	_ = b
	return
}

(update):
I don't want to handle low-level signals in my main function; in-place recovery is really hard to do.
Your code works fine because the Go runtime can identify the nil pointer dereference for you. If the fault comes from the Linux kernel (e.g. a bad access to mmap'd memory), I believe we can't simply recover by calling the recover function.
