PebblesDB does not discard partially-flushed values #28

Open
mj-ramos opened this issue Sep 21, 2023 · 0 comments

Verified in:

What happened:
After a power failure while adding values to PebblesDB with the verify_checksums and paranoid_checks parameters set to true, the database gets corrupted. After applying the recovery method suggested in https://github.com/google/leveldb/blob/main/doc/index.md (using RepairDB), a value that was only partially persisted is still present.

The root cause of the problem is that some writes to the log file exceed the typical size of a page in the page cache. This can result in a "torn write" scenario where only part of the write's payload is persisted while the rest is not, since the pages of the page cache can be flushed out of order. There are several references about this problem:

This problem was already reported for LevelDB (google/leveldb#251) and no longer occurs in its latest release (1.23).
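To make the torn-write scenario concrete, here is a minimal, dependency-free sketch of why a per-record checksum should let a reader reject such a value. It assumes a simplified record layout (one checksum stored next to one payload) and assumes that the lost page reads back as zeros; the names Record, MakeRecord, DropPage and IsIntact are hypothetical and are not PebblesDB or LevelDB APIs. LevelDB's log format stores a CRC32C per record, and it is assumed here that PebblesDB inherits that scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32C (Castagnoli polynomial, reflected form).
// Slow but self-contained; production code uses tables or hardware support.
std::uint32_t Crc32c(const std::string& data) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical log record: a checksum stored next to the payload it covers.
struct Record {
  std::uint32_t stored_crc;
  std::string payload;
};

Record MakeRecord(const std::string& payload) {
  return Record{Crc32c(payload), payload};
}

// Models a torn write: the page starting at `offset` never reached disk and
// reads back as zero bytes (an assumption about the post-crash contents).
Record DropPage(Record record, std::size_t offset, std::size_t page_size) {
  assert(offset <= record.payload.size());
  const std::size_t len = std::min(page_size, record.payload.size() - offset);
  record.payload.replace(offset, len, len, '\0');
  return record;
}

// A reader should only surface records whose payload still matches the
// checksum written alongside it.
bool IsIntact(const Record& record) {
  return Crc32c(record.payload) == record.stored_crc;
}
```

With a 12288-byte payload, calling DropPage(record, 4096, 4096) models the loss of the middle page, and IsIntact then returns false for the result. A value from such a record surviving recovery suggests the repair path does not apply an equivalent check, or applies it to less than the whole record.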

How to reproduce
This issue can be replicated using LazyFS, a file system capable of simulating power failures and the OS behavior mentioned above, i.e., file system pages being persisted to disk out of order.
The main problem is a write to the file 000003.log which is 12288 bytes long. LazyFS will persist portions of this write (4096 bytes each) out of order and then crash, simulating a power failure.
To reproduce this problem, follow these steps (the mentioned files, write_test.cpp, etc., are in the zip pebblesdb_test.zip):

  1. Mount LazyFS on a directory where PebblesDB data will be saved, with a specified root directory. Assuming the data path for PebblesDB is /home/pebblesdb/data and the root directory is /home/pebblesdb/data-r, add the following lines to the default configuration file (config/default.toml):
[[injection]]
type="split_write"
file="/home/pebblesdb/data-r/000003.log"
persist=[1,3]
parts=3
occurrence=4

These lines define the fault to inject. A power failure is simulated after a write to the /home/pebblesdb/data-r/000003.log file. Since this write is large (12288 bytes), it is split into 3 parts (4096 bytes each), and only the first and third parts are persisted. The occurrence parameter specifies that the fault applies to the fourth write issued to this file.
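As a rough illustration of what parts=3 and persist=[1,3] mean for a 12288-byte write, the following sketch computes which byte ranges of the write reach disk. PersistedRanges and ByteRange are hypothetical names, not part of LazyFS, and the sketch assumes the write size divides evenly by the number of parts; how LazyFS treats a remainder is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One persisted slice of a write: (offset within the write, length in bytes).
using ByteRange = std::pair<std::size_t, std::size_t>;

// Returns the slices of a `write_size`-byte write that survive the crash when
// the write is cut into `parts` equal pieces and only the 1-based indices in
// `persist` are flushed. Assumes write_size is a multiple of parts.
std::vector<ByteRange> PersistedRanges(std::size_t write_size,
                                       std::size_t parts,
                                       const std::vector<std::size_t>& persist) {
  assert(parts > 0 && write_size % parts == 0);
  const std::size_t part_size = write_size / parts;
  std::vector<ByteRange> ranges;
  for (std::size_t index : persist) {
    assert(index >= 1 && index <= parts);
    ranges.emplace_back((index - 1) * part_size, part_size);
  }
  return ranges;
}
```

For the configuration above, PersistedRanges(12288, 3, {1, 3}) yields the ranges starting at offsets 0 and 8192, each 4096 bytes long, so bytes 4096 to 8191 of the write are never persisted.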

  2. Start LazyFS with the following command:
    ./scripts/mount-lazyfs.sh -c config/default.toml -m /home/pebblesdb/data -r /home/pebblesdb/data-r -f

  3. Compile and execute the write_test.cpp file, which adds 4 key-value pairs to PebblesDB; only the third pair exceeds the size of a page in the page cache.

Immediately after this step, PebblesDB will shut down because LazyFS was unmounted, simulating the power failure. At this point, you can analyze the logs produced by LazyFS to see the system calls issued until the moment of the fault. Here is a simplified version of the log:

{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '262144', 'off': '0'}
{'syscall': 'read', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '131072', 'off': '0'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '4096', 'off': '0'}
{'syscall': 'fsync', 'path': '/home/pebblesdb/data-r/000003.log'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '4096', 'off': '0'}
{'syscall': 'fsync', 'path': '/home/pebblesdb/data-r/000003.log'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '12288', 'off': '0'}
  4. Remove the fault from the configuration file and unmount the file system with fusermount -uz /home/pebblesdb/data.
  5. Mount LazyFS again with the previously provided command.
  6. Attempt to start PebblesDB (it fails).
  7. Compile and execute the repair.cpp file, which recovers the database.
  8. Compile and execute the read_test.cpp file, which reads and checks the values previously inserted. The value for the key k3 is only part of the initial value.

Note that when paranoid_checks and verify_checksums are set to false, PebblesDB does not fail on restart and discards the partial value of the key k3 (it reports that this key does not exist).
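The outcomes described above (a key that is missing versus a key holding only part of its value) can be told apart with a small check like the one below. This is only a sketch of the kind of comparison a read test might perform; the contents of read_test.cpp are not shown in this report, and ReadOutcome and Classify are hypothetical names. Treating "partial" as a strict prefix of the expected value is also an assumption; anything else that differs is labeled corrupted.

```cpp
#include <string>

// Possible results of reading back a key after recovery.
enum class ReadOutcome { kIntact, kMissing, kPartial, kCorrupted };

// Compares the value read back for a key against the value that was written.
// `found` is false when the database reports that the key does not exist.
ReadOutcome Classify(const std::string& expected, bool found,
                     const std::string& actual) {
  if (!found) return ReadOutcome::kMissing;
  if (actual == expected) return ReadOutcome::kIntact;
  // Partial: shorter than expected and identical to its leading bytes.
  if (actual.size() < expected.size() &&
      expected.compare(0, actual.size(), actual) == 0) {
    return ReadOutcome::kPartial;
  }
  return ReadOutcome::kCorrupted;
}
```

Under this classification, the run with both checks disabled corresponds to kMissing for k3, while the run with both checks enabled followed by RepairDB would correspond to kPartial or kCorrupted, depending on which bytes the repair salvages.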
