Binary Reproducibility #3484
If the database is created in a different environment, is the binary output directly comparable? Could it depend on parallelism, etc.?
It can depend on parallelism when copying, but if we always create the databases in single-threaded mode when doing such comparisons, the files should be directly comparable.
As noted in #3501, the catalog stores and serializes catalog entries using the order produced by an …
Serializing catalog entries in a fixed order makes more sense to me.
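A minimal C++17 sketch of serializing entries in a fixed order, assuming the catalog keeps its entries in a `std::unordered_map` keyed by name (an assumption; `CatalogEntry`, `writeString`, and `serializeSorted` are hypothetical names, not Kùzu's actual API). Sorting before writing makes the bytes independent of the hash table's iteration order, which can vary with insertion order and bucket count.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical catalog entry; not Kùzu's real CatalogEntry type.
struct CatalogEntry {
    std::string name;
    std::uint64_t oid;
};

// Appends a length-prefixed string (length in native byte order).
void writeString(std::vector<char>& out, const std::string& s) {
    std::uint64_t len = s.size();
    const char* p = reinterpret_cast<const char*>(&len);
    out.insert(out.end(), p, p + sizeof(len));
    out.insert(out.end(), s.begin(), s.end());
}

// Serializes entries sorted by name, so the output no longer depends on
// the unordered_map's iteration order.
std::vector<char> serializeSorted(
    const std::unordered_map<std::string, CatalogEntry>& entries) {
    std::vector<const CatalogEntry*> sorted;
    sorted.reserve(entries.size());
    for (const auto& kv : entries) sorted.push_back(&kv.second);
    std::sort(sorted.begin(), sorted.end(),
              [](const CatalogEntry* a, const CatalogEntry* b) { return a->name < b->name; });

    std::vector<char> out;
    for (const CatalogEntry* e : sorted) {
        writeString(out, e->name);
        const char* p = reinterpret_cast<const char*>(&e->oid);
        out.insert(out.end(), p, p + sizeof(e->oid));
    }
    return out;
}
```

Sorting by name is just one choice; sorting by OID would work equally well, as long as every writer uses the same key.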
I've found that one way of ensuring that writing a struct directly to disk works consistently is to assert that it has a unique object representation via std::has_unique_object_representations_v. Among other things, this ensures there is no padding, which isn't guaranteed to be zeroed and, particularly in release builds, will usually be uninitialized.
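A small C++17 sketch of that check, placed in a generic raw-write helper so every struct written byte-for-byte is verified at compile time. `writeRaw`, `PaddedHeader`, and `PackedHeader` are hypothetical names for illustration. The padding layout described in the comments assumes a typical ABI where `std::uint32_t` is 4-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Has an implicit 3-byte padding gap after `flag` on typical ABIs,
// so those bytes may hold uninitialized garbage when written raw.
struct PaddedHeader {
    std::uint8_t flag;
    std::uint32_t count;
};

// Same fields, but the gap is an explicit member the writer can zero.
struct PackedHeader {
    std::uint8_t flag;
    std::uint8_t reserved[3];
    std::uint32_t count;
};

// Appends the object's bytes to `out`, refusing at compile time any type
// whose bytes are not uniquely determined by its value (e.g. padding).
template <typename T>
void writeRaw(std::vector<std::byte>& out, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "raw writes need a trivially copyable type");
    static_assert(std::has_unique_object_representations_v<T>,
                  "type has padding or non-unique bytes; output would not be reproducible");
    const auto* p = reinterpret_cast<const std::byte*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}
```

Note that the trait is also false for types with `float` or `double` members, since values such as +0.0 and -0.0 compare equal but have different bytes, so those would need separate handling.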
In addition to testing databases created on one platform on other platforms, it would be useful to do direct binary comparisons between the databases (when produced in single-threaded mode). That would help cover gaps in the binary database tests, and could also enforce that kuzu's output is deterministic and that the data is properly initialized.
It could be implemented by creating a new database alongside the binary database tests and comparing the two with something like `diff -r` from GNU diffutils. An older version of diffutils is available for Windows (or via Chocolatey).
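If installing diffutils on Windows CI is inconvenient, a portable alternative is a small C++17 helper built on `std::filesystem` that compares two database directories byte for byte. This is a sketch under assumptions: `diffDirectories` and its helpers are hypothetical names, only regular files are compared, and differences are reported by relative path rather than with the byte-level detail `diff` gives.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <set>
#include <vector>

namespace fs = std::filesystem;

// Reads a whole file as raw bytes.
static std::vector<char> readBytes(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Collects regular-file paths relative to `root`, sorted for stable output.
static std::set<fs::path> listFiles(const fs::path& root) {
    std::set<fs::path> files;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) files.insert(fs::relative(entry.path(), root));
    }
    return files;
}

// Returns relative paths that exist on only one side or whose bytes differ.
// Throws fs::filesystem_error if either root directory does not exist.
std::vector<fs::path> diffDirectories(const fs::path& a, const fs::path& b) {
    const std::set<fs::path> filesA = listFiles(a);
    const std::set<fs::path> filesB = listFiles(b);
    std::set<fs::path> all = filesA;
    all.insert(filesB.begin(), filesB.end());

    std::vector<fs::path> mismatches;
    for (const auto& rel : all) {
        if (!filesA.count(rel) || !filesB.count(rel) ||
            readBytes(a / rel) != readBytes(b / rel)) {
            mismatches.push_back(rel);
        }
    }
    return mismatches;
}
```

A test would then assert that `diffDirectories(referenceDir, freshDir)` is empty, with both databases created in single-threaded mode.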