feat(all): add versioning of serializable types on tfhe-rs 0.6 #1151

Draft
wants to merge 5 commits into
base: integration/versioning
from

Conversation

nsarlin-zama
Contributor

@nsarlin-zama nsarlin-zama commented May 15, 2024

closes: zama-ai/tfhe-rs-internal#538

PR content/description

This PR adds data versioning to serialized types for backward compatibility between tfhe-rs versions. This is done using a new crate, tfhe-versionable, that adds a set of derive macros. These macros derive a pair of traits (Versionize/Unversionize) that add conversion functions between a type and its "versioned" representation. The versioned representation of a type is an enum where each variant is a version of the type.

Before serialization, the versionize method of the Versionize trait wraps the type into the latest variant of the enum. To use the value after deserialization, the unversionize method of the Unversionize trait converts the enum into the target type. To make this work, each older version of a type must define an upgrade method that transforms version Vn into Vn+1. The generated unversionize method chains enough calls to upgrade to reach the latest version.
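To make the upgrade chain concrete, here is a minimal hand-written sketch of what the paragraph above describes, using only the standard library. It does not use tfhe-versionable: the types ConfigV0, ConfigV1, Config and ConfigVersions, and the local Upgrade trait, are hypothetical stand-ins for what the derive macros would produce.

```rust
// Hand-written stand-ins for what the derive macros would generate.
// Every name here (ConfigV0, ConfigV1, Config, ConfigVersions) is hypothetical.

/// Oldest layout of the type.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigV0 {
    pub size: u32,
}

/// Intermediate layout: a `label` field was added.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigV1 {
    pub size: u32,
    pub label: String,
}

/// Current layout: `size` was widened to 64 bits.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub size: u64,
    pub label: String,
}

/// Each old version only knows how to reach the next one.
pub trait Upgrade<Next> {
    fn upgrade(self) -> Next;
}

impl Upgrade<ConfigV1> for ConfigV0 {
    fn upgrade(self) -> ConfigV1 {
        ConfigV1 { size: self.size, label: String::new() }
    }
}

impl Upgrade<Config> for ConfigV1 {
    fn upgrade(self) -> Config {
        Config { size: u64::from(self.size), label: self.label }
    }
}

/// Versioned representation: one variant per version, latest last.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigVersions {
    V0(ConfigV0),
    V1(ConfigV1),
    V2(Config),
}

impl Config {
    /// Wrap the value into the latest variant before serialization.
    pub fn versionize(&self) -> ConfigVersions {
        ConfigVersions::V2(self.clone())
    }

    /// Upgrade step by step until the latest version is reached.
    pub fn unversionize(versioned: ConfigVersions) -> Config {
        match versioned {
            ConfigVersions::V0(v0) => {
                let v1: ConfigV1 = v0.upgrade();
                v1.upgrade()
            }
            ConfigVersions::V1(v1) => v1.upgrade(),
            ConfigVersions::V2(latest) => latest,
        }
    }
}
```

In this sketch, adding a future V3 only requires one new Upgrade impl (from the V2 layout to the V3 layout) and one more match arm; the older upgrade steps stay untouched.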

For a given type that has to be versioned, there are three macros that should be used:

  • Versionize: used on the main type, the one used elsewhere in the code. Derives the Versionize/Unversionize traits.
  • Version: used on a previous version of the type. Versionize also automatically derives Version for the latest version.
  • VersionsDispatch: used on the enum that lists all the versions. Each variant should derive Version, except the last one, which derives Versionize.

A fourth proc macro, NotVersioned, can be used on a type that should not be versioned. The Versionize/Unversionize traits are then implemented using Self as the versioned representation of the type. This is used for built-in types.

Here is an example of the workflow:

use tfhe_versionable::{Unversionize, Upgrade, Version, Versionize, VersionsDispatch};

// The structure that should be versioned, as defined in tfhe-rs
#[derive(Versionize)]
#[versionize(MyStructVersions)] // Link to the enum type that will hold all the versions of this type
struct MyStruct<T: Default> {
    attr: T,
    builtin: u32,
}

// To avoid polluting the main code, the old versions are defined in another module/file, along with the dispatch enum
#[derive(Version)] // Used to mark an old version of the type
struct MyStructV0 {
    builtin: u32,
}

// The Upgrade trait tells how to go from one version to the next (here, from V0 to the latest). During
// unversioning, the upgrade method will be called on the deserialized value enough times to reach the last variant.
impl<T: Default> Upgrade<MyStruct<T>> for MyStructV0 {
    fn upgrade(self) -> MyStruct<T> {
        MyStruct {
            attr: T::default(),
            builtin: self.builtin,
        }
    }
}

// This is the dispatch enum, that holds one variant for each version of your type.
#[derive(VersionsDispatch)]
// This enum is not directly used but serves as a template to generate new enums that will be
// serialized. This allows recursive versioning.
#[allow(unused)]
enum MyStructVersions<T: Default> {
    V0(MyStructV0),
    V1(MyStruct<T>),
}

fn main() {
    let ms = MyStruct {
        attr: 37u64,
        builtin: 1234,
    };

    let serialized = bincode::serialize(&ms.versionize()).unwrap();

    // This can be called in future versions of tfhe-rs, when more variants have been added
    let _unserialized = MyStruct::<u64>::unversionize(bincode::deserialize(&serialized).unwrap());
}

The proc macros handle the recursive nature of versioning. If we see a type definition as a tree where each type is a node and its children are the types of its attributes, the version of a given type is made independent of the versions of its children. That way, if we update a type, we don't have to manually update the version of every type that recursively uses it.
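The independence between a parent's version and its children's versions can be sketched by hand as follows. This is an illustration in plain Rust, not the code generated by the macros: Inner, InnerV0, Outer, OuterVersion and the two enums are hypothetical names chosen for the example.

```rust
// Hypothetical hand-written equivalent of the idea; not the crate's API.

/// Child type: current layout.
#[derive(Debug, PartialEq)]
pub struct Inner {
    pub value: u64,
}

/// Child type: old layout, where `value` was 32 bits wide.
pub struct InnerV0 {
    pub value: u32,
}

/// The child has its own version history.
pub enum InnerVersions {
    V0(InnerV0),
    V1(Inner),
}

impl InnerVersions {
    pub fn unversionize(self) -> Inner {
        match self {
            InnerVersions::V0(old) => Inner { value: u64::from(old.value) },
            InnerVersions::V1(latest) => latest,
        }
    }
}

/// Parent type, still at its first version.
#[derive(Debug, PartialEq)]
pub struct Outer {
    pub inner: Inner,
    pub tag: u8,
}

/// Versioned form of `Outer`: same shape, but the child is stored in its
/// own versioned representation instead of as a concrete `Inner`.
pub struct OuterVersion {
    pub inner: InnerVersions,
    pub tag: u8,
}

/// The parent has a single version, whatever the child's version is.
pub enum OuterVersions {
    V0(OuterVersion),
}

impl OuterVersions {
    pub fn unversionize(self) -> Outer {
        match self {
            OuterVersions::V0(v) => Outer {
                // The child upgrades itself; the parent does not care how.
                inner: v.inner.unversionize(),
                tag: v.tag,
            },
        }
    }
}
```

Here Inner went from V0 to V1 while OuterVersions kept a single V0 variant: data written with the old child layout still loads, and no new parent version had to be declared.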

The macros handle:

  • Struct/enum/union
  • generics
  • conversion with a call to into/from/try_from before and after the versioning/unversioning (similarly to serde)
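The last item (conversion through into/from/try_from) can be pictured with a hand-written sketch. The attribute syntax used by tfhe-versionable is not shown in this description, so the code below does not guess it: it only illustrates the idea of versioning a plain proxy type and converting on the way in and out. Reduced, RawReduced and RawReducedVersions are hypothetical names, and the String error type is a placeholder.

```rust
use std::convert::TryFrom;

/// Type with an invariant: `value < modulus`.
#[derive(Debug, Clone, PartialEq)]
pub struct Reduced {
    value: u64,
    modulus: u64,
}

impl Reduced {
    pub fn new(value: u64, modulus: u64) -> Option<Reduced> {
        if value < modulus {
            Some(Reduced { value, modulus })
        } else {
            None
        }
    }
}

/// Plain proxy without the invariant; this is the type that gets versioned.
#[derive(Debug, Clone, PartialEq)]
pub struct RawReduced {
    pub value: u64,
    pub modulus: u64,
}

impl From<Reduced> for RawReduced {
    fn from(r: Reduced) -> Self {
        RawReduced { value: r.value, modulus: r.modulus }
    }
}

impl TryFrom<RawReduced> for Reduced {
    type Error = String;

    fn try_from(raw: RawReduced) -> Result<Self, Self::Error> {
        Reduced::new(raw.value, raw.modulus).ok_or_else(|| {
            format!("value {} is not below modulus {}", raw.value, raw.modulus)
        })
    }
}

/// Version history of the proxy type.
pub enum RawReducedVersions {
    V0(RawReduced),
}

impl Reduced {
    /// Convert into the proxy, then wrap it in the latest variant.
    pub fn versionize(&self) -> RawReducedVersions {
        RawReducedVersions::V0(RawReduced::from(self.clone()))
    }

    /// Unwrap the proxy, then convert back, re-checking the invariant.
    pub fn unversionize(versioned: RawReducedVersions) -> Result<Reduced, String> {
        match versioned {
            RawReducedVersions::V0(raw) => Reduced::try_from(raw),
        }
    }
}
```

Returning a Result from the conversion step keeps invalid serialized data from producing a value that violates the invariant.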

Internals

Internally, the Version proc macro generates a pair of associated types for each version of the type. Each associated type has the same shape as the type the macro is derived on, except that its fields are replaced by their versioned representation. The difference between the two types is that one is defined using references and the other using owned data, which avoids copies wherever possible.

For example for this type:

struct MyStruct {
  inner: MyStructInner
}

the macro will generate these types:

#[derive(Serialize)]
struct MyStructVersion<'vers> {
  inner: MyStructInner::Versioned<'vers>
}

#[derive(Serialize, Deserialize)]
struct MyStructVersionOwned {
  inner: MyStructInner::VersionedOwned
}

MyStructVersion will be used for versioning if possible, and MyStructVersionOwned for unversioning and for versioning if it is not possible to use a reference. The macro also generates conversion methods between a type and its Version associated types. It also implements a Version trait that allows easier access to these generated types in other macros.
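The borrowed/owned split can be illustrated without serde. The sketch below reuses the MyStruct naming from the example above, but the field contents (a Vec<u8> inside the inner type) and the From impls are assumptions made for the illustration; the real generated types also carry the Serialize/Deserialize derives shown above.

```rust
/// Original types (the `data` field is an assumption for this sketch).
pub struct Inner {
    pub data: Vec<u8>,
}

pub struct MyStruct {
    pub inner: Inner,
}

/// Borrowed versioned form: points into the original value, no copy.
pub struct InnerVersion<'vers> {
    pub data: &'vers [u8],
}

pub struct MyStructVersion<'vers> {
    pub inner: InnerVersion<'vers>,
}

/// Owned versioned form: what deserialization would produce.
pub struct InnerVersionOwned {
    pub data: Vec<u8>,
}

pub struct MyStructVersionOwned {
    pub inner: InnerVersionOwned,
}

/// Versioning by reference: only borrows the buffer.
impl<'vers> From<&'vers MyStruct> for MyStructVersion<'vers> {
    fn from(value: &'vers MyStruct) -> Self {
        MyStructVersion {
            inner: InnerVersion { data: value.inner.data.as_slice() },
        }
    }
}

/// Unversioning: moves the owned buffer into the final type.
impl From<MyStructVersionOwned> for MyStruct {
    fn from(value: MyStructVersionOwned) -> Self {
        MyStruct {
            inner: Inner { data: value.inner.data },
        }
    }
}
```

With this layout, serializing a large value only needs a borrowed view, while deserialization has to own its data because there is no original value to borrow from.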

Similarly, the VersionsDispatch macro will generate for the dispatch enum two associated enums, one with references and one with owned data. These enums will be used as the versioned representation for the type. They are the result and parameters of the versionize and unversionize methods and can be serialized/deserialized:

enum MyStructVersions {
  V0(MyStructV0),
  V1(MyStruct)
}

// this is generated by `VersionsDispatch`
#[derive(Serialize)]
enum MyStructVersionsDispatch<'vers> {
  V0(MyStructV0Version<'vers>),
  V1(MyStructVersion<'vers>)
}

#[derive(Serialize, Deserialize)]
enum MyStructVersionsDispatchOwned {
  V0(MyStructV0VersionOwned),
  V1(MyStructVersionOwned)
}

Finally, the Versionize macro uses the generated enums. versionize is just a conversion between MyStruct and the latest variant of MyStructVersionsDispatch, and unversionize is a conversion between MyStructVersionsDispatchOwned and MyStruct (slightly more complicated because of the chained calls to upgrade).

TODO

Versionize is currently implemented for the shortint ciphertext and all its subtypes.

  • Implement the proc-macro
  • Versionize all the things !
  • Handle errors during unversioning (ex: failed upgrades or conversion)
  • Generate test data
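For the error-handling item, one possible direction is to make each upgrade step fallible and propagate the failure instead of panicking. The sketch below is only an illustration of that idea in plain Rust: TryUpgrade, UnversionizeError, KeyV0, Key and unversionize_key are hypothetical names, not part of this PR.

```rust
use std::fmt;

/// Hypothetical error type for failed unversioning.
#[derive(Debug, PartialEq)]
pub enum UnversionizeError {
    Upgrade { from_version: &'static str, reason: String },
}

impl fmt::Display for UnversionizeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UnversionizeError::Upgrade { from_version, reason } => {
                write!(f, "failed to upgrade from {}: {}", from_version, reason)
            }
        }
    }
}

impl std::error::Error for UnversionizeError {}

/// Old layout: size stored in bits.
pub struct KeyV0 {
    pub bits: u32,
}

/// Current layout: size stored in bytes.
#[derive(Debug, PartialEq)]
pub struct Key {
    pub bytes: u32,
}

/// Fallible counterpart of an upgrade step.
pub trait TryUpgrade<Next> {
    fn try_upgrade(self) -> Result<Next, UnversionizeError>;
}

impl TryUpgrade<Key> for KeyV0 {
    fn try_upgrade(self) -> Result<Key, UnversionizeError> {
        if self.bits % 8 != 0 {
            return Err(UnversionizeError::Upgrade {
                from_version: "KeyV0",
                reason: format!("{} bits is not a whole number of bytes", self.bits),
            });
        }
        Ok(Key { bytes: self.bits / 8 })
    }
}

pub enum KeyVersions {
    V0(KeyV0),
    V1(Key),
}

/// Returns an error instead of panicking when an upgrade step fails.
pub fn unversionize_key(versioned: KeyVersions) -> Result<Key, UnversionizeError> {
    match versioned {
        KeyVersions::V0(old) => old.try_upgrade(),
        KeyVersions::V1(latest) => Ok(latest),
    }
}
```

With longer histories, each arm would chain its steps with the `?` operator, so the first failing upgrade is reported together with the version it started from.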

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

@cla-bot cla-bot bot added the cla-signed label May 15, 2024
@nsarlin-zama nsarlin-zama changed the base branch from main to release/0.6.x May 15, 2024 14:06
@nsarlin-zama
Contributor Author

I don't think this is a breaking change for serialization/deserialization, since this PR only adds an optional set of methods on every type, i.e. messages serialized directly will still be deserializable after this PR. To use the versioning, you need to call the versionize/unversionize methods.

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 3 times, most recently from baba1c2 to a31dd7b Compare May 15, 2024 15:06
@nsarlin-zama nsarlin-zama changed the base branch from release/0.6.x to integration/versioning May 16, 2024 14:27
Cargo.toml Outdated
Member

@IceTDrinker IceTDrinker left a comment


minor comments that have nothing to do with the code to start

@nsarlin-zama nsarlin-zama force-pushed the ns/0.6_with_versionize branch 2 times, most recently from 51694f7 to be733d5 Compare May 17, 2024 11:58
Member

@IceTDrinker IceTDrinker left a comment


Some additional non code comments, then we'll get to the meat of the proc macro review (though I'm far from a pro on that)

I like how little code it ends up requiring in TFHE-rs (I know there are no upgrade implementations for now, but it is looking very promising)

#[derive(serde::Serialize, serde::Deserialize, Versionize)]
#[versionize(SerializableLweCiphertextModulusVersions)]
/// Actual serialized modulus to be able to carry the UnsignedInteger bitwidth information
pub struct SerializableLweCiphertextModulus {
Member


this can be renamed to just "SerializableCiphertextModulus", was likely a leftover from something older

Comment on lines +1 to +4
[package]
name = "tfhe-versionable"
version = "0.1.0"
edition = "2021"
Member


make sure there are enough keys here to be able to publish the crate. I had this issue with the ZK crate, so the set of metadata in the ZK crate should be a good indication of the minimal metadata

Comment on lines 1 to 4
[package]
name = "tfhe-versionable-derive"
version = "0.1.0"
edition = "2021"
Member


same thing for publication

@tmontaigu
Contributor

tmontaigu commented May 21, 2024

Still in the process of reviewing,

I have a doubt about the derive crate living inside the tfhe-versionable crate dir. I wonder whether, when doing cargo package/publish, the code from the derive crate is going to be included in the tarball that cargo uploads (cargo is not going to use it to build tfhe-versionable, as it is going to download the proper tfhe-versionable-derive crate)

@nsarlin-zama
Contributor Author

I have a doubt about the derive crate living inside the tfhe-versionable crate dir. I wonder whether, when doing cargo package/publish, the code from the derive crate is going to be included in the tarball that cargo uploads (cargo is not going to use it to build tfhe-versionable, as it is going to download the proper tfhe-versionable-derive crate)

Maybe I can just move it up a level into tfhe-rs/utils/tfhe-versionable-derive ?

@tmontaigu
Contributor

I tried with cargo package --list, and it seems the derive crate sources are not included.

It may still be worth moving the crate up a level, as it's a crate, not a module

@IceTDrinker
Member

will need a rebase: the action fix has been merged in release/0.6.x as it was needed

}

/// This derives the `Versionize` and `Unversionize` trait for the target type. This macro
/// has a mandatory parameter, which
Contributor


This sentence seems unfinished ?

if let Some(target) = &self.from {
    quote! { #target::unversionize(#arg_name).into() }
} else if let Some(target) = &self.try_from {
    quote! { #target::unversionize(#arg_name).try_into().unwrap() } // TODO: handle errors
Contributor


Can this be handled now ?

self.orig_type.variants.len()
}

/// Returns the latest version of the original type, which is the last variant in the enum
Contributor


Does the macro check that variants are correctly ordered, to catch errors like enum ThingVersion { V0(..), V2(..), V1(..) }?
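As a rough sketch of what such a check could look like, assuming the variants follow the V0, V1, ... naming convention used in the PR description: the function below works on plain strings so it stays self-contained. check_variant_order is a hypothetical helper; inside the derive macro the names would instead come from the parsed enum variants, and the failure would be reported as a compile error.

```rust
/// Checks that variant names are exactly V0, V1, V2, ... in declaration order.
/// Hypothetical helper: a derive macro would feed it the enum's variant identifiers.
pub fn check_variant_order(names: &[&str]) -> Result<(), String> {
    for (idx, name) in names.iter().enumerate() {
        let expected = format!("V{}", idx);
        if *name != expected {
            return Err(format!(
                "variant #{} is `{}`, expected `{}`",
                idx, name, expected
            ));
        }
    }
    Ok(())
}
```

This also rejects gaps (V0, V2) and duplicated names, since every position must match its own index.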


4 participants