Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that only trusted data should be deserialized #270

Open
Marcono1234 opened this issue Dec 13, 2021 · 2 comments
Open

Document that only trusted data should be deserialized #270

Marcono1234 opened this issue Dec 13, 2021 · 2 comments

Comments

@Marcono1234
Copy link

It appears many of the classes which implement Serializable allocate memory based on a size value specified in the deserialized data, e.g. read an int as size value and create an array of that size. I assume this is done intentionally for good performance; however, it should be documented. Adversaries can abuse this to cause Denial of Service attacks. For example the Guava library received CVE-2018-10237 for the same issue. (I just want to point out here that this can be considered a security issue; I am not asking for fastutil to get a CVE or to change the implementation.)

It would therefore be good to at least in the README and the Javadoc main page describe that the deserialization implementation is designed for efficiency and should not be used for untrusted data.

@vigna
Copy link
Owner

vigna commented Feb 4, 2022

OK, but most users would not even understand such a specific comment. Moreover, the advisor talks about "reasonable" sizes and fastutil is designed for very large data (e.g., BigArrayBigList), so it is not clear to me what the solution could be. You can always inject perfectly valid data that is too big.

@Marcono1234
Copy link
Author

You can always inject perfectly valid data that is too big.

Yes, that is not my concern. Serialization data which legitimately contains GBs of data is perfectly fine. It is the responsibility of the user then to limit the size of serialization data. My concern is that fastutil classes, such as IntArrayList allocate memory eagerly based on the encoded collection size, see related linked Guava CVE. Due to this, an adversary can with a few KBs of serialization data make fastutil allocate GBs of data before it even notices that the serialization data does not actually contain that many collection elements. For example:

/*
// Uses https://github.com/Marcono1234/serial-builder
byte[] serialData = SimpleSerialBuilder.startSerializableObject()
    .beginClassData("it.unimi.dsi.fastutil.ints.IntArrayList", -7046029254386353130L)
        .primitiveIntField("size", Integer.MAX_VALUE - 10)
        .writeObjectWith(writer -> {
        })
    .endClassData()
.endObject();
*/
byte[] serialData = {
    -84, -19, 0, 5, 115, 114, 0, 39, 105, 116, 46, 117, 110, 105, 109, 105, 46, 100, 115,
    105, 46, 102, 97, 115, 116, 117, 116, 105, 108, 46, 105, 110, 116, 115, 46, 73, 110,
    116, 65, 114, 114, 97, 121, 76, 105, 115, 116, -98, 55, 121, -71, 127, 74, 124, 22,
    3, 0, 1, 73, 0, 4, 115, 105, 122, 101, 120, 112, 127, -1, -1, -11, 120
};

new ObjectInputStream(new ByteArrayInputStream(serialData)).readObject();

Here the serialization data claims the size of the IntArrayList is Integer.MAX_VALUE - 10, and fastutil happily allocates an array of that size (causing OutOfMemoryError: Java heap space), before it would even notice that the serialization data does not actually contain any list elements.

This might be acceptable, given that the primary goal of this library is probably performance, and not pre-sizing the array would not be as efficient. However, users should be informed that the should not blindly deserialize classes of this library.

Alternatively you can of course implement deserialization in a safer way; this would also protect users which are not explicitly deserializing fastutil classes, but where fastutil is on the classpath, and therefore an adversary can reference its classes in the serialization data. But I cannot really suggest what "reasonable" initial collection sizes are, and getting this right for all corner cases is tricky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants