Document that only trusted data should be deserialized #270

Marcono1234 · 2021-12-13T21:57:27Z

It appears many of the classes which implement Serializable allocate memory based on a size value specified in the deserialized data, e.g. read an int as size value and create an array of that size. I assume this is done intentionally for good performance; however, it should be documented. Adversaries can abuse this to cause Denial of Service attacks. For example the Guava library received CVE-2018-10237 for the same issue. (I just want to point out here that this can be considered a security issue; I am not asking for fastutil to get a CVE or to change the implementation.)

It would therefore be good to at least in the README and the Javadoc main page describe that the deserialization implementation is designed for efficiency and should not be used for untrusted data.

The text was updated successfully, but these errors were encountered:

vigna · 2022-02-04T08:38:59Z

OK, but most users would not even understand such a specific comment. Moreover, the advisor talks about "reasonable" sizes and fastutil is designed for very large data (e.g., BigArrayBigList), so it is not clear to me what the solution could be. You can always inject perfectly valid data that is too big.

Marcono1234 · 2022-02-04T15:02:17Z

You can always inject perfectly valid data that is too big.

Yes, that is not my concern. Serialization data which legitimately contains GBs of data is perfectly fine. It is the responsibility of the user then to limit the size of serialization data. My concern is that fastutil classes, such as IntArrayList allocate memory eagerly based on the encoded collection size, see related linked Guava CVE. Due to this, an adversary can with a few KBs of serialization data make fastutil allocate GBs of data before it even notices that the serialization data does not actually contain that many collection elements. For example:

/*
// Uses https://github.com/Marcono1234/serial-builder
byte[] serialData = SimpleSerialBuilder.startSerializableObject()
    .beginClassData("it.unimi.dsi.fastutil.ints.IntArrayList", -7046029254386353130L)
        .primitiveIntField("size", Integer.MAX_VALUE - 10)
        .writeObjectWith(writer -> {
        })
    .endClassData()
.endObject();
*/
byte[] serialData = {
    -84, -19, 0, 5, 115, 114, 0, 39, 105, 116, 46, 117, 110, 105, 109, 105, 46, 100, 115,
    105, 46, 102, 97, 115, 116, 117, 116, 105, 108, 46, 105, 110, 116, 115, 46, 73, 110,
    116, 65, 114, 114, 97, 121, 76, 105, 115, 116, -98, 55, 121, -71, 127, 74, 124, 22,
    3, 0, 1, 73, 0, 4, 115, 105, 122, 101, 120, 112, 127, -1, -1, -11, 120
};

new ObjectInputStream(new ByteArrayInputStream(serialData)).readObject();

Here the serialization data claims the size of the IntArrayList is Integer.MAX_VALUE - 10, and fastutil happily allocates an array of that size (causing OutOfMemoryError: Java heap space), before it would even notice that the serialization data does not actually contain any list elements.

This might be acceptable, given that the primary goal of this library is probably performance, and not pre-sizing the array would not be as efficient. However, users should be informed that the should not blindly deserialize classes of this library.

Alternatively you can of course implement deserialization in a safer way; this would also protect users which are not explicitly deserializing fastutil classes, but where fastutil is on the classpath, and therefore an adversary can reference its classes in the serialization data. But I cannot really suggest what "reasonable" initial collection sizes are, and getting this right for all corner cases is tricky.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document that only trusted data should be deserialized #270

Document that only trusted data should be deserialized #270

Marcono1234 commented Dec 13, 2021

vigna commented Feb 4, 2022

Marcono1234 commented Feb 4, 2022

Document that only trusted data should be deserialized #270

Document that only trusted data should be deserialized #270

Comments

Marcono1234 commented Dec 13, 2021

vigna commented Feb 4, 2022

Marcono1234 commented Feb 4, 2022