[Java] serializeJavaObject(...) is faster than serialize(...) #1537
The difference between the two is very small and is likely within measurement error. I copied your code and tested it with an image of about 2 MB as the object, increasing the number of cycles step by step, and found little difference between the two. When the number of cycles is small, serializeJavaObject is faster, but when the number of cycles is large, the situation is reversed.
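To make the effect of the cycle count easier to reproduce, here is a minimal sketch of a sweep over several cycle counts. The class `CycleSweep`, the interface `Task`, and the method `meanNanos` are hypothetical names introduced for illustration, and the task shown writes to a discarding stream as a placeholder; the real Fury calls would be substituted in its place.

```java
import java.io.OutputStream;

public class CycleSweep {
    /** Hypothetical stand-in for one serialization call. */
    @FunctionalInterface
    public interface Task {
        void run() throws Exception;
    }

    /** Runs the task the given number of times and returns the mean nanoseconds per call. */
    public static double meanNanos(Task task, int cycles) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < cycles; i++) {
                task.run();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / (double) cycles;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2 * 1024 * 1024]; // placeholder for the roughly 2 MB object
        OutputStream sink = OutputStream.nullOutputStream(); // discards all bytes (Java 11+)
        // Placeholder task; swap in fury.serialize(sink, data) or
        // fury.serializeJavaObject(sink, data) to compare the two methods.
        Task task = () -> sink.write(data);
        for (int cycles : new int[] {10, 100, 1000}) {
            System.out.printf("cycles=%d mean=%.0f ns%n", cycles, meanNanos(task, cycles));
        }
    }
}
```

If the ranking of the two methods flips as the cycle count grows, that points toward JIT warm-up or noise rather than a real difference between them.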
In my original question, I used a class not included in the Java JDK, making it hard for others to understand the behavior. Unfortunately, the BufferedOutputStream class from the JDK is very slow in this use case, which is why I used the Output class from Kryo. Below, I provide a standalone program without external dependencies, where I've copied the important parts of the Kryo Output class.
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;
public class FuryRegressionBench {
public static final Path furyFile = Paths.get("test.fury");
public static final int warmup = 3;
public static final int runs = 5;
public static final byte[] data = new byte[406991872];
public static void main(String[] args) throws Exception {
Fury fury = Fury.builder().withLanguage(Language.JAVA)
.withRefTracking(false)
.requireClassRegistration(true)
.withNumberCompressed(false)
.withStringCompressed(false)
.build();
interface SerializeFunc {
public void execute(OutputStream output) throws IOException;
public static void run(String funcName, SerializeFunc func) {
try {
for (int i = 0; i < warmup; i++) {
try (OutputStream output = new BufferedOutput(Files.newOutputStream(furyFile), 1024*1024)) {
func.execute(output);
}
}
long start = System.currentTimeMillis();
for (int i = 0; i < runs; i++) {
try (OutputStream output = new BufferedOutput(Files.newOutputStream(furyFile), 1024*1024)) {
func.execute(output);
}
}
System.out.println(funcName+" took "+(System.currentTimeMillis()-start)+"ms");
} catch (Exception e) {
e.printStackTrace();
}
}
}
SerializeFunc.run("BufferedOutput", (output) -> output.write(data));
SerializeFunc.run("serializeJavaObject", (output) -> fury.serializeJavaObject(output, data));
SerializeFunc.run("serialize", (output) -> fury.serialize(output, data));
}
/**
* Extract of kryo Output.java
*/
public static class BufferedOutput extends OutputStream implements AutoCloseable {
// Maximum reasonable array length. See: https://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size
public static final int maxArraySize = Integer.MAX_VALUE - 8;
private int maxCapacity;
private int position;
private int capacity;
private byte[] buffer;
private OutputStream outputStream;
/** Creates a new Output for writing to a byte[].
* @param bufferSize The initial size of the buffer.
* @param maxBufferSize If {@link #flush()} does not empty the buffer, the buffer is doubled as needed until it exceeds
* maxBufferSize and an exception is thrown. Can be -1 for no maximum. */
public BufferedOutput (int bufferSize, int maxBufferSize) {
if (bufferSize > maxBufferSize && maxBufferSize != -1) throw new IllegalArgumentException(
"bufferSize: " + bufferSize + " cannot be greater than maxBufferSize: " + maxBufferSize);
if (maxBufferSize < -1) throw new IllegalArgumentException("maxBufferSize cannot be < -1: " + maxBufferSize);
this.capacity = bufferSize;
this.maxCapacity = maxBufferSize == -1 ? maxArraySize : maxBufferSize;
buffer = new byte[bufferSize];
}
/** Creates a new Output for writing to an OutputStream with the specified buffer size. */
public BufferedOutput (OutputStream outputStream, int bufferSize) {
this(bufferSize, bufferSize);
if (outputStream == null) throw new IllegalArgumentException("outputStream cannot be null.");
this.outputStream = outputStream;
}
/** Flushes the buffered bytes. The default implementation writes the buffered bytes to the {@link #getOutputStream()
* OutputStream}, if any, and sets the position to 0. Can be overridden to flush the bytes somewhere else. */
public void flush() throws IOException {
if (outputStream == null) return;
try {
outputStream.write(buffer, 0, position);
outputStream.flush();
} catch (IOException ex) {
throw new IOException(ex);
}
position = 0;
}
/** Ensures the buffer is large enough to hold the specified number of bytes, flushing or growing it as needed.
* @return true if the buffer has been flushed or resized. */
protected boolean require(int required) throws IOException {
if (capacity - position >= required) return false;
flush();
if (capacity - position >= required) return true;
if (required > maxCapacity - position) {
if (required > maxCapacity)
throw new IOException("Buffer overflow. Max capacity: " + maxCapacity + ", required: " + required);
throw new IOException(
"Buffer overflow. Available: " + (maxCapacity - position) + ", required: " + required);
}
if (capacity == 0) capacity = 16;
do {
capacity = Math.min(capacity * 2, maxCapacity);
} while (capacity - position < required);
byte[] newBuffer = new byte[capacity];
System.arraycopy(buffer, 0, newBuffer, 0, position);
buffer = newBuffer;
return true;
}
@Override
public void write(int value) throws IOException {
if (position == capacity) require(1);
buffer[position++] = (byte)value;
}
/** Writes the bytes. Note the number of bytes is not written. */
public void write(byte[] bytes) throws IOException {
if (bytes == null) throw new IllegalArgumentException("bytes cannot be null.");
writeBytes(bytes, 0, bytes.length);
}
/** Writes the bytes. Note the number of bytes is not written. */
public void write(byte[] bytes, int offset, int length) throws IOException {
writeBytes(bytes, offset, length);
}
/** Writes the bytes. Note the number of bytes is not written. */
public void writeBytes(byte[] bytes, int offset, int count) throws IOException {
if (bytes == null) throw new IllegalArgumentException("bytes cannot be null.");
int copyCount = Math.min(capacity - position, count);
while (true) {
System.arraycopy(bytes, offset, buffer, position, copyCount);
position += copyCount;
count -= copyCount;
if (count == 0) return;
offset += copyCount;
copyCount = Math.min(Math.max(capacity, 1), count);
require(copyCount);
}
}
/** Flushes any buffered bytes and closes the underlying OutputStream, if any. */
@Override
public void close() throws IOException {
flush();
if (outputStream != null) {
try {
outputStream.close();
} catch (IOException ignored) {
}
}
}
}
}
This snippet could help to investigate the performance differences in a scenario where a file is written to disk. On an old HDD I currently get these numbers:
I ran your code locally; here is my result:
2024-04-24 08:12:19 INFO Fury:144 [main] - Created new fury org.apache.fury.Fury@dfd3711
BufferedOutput took 1275ms
serializeJavaObject took 1975ms
serialize took 1784ms
serializeJavaObject and serialize should give similar results; there is not much difference between the two methods.
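To put these timings on a common scale, here is a small sketch that converts them into throughput and into a ratio against the plain BufferedOutput baseline. It assumes the array size (406991872 bytes) and the five timed runs from the benchmark program; the class `ThroughputCheck` and the method `mbPerSecond` are hypothetical names used only for this illustration.

```java
public class ThroughputCheck {
    /** Converts the bytes written over all runs into decimal megabytes per second. */
    public static double mbPerSecond(long bytesPerRun, int runs, long elapsedMillis) {
        double totalMb = (bytesPerRun * (double) runs) / 1_000_000.0;
        return totalMb / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        long bytesPerRun = 406_991_872L; // size of the byte[] in the benchmark
        int runs = 5;                    // number of timed runs in the benchmark
        String[] names = {"BufferedOutput", "serializeJavaObject", "serialize"};
        long[] millis = {1275, 1975, 1784}; // timings quoted in the comment above
        for (int i = 0; i < names.length; i++) {
            double throughput = mbPerSecond(bytesPerRun, runs, millis[i]);
            double ratio = millis[i] / (double) millis[0]; // time relative to the baseline
            System.out.printf("%s: %.0f MB/s, %.2fx baseline time%n", names[i], throughput, ratio);
        }
    }
}
```

The ratio column shows how much time each Fury method adds on top of writing the raw bytes, which is the part of the measurement that actually depends on the serializer.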
Version
Version: 0.5.0-SNAPSHOT
OS: Windows
JDK: 22
Component(s)
Java
Minimal reproduce step
What did you expect to see?
I would expect serializeJavaObject(...) to run a little slower because of the additional class information that needs to be stored, but the method is actually faster than serialize(...). In this example the difference is only 10%. For a more complex project, we get 25% faster serialization times with serializeJavaObject(...) than with serialize(...).
What did you see instead?
Anything Else?
This is not a perfect benchmark (no JMH), but no matter how many additional runs we add to the test or whether we change the execution order, the results are always the same.
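Short of a full JMH setup, one way to reduce order effects is to alternate the candidates within every round and compare medians instead of totals. The following is a minimal sketch under that assumption; `InterleavedTiming`, `Task`, `compare`, and `median` are hypothetical names, and the lambdas passed in would wrap the actual Fury calls.

```java
import java.util.Arrays;

public class InterleavedTiming {
    /** Hypothetical stand-in for one serialization call. */
    @FunctionalInterface
    public interface Task {
        void run() throws Exception;
    }

    /** Returns the median of the samples (the upper median for even counts). */
    public static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /**
     * Runs every task once per round, so no task always runs first or last overall.
     * Warm-up rounds are executed but not recorded. Returns the median nanoseconds per task.
     */
    public static long[] compare(int warmupRounds, int rounds, Task... tasks) {
        long[][] samples = new long[tasks.length][rounds];
        try {
            for (int r = -warmupRounds; r < rounds; r++) {
                for (int t = 0; t < tasks.length; t++) {
                    long start = System.nanoTime();
                    tasks[t].run();
                    long elapsed = System.nanoTime() - start;
                    if (r >= 0) {
                        samples[t][r] = elapsed; // record only the measured rounds
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        long[] medians = new long[tasks.length];
        for (int t = 0; t < tasks.length; t++) {
            medians[t] = median(samples[t]);
        }
        return medians;
    }
}
```

A call such as `InterleavedTiming.compare(3, 5, taskA, taskB)` mirrors the three warm-up and five measured runs of the program above. The median is less sensitive than a sum to a single slow run caused by GC or disk activity.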