[Java] serializeJavaObject(...) is faster than serialize(...) #1537

Open
1 of 2 tasks
Neiko2002 opened this issue Apr 17, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Neiko2002

Neiko2002 commented Apr 17, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Version: 0.5.0-SNAPSHOT
OS: Windows
JDK: 22

Component(s)

Java

Minimal reproduce step

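// Output is Kryo's com.esotericsoftware.kryo.io.Output, not a JDK class.
public static void main(String[] args) throws Exception {
	final Path furyFile = Paths.get("test.fury");
	final byte[] data = new byte[406991872];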

	int runs = 5;
	Fury fury = Fury.builder().withLanguage(Language.JAVA)
			.withRefTracking(false)
			.requireClassRegistration(true)
			.withNumberCompressed(false)
			.withStringCompressed(false)
			.build();

	{
		try (Output output = new Output(Files.newOutputStream(furyFile), 1024*1024)) {
			fury.serializeJavaObject(output, data); // warmup
		}			
		long start = System.currentTimeMillis();
		for (int i = 0; i < runs; i++) {	
			try (Output output = new Output(Files.newOutputStream(furyFile), 1024*1024)) {
				fury.serializeJavaObject(output, data);
			}
		}
		System.out.println("serializeJavaObject took "+(System.currentTimeMillis()-start)+"ms");
	}

	{
		try (Output output = new Output(Files.newOutputStream(furyFile), 1024*1024)) {
			fury.serialize(output, data); // warmup
		}			
		long start = System.currentTimeMillis();
		for (int i = 0; i < runs; i++) {				
			try (Output output = new Output(Files.newOutputStream(furyFile), 1024*1024)) {
				fury.serialize(output, data);
			}
		}
		System.out.println("serialize took "+(System.currentTimeMillis()-start)+"ms");
	}
}

What did you expect to see?

I would expect serializeJavaObject(...) to run slightly slower because of the additional class information that needs to be stored, but it is actually faster than serialize(...). In this example the difference is under 10%. In a more complex project, serializeJavaObject(...) is about 25% faster than serialize(...).

What did you see instead?

18:26:27.900 [main] INFO org.apache.fury.Fury -- Created new fury org.apache.fury.Fury@52aa2946
serializeJavaObject took 7352ms
serialize took 7849ms

Anything Else?

This is not a rigorous benchmark (no JMH), but the results stay the same no matter how many additional runs we add or whether we change the execution order.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Neiko2002 added the bug label on Apr 17, 2024

@chaokunyang
Collaborator

serializeJavaObject doesn't store the class info of the outermost object, so it should be faster. But the difference between serializeJavaObject and serialize should be small, since it only saves one class-info write for the type of the object passed in.
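
To illustrate why skipping one class-info write should cost or save very little, here is a toy sketch built only on `java.nio.ByteBuffer`. It is not Fury's wire format: the class `TypeTagSketch`, both method names, and the tag layout (a 2-byte length followed by the class name) are assumptions made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy framing for illustration only; this is NOT how Fury encodes data. */
final class TypeTagSketch {

    /** Writes length + payload. The reader must already know the type. */
    static byte[] writeWithoutTypeTag(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload);
        return buf.array();
    }

    /** Writes a hypothetical type tag first, then length + payload. */
    static byte[] writeWithTypeTag(byte[] payload) {
        // The JVM name of byte[] is "[B", so the tag is two bytes long.
        byte[] tag = byte[].class.getName().getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + tag.length + 4 + payload.length);
        buf.putShort((short) tag.length).put(tag);
        buf.putInt(payload.length).put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20];
        int plain = writeWithoutTypeTag(payload).length;
        int tagged = writeWithTypeTag(payload).length;
        System.out.println("extra bytes for the type tag: " + (tagged - plain));
    }
}
```

In this toy layout the tag adds a fixed number of bytes no matter how large the payload is, which is why a gap of several percent on a ~400 MB array would be surprising if the class info were the only difference.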

@heliang666s

heliang666s commented Apr 18, 2024

The difference between the two is very small and is likely due to measurement error. I copied your code and tested it with an image of about 2 MB as the object. I increased the number of cycles step by step and found little difference between the two. When the number of cycles is small, serializeJavaObject is faster, but when the number of cycles is large, the situation is reversed.
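
One way to check whether such a gap is more than measurement noise, short of a full JMH setup, is to time every round separately and compare the median and the spread instead of a single total. The following is a minimal sketch using only the JDK; `MedianTimer`, `measure`, and `median` are hypothetical names, and the `System.arraycopy` workload is just a stand-in for the Fury calls.

```java
import java.util.Arrays;

/** Minimal timing helper; a sketch, not a replacement for JMH. */
final class MedianTimer {

    /**
     * Runs the task `warmup` times untimed, then `rounds` times timed.
     * Returns the per-round durations in nanoseconds, sorted ascending.
     */
    static long[] measure(Runnable task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] nanos = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Median of an already sorted, non-empty array. */
    static long median(long[] sorted) {
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16 << 20];
        byte[] dst = new byte[src.length];
        // Stand-in workload; replace with the serialization call under test.
        long[] t = measure(() -> System.arraycopy(src, 0, dst, 0, src.length), 3, 9);
        System.out.printf("min=%dus median=%dus max=%dus%n",
                t[0] / 1_000, median(t) / 1_000, t[t.length - 1] / 1_000);
    }
}
```

If the min-to-max ranges of the two methods overlap heavily, the difference between them is probably within the noise of the setup.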

@Neiko2002
Author

Neiko2002 commented Apr 24, 2024

In my original question, I used a class that is not part of the JDK, which made the behavior hard for others to understand. Unfortunately, the JDK's BufferedOutputStream class is very slow in this use case, which is why I used the Output class from Kryo. Below is a standalone program without external dependencies, into which I've copied the important parts of the Kryo Output class.

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.fury.Fury;
import org.apache.fury.config.Language;

public class FuryRegressionBench {

	public static final Path furyFile = Paths.get("test.fury");
	
	public static final int warmup = 3;
	public static final int runs = 5;
	public static final byte[] data = new byte[406991872];

	public static void main(String[] args) throws Exception {

		Fury fury = Fury.builder().withLanguage(Language.JAVA)
				.withRefTracking(false)
				.requireClassRegistration(true)
				.withNumberCompressed(false)
				.withStringCompressed(false)
				.build();
		
		interface SerializeFunc {
			public void execute(OutputStream output) throws IOException;
			
			public static void run(String funcName, SerializeFunc func)  {
				try {
					for (int i = 0; i < warmup; i++) {
						try (OutputStream output = new BufferedOutput(Files.newOutputStream(furyFile), 1024*1024)) {
							func.execute(output);
						}
					}
					
					long start = System.currentTimeMillis();
					for (int i = 0; i < runs; i++) {				
						try (OutputStream output = new BufferedOutput(Files.newOutputStream(furyFile), 1024*1024)) {
							func.execute(output);
						}
					}
					System.out.println(funcName+" took "+(System.currentTimeMillis()-start)+"ms");
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
		}

		SerializeFunc.run("BufferedOutput", (output) -> output.write(data));
		SerializeFunc.run("serializeJavaObject", (output) -> fury.serializeJavaObject(output, data));
		SerializeFunc.run("serialize", (output) -> fury.serialize(output, data));
	}
	
	/**
	 * Extract of kryo Output.java
	 */
	public static class BufferedOutput extends OutputStream implements AutoCloseable {

		// Maximum reasonable array length. See: https://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size
		public static final int maxArraySize = Integer.MAX_VALUE - 8;

		private int maxCapacity;
		private int position;
		private int capacity;
		private byte[] buffer;
		private OutputStream outputStream;

		/** Creates a new Output for writing to a byte[].
		 * @param bufferSize The initial size of the buffer.
		 * @param maxBufferSize If {@link #flush()} does not empty the buffer, the buffer is doubled as needed until it exceeds
		 *           maxBufferSize and an exception is thrown. Can be -1 for no maximum. */
		public BufferedOutput (int bufferSize, int maxBufferSize) {
			if (bufferSize > maxBufferSize && maxBufferSize != -1) throw new IllegalArgumentException(
					"bufferSize: " + bufferSize + " cannot be greater than maxBufferSize: " + maxBufferSize);
			if (maxBufferSize < -1) throw new IllegalArgumentException("maxBufferSize cannot be < -1: " + maxBufferSize);
			this.capacity = bufferSize;
			this.maxCapacity = maxBufferSize == -1 ? maxArraySize : maxBufferSize;
			buffer = new byte[bufferSize];
		}

		/** Creates a new Output for writing to an OutputStream with the specified buffer size. */
		public BufferedOutput (OutputStream outputStream, int bufferSize) {
			this(bufferSize, bufferSize);
			if (outputStream == null) throw new IllegalArgumentException("outputStream cannot be null.");
			this.outputStream = outputStream;
		}

		/** Flushes the buffered bytes. The default implementation writes the buffered bytes to the {@link #getOutputStream()
		 * OutputStream}, if any, and sets the position to 0. Can be overridden to flush the bytes somewhere else. */
		public void flush() throws IOException {
			if (outputStream == null) return;
			try {
				outputStream.write(buffer, 0, position);
				outputStream.flush();
			} catch (IOException ex) {
				throw new IOException(ex);
			}
			position = 0;
		}

		/** Ensures the buffer is large enough to read the specified number of bytes.
		 * @return true if the buffer has been resized. */
		protected boolean require(int required) throws IOException {
			if (capacity - position >= required) return false;
			flush();
			if (capacity - position >= required) return true;
			if (required > maxCapacity - position) {
				if (required > maxCapacity)
					throw new IOException("Buffer overflow. Max capacity: " + maxCapacity + ", required: " + required);
				throw new IOException(
						"Buffer overflow. Available: " + (maxCapacity - position) + ", required: " + required);
			}
			if (capacity == 0) capacity = 16;
			do {
				capacity = Math.min(capacity * 2, maxCapacity);
			} while (capacity - position < required);
			byte[] newBuffer = new byte[capacity];
			System.arraycopy(buffer, 0, newBuffer, 0, position);
			buffer = newBuffer;
			return true;
		}

		@Override
		public void write(int value) throws IOException {
			if (position == capacity) require(1);
			buffer[position++] = (byte)value;
		}		

		/** Writes the bytes. Note the number of bytes is not written. */
		public void write(byte[] bytes) throws IOException {
			if (bytes == null) throw new IllegalArgumentException("bytes cannot be null.");
			writeBytes(bytes, 0, bytes.length);
		}

		/** Writes the bytes. Note the number of bytes is not written. */
		public void write(byte[] bytes, int offset, int length) throws IOException {
			writeBytes(bytes, offset, length);
		}

		/** Writes the bytes. Note the number of bytes is not written. */
		public void writeBytes(byte[] bytes, int offset, int count) throws IOException {
			if (bytes == null) throw new IllegalArgumentException("bytes cannot be null.");
			int copyCount = Math.min(capacity - position, count);
			while (true) {
				System.arraycopy(bytes, offset, buffer, position, copyCount);
				position += copyCount;
				count -= copyCount;
				if (count == 0) return;
				offset += copyCount;
				copyCount = Math.min(Math.max(capacity, 1), count);
				require(copyCount);
			}
		}

		/** Flushes any buffered bytes and closes the underlying OutputStream, if any. */
		@Override
		public void close() throws IOException {
			flush();
			if (outputStream != null) {
				try {
					outputStream.close();
				} catch (IOException ignored) {
				}
			}
		}
	}
}

This snippet could help investigate the performance difference in a scenario where a file is written to disk. On an old HDD I currently get these numbers:

BufferedOutput took 1847ms
serializeJavaObject took 2231ms
serialize took 2756ms

@chaokunyang
Collaborator

I ran your code locally; here is my result:

2024-04-24 08:12:19 INFO  Fury:144 [main] - Created new fury org.apache.fury.Fury@dfd3711
BufferedOutput took 1275ms
serializeJavaObject took 1975ms
serialize took 1784ms

serializeJavaObject and serialize should give similar results; there is not much difference between those two methods.
