Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-35215][core] Fix the bug when Kryo serialize length is 0 #24717

Merged
merged 2 commits into from May 22, 2024

Conversation

1996fanrui
Copy link
Member

@1996fanrui 1996fanrui commented Apr 25, 2024

What is the purpose of the change

The performance of serializerKryo and serializerKryoWithoutRegistration are regressed, I checked recent commits, and found FLINK-34954 changed related logic.

Brief change log

  • The first commit: Revert "[FLINK-34954][core] Kryo Input bug fix"
  • The second commit: [FLINK-35215][core] Fix the bug when Kryo serialize length is 0
    • This PR reverts the FLINK-34954, and try to re-implement to avoid the performance regression.

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 25, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@reswqa
Copy link
Member

reswqa commented Apr 25, 2024

Not to block this, just curious do you know why FLINK-34954 affects performance?

@1996fanrui 1996fanrui marked this pull request as draft April 25, 2024 04:40
Comment on lines -121 to +137
while (bytesRead < count) {
while (true) {
c = inputStream.read(bytes, offset + bytesRead, count - bytesRead);

if (c == -1) {
throw new KryoException(new EOFException("No more bytes left."));
}

bytesRead += c;

if (bytesRead == count) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to block this, just curious do you know why FLINK-34954 affects performance?

Hi @reswqa , I'm not very sure the reason. But I guess:

  • For non-empty data, let us assume inputStream.read will be called once.
    • Before FLINK-34954, we only check bytesRead == count once.
    • After FLINK-34954, we call bytesRead < count twice.
  • Also, I'm not sure whether FLINK-34954 breaks any JIT optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, I don't know the details also, but it's possible that JIT optimizations were affected.

Re-implement to avoid the performance regression
Comment on lines +120 to +123
if (count == 0) {
return;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if (count == 0) { is alternative change for FLINK-34954, I ran the demo that provided in FLINK-34954, it works well. So I think the change is fine.

Also, I ran the benchmark, the performance of serializerKryo and serializerKryoWithoutRegistration are recovered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dannycranmer , this issue is the blocker of Flink 1.20, and you are the reviewer of #24586, would you mind reviewing this PR as well? thanks in advance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1996fanrui Can you add the demo from FLINK-34954 as a test?

@1996fanrui 1996fanrui marked this pull request as ready for review May 8, 2024 02:53
@kkrugler
Copy link
Contributor

kkrugler commented May 8, 2024

I'm leaving in 2 days for a 4+ month backpack trip (hiking the PCT), otherwise what I'd do is try to hook up a test that uses JMH to accurately assess the impact of the change. To respond to the particular comment on why performance changed:

we call bytesRead < count twice

The overhead of two comparisons, versus even one call to inputStream.read, should be miniscule.

I'm not sure whether FLINK-34954 breaks any JIT optimization

If a JIT optimization would result in skipping calls to inputStream.read, then yes that could have an impact. But that would be a change in logic.

The one thing I see is in the change to readBytes(), where if it's often called with a count of 0, then your change skips the try/catch block, which is non-trivial. But I thought the fix was to avoid a failure for this exact case, so it shouldn't be happening in the current benchmark code.

@1996fanrui
Copy link
Member Author

I'm leaving in 2 days for a 4+ month backpack trip (hiking the PCT)

Wow, have a nice trip!

To respond to the particular comment on why performance changed:

we call bytesRead < count twice

The overhead of two comparisons, versus even one call to inputStream.read, should be miniscule.

I'm not sure whether FLINK-34954 breaks any JIT optimization

If a JIT optimization would result in skipping calls to inputStream.read, then yes that could have an impact. But that would be a change in logic.

As I said before, I'm not sure the reason, and both of them are my guess.

The one thing I see is in the change to readBytes(), where if it's often called with a count of 0, then your change skips the try/catch block, which is non-trivial. But I thought the fix was to avoid a failure for this exact case, so it shouldn't be happening in the current benchmark code.

Sorry, I didn't fully understand this sentence. Do you mean skip the try/catch block is not fine?

@kkrugler
Copy link
Contributor

Sorry for the confusing sentence. I meant that skipping the try/catch block would save a lot of cycles, but your change that causes it to be (correctly) skipped when the count is 0 shouldn't be the common case, so I wouldn't expect that to impact the benchmark results.

@1996fanrui
Copy link
Member Author

Sorry for the confusing sentence. I meant that skipping the try/catch block would save a lot of cycles, but your change that causes it to be (correctly) skipped when the count is 0 shouldn't be the common case, so I wouldn't expect that to impact the benchmark results.

Thanks for the quick clarification! Yes, my change only adds a simple check if (count == 0) return;, and it doesn't impact performance after I run the benchmark.

Copy link
Contributor

@pnowojski pnowojski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving in 2 days for a 4+ month backpack trip (hiking the PCT), otherwise what I'd do is try to hook up a test that uses JMH to accurately assess the impact of the change. To respond to the particular comment on why performance changed:

we call bytesRead < count twice

The overhead of two comparisons, versus even one call to inputStream.read, should be miniscule.

I agree that some explanation is missing here. While I would be also curious to understand all of this better, I'm fine if this is merged as is. In the absence of a better performance indicator, I would relay on our imperfect benchmarking results, which suggests we should merge this one over the current master.

We can also continue investigation asynchronously after merging this fix if someone is interested in doing that.

Copy link
Member Author

@1996fanrui 1996fanrui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pnowojski for the comment and review!

Let me merge this PR first, and I will follow the benchmark in the following days.

@1996fanrui 1996fanrui merged commit c1baf07 into apache:master May 22, 2024
@1996fanrui 1996fanrui deleted the 35215/kyro-handle-length0 branch May 22, 2024 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants