Uninitialize LossDetection earlier to cleanup Datagram frame #4316

Merged
merged 9 commits into from
May 23, 2024


Contributor

@ami-GS ami-GS commented May 20, 2024

Description

The app-allocated buffer for the Datagram frame was leaking due to:

  • a missing callback handler while waiting for the Ack of the Datagram frame.
  • an allocation failure for packet metadata.
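To illustrate the second cause, here is a minimal sketch of the general pattern, not the actual MsQuic change: if the tracking metadata cannot be allocated, nothing will ever deliver an Ack or loss event for the frame, so the app has to be notified right away or its buffer is never released. Every name here (`DatagramSend`, `NotifyApp`, `TrackSentDatagram`, `FailingAlloc`, the `SEND_*` states) is hypothetical and only loosely modeled on the identifiers in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical send states, loosely modeled on QUIC_DATAGRAM_SEND_*. */
typedef enum { SEND_UNKNOWN, SEND_SENT, SEND_CANCELED } SendState;

/* Hypothetical stand-in for an app-owned datagram send request. */
typedef struct {
    void* AppBuffer;      /* buffer allocated by the app */
    SendState LastState;  /* last state reported to the app */
    bool AppBufferFreed;  /* set once the app has released its buffer */
} DatagramSend;

/* Stands in for the app's callback: it frees its buffer on a final state. */
void NotifyApp(DatagramSend* Send, SendState State)
{
    assert(Send != NULL);
    Send->LastState = State;
    if (State == SEND_CANCELED) {
        free(Send->AppBuffer);
        Send->AppBuffer = NULL;
        Send->AppBufferFreed = true;
    }
}

/* Allocator that always fails, to simulate the -alloc_fail scenario. */
void* FailingAlloc(size_t Size)
{
    (void)Size;
    return NULL;
}

/*
 * Allocate tracking metadata for a sent datagram. On allocation failure the
 * send can never be acknowledged or declared lost, so cancel it immediately
 * instead of silently dropping it. The caller owns the returned metadata.
 */
void* TrackSentDatagram(DatagramSend* Send, void* (*Alloc)(size_t), size_t MetadataSize)
{
    void* Metadata = Alloc(MetadataSize);
    if (Metadata == NULL) {
        NotifyApp(Send, SEND_CANCELED);
        return NULL;
    }
    NotifyApp(Send, SEND_SENT);
    return Metadata;
}
```

Passing `FailingAlloc` exercises the failure path; passing `malloc` exercises the normal path, where the buffer stays with the app until a later final state.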

Testing

Ran local test. No leak detected. No leak found from my own memory tracker either.
./artifacts/bin/linux/x64_Debug_openssl3/spinquic both -timeout:600000 -repeat_count:20 -alloc_fail:100
./artifacts/bin/linux/x64_Debug_openssl/spinquic both -timeout:600000 -repeat_count:20 -alloc_fail:100

Let's see automation results

Documentation

N/A

@ami-GS ami-GS requested a review from a team as a code owner May 20, 2024 07:33
Review comment on src/core/connection.c (outdated, resolved)
@nibanks
Member

nibanks commented May 20, 2024

Since there are still leaks, even with this change, I feel like QuicDatagramSendShutdown isn't getting called before shutdown complete somehow. Can you add some debug asserts for:

  1. Before the shutdown complete event, assert that Datagram has been shut down/disabled.
  2. Before the shutdown complete event, assert that zero outstanding datagrams are pending.
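A rough sketch of what those two asserts could look like, assuming a simplified datagram state with an enabled flag and an outstanding-send counter. The struct, its fields, and the function names are hypothetical and do not reflect the real MsQuic layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of per-connection datagram state. */
typedef struct {
    bool SendEnabled;         /* cleared once datagram send is shut down */
    size_t OutstandingSends;  /* sends queued or still awaiting a final state */
} DatagramState;

/* Stands in for the shutdown path: disable sends and flush what is pending. */
void DatagramSendShutdown(DatagramState* Datagram)
{
    Datagram->SendEnabled = false;
    Datagram->OutstandingSends = 0;
}

/* True when nothing datagram-related can still reference app buffers. */
bool DatagramIsQuiesced(const DatagramState* Datagram)
{
    return !Datagram->SendEnabled && Datagram->OutstandingSends == 0;
}

/* Stands in for the point where the shutdown complete event is indicated. */
void IndicateShutdownComplete(const DatagramState* Datagram, bool* Indicated)
{
    assert(!Datagram->SendEnabled);          /* 1. datagram is shut down */
    assert(Datagram->OutstandingSends == 0); /* 2. no pending datagrams */
    *Indicated = true;
}
```

If either assert fires in a debug build, the shutdown ordering is wrong somewhere upstream, which narrows down where the leaked sends come from.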

@nibanks
Member

nibanks commented May 20, 2024

I wonder if we need to add a QuicDatagramSendShutdown after QuicSendUninitialize in this same function. Though it really should be getting called in QuicConnTryClose.

@ami-GS
Contributor Author

ami-GS commented May 21, 2024

DatagramSend is still called after QuicDatagramSendShutdown. It touches the already released Datagram->ApiQueueLock and then crashes.

@ami-GS
Contributor Author

ami-GS commented May 21, 2024

Another leak

    QUIC_SENT_PACKET_METADATA* SentPacket =
        QuicSentPacketPoolGetPacketMetadata(
            &Connection->Worker->SentPacketPool, TempSentPacket->FrameCount);

@ami-GS
Contributor Author

ami-GS commented May 21, 2024

The QUIC_SENT_PACKET_METADATA should be released by QuicLossDetectionUninitialize....

@ami-GS
Contributor Author

ami-GS commented May 21, 2024

There is still a case of QUIC_DATAGRAM_SEND_SENT -> QUIC_DATAGRAM_SEND_LOST_SUSPECT -> leak
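A small sketch of why that path can leak, under the assumption that the app only frees its buffer on a final state: LOST_SUSPECT is not final, so a send parked there at shutdown needs an explicit cancel. The enum and helpers below are hypothetical simplifications, not the MsQuic definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified send states. */
typedef enum {
    SEND_SENT,
    SEND_LOST_SUSPECT,
    SEND_LOST_DISCARDED,
    SEND_ACKNOWLEDGED,
    SEND_CANCELED
} SendState;

/* A final state means the app may release its buffer. */
bool SendStateIsFinal(SendState State)
{
    return State == SEND_LOST_DISCARDED ||
           State == SEND_ACKNOWLEDGED ||
           State == SEND_CANCELED;
}

/*
 * At shutdown, move every send that has not reached a final state to
 * CANCELED so the app always gets a chance to free its buffer.
 * Returns the number of sends that had to be canceled.
 */
size_t CancelPendingSends(SendState* States, size_t Count)
{
    size_t Canceled = 0;
    assert(States != NULL || Count == 0);
    for (size_t i = 0; i < Count; i++) {
        if (!SendStateIsFinal(States[i])) {
            States[i] = SEND_CANCELED;
            Canceled++;
        }
    }
    return Canceled;
}
```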

@ami-GS
Contributor Author

ami-GS commented May 21, 2024

  • LossDetection->SentPackets
  • LossDetection->LostPackets

are cleaned up by QuicLossDetectionUninitialize in QuicConnOnShutdownComplete, though
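For reference, a minimal sketch of the cleanup being described: drain both lists, release each packet's metadata, and count the datagram frames that would need a cancel notification. The types are hypothetical singly linked stand-ins, not the real QUIC_LOSS_DETECTION or QUIC_SENT_PACKET_METADATA layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified sent-packet metadata. */
typedef struct SentPacket {
    struct SentPacket* Next;
    int HasDatagramFrame;  /* nonzero if the packet carried a Datagram frame */
} SentPacket;

/* Hypothetical, simplified loss detection state. */
typedef struct {
    SentPacket* SentPackets;
    SentPacket* LostPackets;
    int CanceledDatagrams;  /* stands in for cancel callbacks to the app */
} LossDetection;

/* Helper for building a list; returns the new head. */
SentPacket* PushPacket(SentPacket* Head, int HasDatagramFrame)
{
    SentPacket* Packet = malloc(sizeof(*Packet));
    if (Packet == NULL) {
        return Head;
    }
    Packet->Next = Head;
    Packet->HasDatagramFrame = HasDatagramFrame;
    return Packet;
}

/* Release every packet on one list, noting datagram frames on the way. */
void DrainList(LossDetection* Ld, SentPacket** Head)
{
    while (*Head != NULL) {
        SentPacket* Packet = *Head;
        *Head = Packet->Next;
        if (Packet->HasDatagramFrame) {
            Ld->CanceledDatagrams++;
        }
        free(Packet);
    }
}

/* Both lists must be drained, or metadata on the skipped one leaks. */
void LossDetectionUninitialize(LossDetection* Ld)
{
    assert(Ld != NULL);
    DrainList(Ld, &Ld->SentPackets);
    DrainList(Ld, &Ld->LostPackets);
}
```

The point relevant to this PR is ordering: if this runs only after the datagram callback path is gone, the metadata is freed but the app is never told about its buffers.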


codecov bot commented May 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.06%. Comparing base (9cab5bf) to head (997bff5).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4316      +/-   ##
==========================================
- Coverage   86.04%   85.06%   -0.99%     
==========================================
  Files          56       56              
  Lines       15382    15384       +2     
==========================================
- Hits        13236    13086     -150     
- Misses       2146     2298     +152     


@ami-GS
Contributor Author

ami-GS commented May 22, 2024

Fixed!

nibanks previously approved these changes May 22, 2024
Member

@nibanks nibanks left a comment


LGTM. Thanks for making these changes!

@nibanks
Member

nibanks commented May 22, 2024

I see there are a number of BVT failures. I'm rerunning them to see if they are real regressions or not.

@ami-GS
Contributor Author

ami-GS commented May 22, 2024

The Linux XDP-related failures are fixed by a different PR

@@ -408,6 +409,7 @@ QuicLossDetectionOnPacketSent(
"Sent packet metadata",
SIZEOF_QUIC_SENT_PACKET_METADATA(TempSentPacket->FrameCount));
QuicLossDetectionRetransmitFrames(LossDetection, TempSentPacket, FALSE);
QuicLossDetectionOnPacketDiscarded(LossDetection, TempSentPacket, FALSE, FALSE);
Member


We talked in person about a slightly different way to do this. Please update. Thanks!

@ami-GS ami-GS merged commit 38861ea into main May 23, 2024
352 of 355 checks passed
@ami-GS ami-GS deleted the dev/daiki/spin_memleak_datagram branch May 23, 2024 12:18