Revert 8e20e1ee (#4117) to fix hang in destruction of groupconsumer #4667

Open
wants to merge 1 commit into base: master

Quuxplusone
Contributor

We observed that destroying a groupconsumer would often hang waiting for the broker thread to exit. We tediously bisected the problem to the specific commit 8e20e1e (the last commit before the v2.0.0rc1 tag). Only then did we find that a lot of people on GitHub were already complaining about that commit as introducing a resource leak: the commit adds a call to `rd_kafka_toppar_keep` that bumps the refcount of the toppar, and I don't immediately see a corresponding `rd_kafka_toppar_destroy` anywhere.

Reverting 8e20e1e (as in this commit) does fix the hang in groupconsumer destruction which we were observing, so we've applied this patch to our downstream library.

Fixes #4486.
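
To illustrate the kind of imbalance described above, here is a minimal toy sketch of how one unmatched "keep" can stop teardown from ever completing. It assumes teardown waits for the reference count to reach zero. None of this is librdkafka code: `toy_toppar_t`, `toy_toppar_keep`, `toy_toppar_destroy`, `toy_can_finish_teardown`, and `toy_simulate` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted partition object.
 * This is NOT librdkafka's toppar type; all names are illustrative. */
typedef struct toy_toppar {
    int refcnt;
} toy_toppar_t;

/* Take a reference (plays the role of a "keep" call). */
static void toy_toppar_keep(toy_toppar_t *tp) {
    tp->refcnt++;
}

/* Drop a reference (plays the role of a "destroy" call). */
static void toy_toppar_destroy(toy_toppar_t *tp) {
    assert(tp->refcnt > 0);
    tp->refcnt--;
}

/* Assumption: teardown may only finish once nobody holds a reference.
 * A real implementation might block here; the toy just reports. */
static bool toy_can_finish_teardown(const toy_toppar_t *tp) {
    return tp->refcnt == 0;
}

/* Simulate one code path that takes an extra reference and optionally
 * forgets to release it. Returns true if teardown could complete. */
static bool toy_simulate(bool release_extra_ref) {
    toy_toppar_t tp = { .refcnt = 1 };   /* the owner's reference */

    toy_toppar_keep(&tp);                /* extra reference is taken */
    if (release_extra_ref)
        toy_toppar_destroy(&tp);         /* matching release */

    toy_toppar_destroy(&tp);             /* the owner lets go */
    return toy_can_finish_teardown(&tp);
}
```

In this toy model, `toy_simulate(true)` can finish teardown, while `toy_simulate(false)` leaves one reference outstanding, so anything waiting for the count to drop to zero would wait forever.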

Collaborator

emasab commented Apr 1, 2024

Hello @Quuxplusone, thanks for investigating this issue. Reverting the commit isn't the solution: as you can see, it fixed tests that were failing.
The rd_kafka_toppar_destroy is usually called here.

But that happens when the op is destroyed, so maybe there are cases where the BARRIER op isn't destroyed. I have found a similar refcnt issue in test 0113, subtest n_wildcard, where a topic is deleted, but it happens only sporadically. Does it also happen to you when a topic is deleted?
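
The ownership pattern described here can be sketched with a toy model: an op takes a reference when it is created and releases it only in its own destroy function, so an op that never reaches that function leaves the count elevated. This is only an illustration under that assumption, not librdkafka's actual op handling; `toy_op_t`, `toy_op_new`, `toy_op_destroy`, and `toy_refcnt_after_op` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted partition; not a librdkafka type. */
typedef struct toy_toppar {
    int refcnt;
} toy_toppar_t;

/* Hypothetical op that holds a partition reference for its lifetime. */
typedef struct toy_op {
    toy_toppar_t *tp;
} toy_op_t;

/* Creating the op takes a reference on the partition. */
static toy_op_t *toy_op_new(toy_toppar_t *tp) {
    toy_op_t *op = malloc(sizeof(*op));
    if (!op)
        return NULL;
    tp->refcnt++;
    op->tp = tp;
    return op;
}

/* Destroying the op is the only place that reference is released. */
static void toy_op_destroy(toy_op_t *op) {
    assert(op->tp->refcnt > 0);
    op->tp->refcnt--;
    free(op);
}

/* Returns the partition's refcount after one op has been handled,
 * or -1 if allocation failed. When the op is dropped without going
 * through toy_op_destroy, its reference is never released. */
static int toy_refcnt_after_op(int op_is_destroyed) {
    toy_toppar_t tp = { .refcnt = 1 };   /* the owner's reference */
    toy_op_t *op = toy_op_new(&tp);
    if (!op)
        return -1;
    if (op_is_destroyed)
        toy_op_destroy(op);              /* reference released here */
    else
        free(op);                        /* memory freed, reference leaked */
    return tp.refcnt;
}
```

In this model the count returns to the owner's single reference only when the op goes through `toy_op_destroy`; any path that discards the op some other way leaves the extra reference behind.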

Development

Successfully merging this pull request may close these issues.

C++ consumer APIs have memory leaks under certain conditions