
coop sticky algo on large partition number #4629

Open
5 of 7 tasks
ericwuseattle opened this issue Feb 29, 2024 · 8 comments
Comments

@ericwuseattle

ericwuseattle commented Feb 29, 2024

Description

There are two issues I noticed with Kafka's cooperative-sticky mode.

  1. The hard-coded partition_cnt inside rd_kafka_sticky_assignor_assign_cb:
    https://github.com/confluentinc/librdkafka/blob/master/src/rdkafka_sticky_assignor.c#L1834

  2. With 3K partitions it works without issue, but after increasing to 6K partitions on a fresh topic (i.e. recreating the topic as a new one), I have to raise session.timeout.ms and max.poll.interval.ms from 3s to 10s (3000 → 10000) to make it work.
    Otherwise the consumer gets kicked out of the group.
    Broker logs:
    Member XXX-6F958DDF5F-CDXRQ~-0793c679-d5ef-4753-9056-7da314e1415b in group XXX-TOPIC-NAME-XXX has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator).

I'm not sure what's causing the timeout, but I'm sure we keep calling Kafka poll on a timer continuously. After increasing to 10s it works without any issue.

Trying further with 15K partitions and 10s timeouts: no luck, it would not work; the member gets kicked out of the group.

Overall:

  • 3K partitions, 3s timeout: works.
  • 6K partitions, 3s timeout: does not work.
  • 6K partitions, 10s timeout: works.
  • 15K partitions, 10s timeout: does not work.

How to reproduce

Use a large partition count:
6K partitions with a 3s session timeout,
or
15K partitions with a 10s session timeout.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): <2.3.0>
  • Apache Kafka version: <3.0>
  • librdkafka client configuration: <fetch.min.bytes=1, fetch.wait.max.ms=500, fetch.error.backoff.ms=0, heartbeat.interval.ms=1000, enable.auto.commit=false, enable.partition.eof=false, enable.auto.offset.store=false, max.poll.interval.ms=3000, session.timeout.ms=3000, partition.assignment.strategy=cooperative-sticky>
  • Operating system: <Ubuntu (x64)>
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@ericwuseattle ericwuseattle changed the title coop sticky algo on large partition numbers coop sticky algo on large partition number Feb 29, 2024
@ericwuseattle
Author

Any thoughts on this problem? Are more details needed?

@emasab
Collaborator

emasab commented May 9, 2024

@ericwuseattle could you send some logs with debug=all? It's possible that those values need to be increased for a rebalance with that many partitions, but from the logs we can see where most of the time goes.

@ericwuseattle
Author

Unfortunately I don't have the test environment set up at hand right now. Have you checked the hard-coded partition count in the code? If we can fix that part first, I'll find time to retry it.

/* FIXME: Let the cgrp pass the actual eligible partition count */
size_t partition_cnt = member_cnt * 10; /* FIXME */

https://github.com/confluentinc/librdkafka/blob/master/src/rdkafka_sticky_assignor.c#L1834

@emasab
Collaborator

emasab commented May 17, 2024

Given your configuration you're not using the sticky assignor, as

fetch.min.bytes=1, fetch.wait.max.ms=500, fetch.error.backoff.ms=0, heartbeat.interval.ms=1000, enable.auto.commit=false, enable.partition.eof=false, enable.auto.offset.store=false, max.poll.interval.ms=3000, session.timeout.ms=3000

doesn't set partition.assignment.strategy, and the default doesn't include cooperative-sticky. Could you update your configuration if you're setting it?

@emasab
Collaborator

emasab commented May 17, 2024

@ericwuseattle

/* FIXME: Let the cgrp pass the actual eligible partition count */
size_t partition_cnt = member_cnt * 10; /* FIXME */

That is just the estimated partition count used for the initial size of maps and lists. You can try increasing the multiplier to see if it changes anything, and send some logs from the leader and 2-3 random members.

@ericwuseattle
Author

ericwuseattle commented May 17, 2024

Given your configuration you're not using the sticky assignor, as fetch.min.bytes=1, fetch.wait.max.ms=500, fetch.error.backoff.ms=0, heartbeat.interval.ms=1000, enable.auto.commit=false, enable.partition.eof=false, enable.auto.offset.store=false, max.poll.interval.ms=3000, session.timeout.ms=3000 doesn't set partition.assignment.strategy, and the default doesn't include cooperative-sticky. Could you update your configuration if you're setting it?

Sorry, I did not give you the full config, but we do have
partition.assignment.strategy=cooperative-sticky
set. I'll update the config in the checklist.

@emasab
Collaborator

emasab commented May 21, 2024

@ericwuseattle thanks. Other helpful info:

  • whether you have a rebalance callback set; if so, please test without it
  • how many members are in the group, and whether they're all subscribed to that topic with 3/6/15K partitions or to other topics too

@ericwuseattle
Author

ericwuseattle commented May 22, 2024

I'm not sure what's causing the timeout, but I'm sure we keep calling Kafka poll on a timer continuously. After increasing to 10s it works without any issue.

You have to set the callback to call the incremental partition assign/revoke client APIs. Besides that we have some internal logic, but it only posts tasks asynchronously, so it would not block or cost much CPU in the Kafka callback worker thread.

Trying further with 15K partitions and 10s timeouts: no luck, it would not work; the member gets kicked out of the group.

30 members in total, only 1 topic.
