
Update CMAK to use Kafka 2.8.0+ libs due to critical bug discovered in Kafka #900

Open
atul008 opened this issue Jan 5, 2023 · 2 comments

Comments

@atul008

atul008 commented Jan 5, 2023

A critical bug (https://issues.apache.org/jira/browse/KAFKA-14190) has been discovered: using a pre-2.8.0 ZK admin client corrupts topic IDs in the Kafka cluster. CMAK is currently built with Kafka 2.4 libs, so using it against a cluster running Kafka 2.8.0+ will trigger this issue.

We use kafka-manager to manage our production Kafka clusters, and this issue has caused some outages. I am opening this issue so that it can be addressed.

Update: Updating to the latest Kafka libs won't help, because CMAK uses the Curator framework to update ZK instead of AdminZkClient. So we need to wait for KAFKA-14190 to be fixed.

@atul008
Author

atul008 commented Jan 5, 2023

Steps to reproduce:

  1. Add partitions using kafka-manager (which uses pre-2.8.0 Kafka client libs) to a topic on a cluster running Kafka 2.8.1 (we tested with 2.8.1; it can happen with any 2.8.0+ version)
  2. Restart the controller broker

You should see logs similar to the following in the broker:

[2022-08-25 17:44:05,308] ERROR [Broker id=0] Topic Id in memory: jKTRaM_TSNqocJeQI2aYOQ does not match the topic Id for partition myTopic-0 provided in the request: nI-JQtPwQwGiylMfm8k13w. (state.change.logger)
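
For anyone who wants to scan broker logs for this symptom, here is a minimal Python sketch. It assumes the message wording matches the log line quoted above; the function name `parse_topic_id_mismatch` and the pattern are my own illustration, not part of Kafka or CMAK.

```python
import re

# Pattern modelled on the single log line quoted in this issue (an assumption:
# other Kafka versions may word the message differently).
MISMATCH_RE = re.compile(
    r"Topic Id in memory: (?P<in_memory>[\w-]+) does not match the topic Id "
    r"for partition (?P<partition>\S+) provided in the request: "
    r"(?P<in_request>[\w-]+)"
)


def parse_topic_id_mismatch(line):
    """Return the two topic IDs and the partition, or None if the line does not match."""
    match = MISMATCH_RE.search(line)
    return match.groupdict() if match else None
```

Feeding each line of the broker's state-change log through `parse_topic_id_mismatch` and collecting the non-`None` results gives the list of affected partitions.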

@atul008 atul008 changed the title from "Update CMAK to use Kafka 2.8.0+ libs due to critical bug discovered in Kafka 2.8.0+ versions" to "Update CMAK to use Kafka 2.8.0+ libs due to critical bug discovered in Kafka" on Jan 5, 2023
@d-mankowski-synerise

This issue messed up thousands of partitions in our production cluster. Really, stop using CMAK if you do not want to have serious issues; there are much better tools (Redpanda Console, Conduktor, etc.).

The fix was to stop Kafka, delete all partition.metadata files, and start Kafka again; on startup it then fetches the metadata from ZooKeeper. This procedure can be done one node at a time.
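
The delete step of that workaround could be sketched roughly as below. This is only an illustration under assumptions: it assumes each partition directory sits directly under the broker's log directory and contains a file named `partition.metadata`, and the function names and the example path are hypothetical. It defaults to a dry run, and it should only be run while the broker on that node is stopped.

```python
from pathlib import Path


def find_partition_metadata(log_dir):
    """List partition.metadata files one level below log_dir (one per partition directory)."""
    return sorted(Path(log_dir).glob("*/partition.metadata"))


def remove_partition_metadata(log_dir, dry_run=True):
    """Delete the partition.metadata files, or only report them when dry_run is True.

    Assumption: the broker that owns log_dir is stopped before this runs.
    """
    paths = find_partition_metadata(log_dir)
    for path in paths:
        if not dry_run:
            path.unlink()
    return paths
```

For example, `remove_partition_metadata("/var/lib/kafka/data")` (a hypothetical log directory) only lists the files; pass `dry_run=False` to actually delete them, then start the broker again and move on to the next node.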

2 participants