Azure Event Hubs consumer group and partition ID issue with Spark Streaming #655
I hit the same issue while using this library in Azure Databricks. The log shows:
23/10/10 09:22:18 INFO EventHubsRDD: (TID 578) Computing EventHubs test, partition 2 sequence numbers 5 => 6
compute did not filter the partitions based on the ehConf configuration, so the job still receives events from all partitions. Please help fix this.
Hi @yamin-msft, can you please help with this ticket?
I have a use case where I need to consume 1 million events per second from Event Hubs using Spark Streaming. I created an Event Hub with 10 partitions and 10 consumer groups so that the events can be read in parallel by 10 Spark Streaming jobs, one consumer group per partition. The problem is that each consumer reads the events from all partitions instead of only its own, which produces duplicate data. Ideally, each job should read only the events from its specified partition. I think this is a bug. I am using Databricks with PySpark streaming to consume the events. Please help me resolve this issue.
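To make the duplication concrete, here is a small plain-Python model of the setup described above. It does not use Spark or the Event Hubs library; the event representation and every helper name are hypothetical. With 10 jobs that each receive every partition, every event is delivered 10 times; with one partition per job, each event is delivered once. The filter in expected_read is the same idea as a client-side workaround that keeps only rows whose partition matches the job's assigned partition (assuming the stream exposes a partition column). That workaround only removes duplicates after the fact and does not reduce how much data each job reads.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical event representation: (partition_id, sequence_number).
Event = Tuple[int, int]
Reader = Callable[[List[Event], int], List[Event]]


def make_events(num_partitions: int, events_per_partition: int) -> List[Event]:
    """Stand-in for the contents of an Event Hub (illustrative only)."""
    return [(p, s) for p in range(num_partitions) for s in range(events_per_partition)]


def observed_read(events: List[Event], job_partition: int) -> List[Event]:
    """Reported behaviour: the job receives events from every partition."""
    return list(events)


def expected_read(events: List[Event], job_partition: int) -> List[Event]:
    """Expected behaviour: the job receives only its assigned partition."""
    return [e for e in events if e[0] == job_partition]


def delivery_counts(events: List[Event], num_jobs: int, reader: Reader) -> Counter:
    """Count how many times each event is delivered across all jobs.

    Job i is assigned partition i (one consumer group per partition).
    """
    counts: Counter = Counter()
    for job in range(num_jobs):
        counts.update(reader(events, job))
    return counts


def events_processed_per_second(ingest_rate: int, num_jobs: int, each_job_reads_all: bool) -> int:
    """Total events processed per second across all jobs."""
    return ingest_rate * num_jobs if each_job_reads_all else ingest_rate


if __name__ == "__main__":
    events = make_events(num_partitions=10, events_per_partition=5)
    observed = delivery_counts(events, num_jobs=10, reader=observed_read)
    expected = delivery_counts(events, num_jobs=10, reader=expected_read)
    print("max deliveries per event (observed):", max(observed.values()))
    print("max deliveries per event (expected):", max(expected.values()))
    print("events/s processed (observed):", events_processed_per_second(1_000_000, 10, True))
```

At the rate in this use case, the reported behaviour means the 10 jobs together process about 10 million events per second instead of 1 million, with every event appearing 10 times downstream.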