ReceiverDisconnectedException even if using different consumer groups #680

Open
HaowenZhangBD opened this issue Aug 17, 2023 · 1 comment


Hi team, we have been seeing ReceiverDisconnectedException in our Databricks environment and have done some research.
We found that other people ran into a similar problem and resolved it by following these two docs:

https://github.com/Azure/azure-event-hubs-spark/blob/master/FAQ.md
https://github.com/Azure/azure-event-hubs-spark/blob/master/examples/multiple-readers-example.md

We have read through them and followed the suggestion of using a different consumer group for each stream (a sketch of the setup is shown below).
However, we still get ReceiverDisconnectedException on both streams at roughly the same timestamp:
[Screenshot: both streams failing with ReceiverDisconnectedException at similar timestamps]
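For context, here is a minimal sketch of the two-reader setup the FAQ and multiple-readers example recommend, using the connector's EventHubsConf API. The connection string values are placeholders, and `spark` is the SparkSession provided by the Databricks runtime; this is not our exact production code.

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf }

// Shared connection string for the Event Hub (placeholder credentials).
val connectionString = ConnectionStringBuilder(
    "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>")
  .setEventHubName("publisher-events-eh")
  .build

// Stream 1: its own consumer group, as recommended for concurrent readers.
val ehConfJob1 = EventHubsConf(connectionString)
  .setConsumerGroup("job1")

val stream1 = spark.readStream
  .format("eventhubs")
  .options(ehConfJob1.toMap)
  .load()

// Stream 2: a separate consumer group so the two readers should not
// disconnect each other's epoch receivers.
val ehConfMachine2 = EventHubsConf(connectionString)
  .setConsumerGroup("machine2")

val stream2 = spark.readStream
  .format("eventhubs")
  .options(ehConfMachine2.toMap)
  .load()
```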

Bug Report:

  • Actual behavior

stream 1 using PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5065.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5065.0 (TID 91438) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-87' with higher epoch of '0' is created hence current receiver 'spark-driver-87' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040004c849000eef7c64ddd416_G27_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|job1, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0, REFERENCE_ID: LN_a37906_1692259345344_1af_G27, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]

stream 2 using PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5069.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5069.0 (TID 91503) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-315' with higher epoch of '0' is created hence current receiver 'spark-driver-315' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040006c849000eef5c64ddd416_G2_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|machine2, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0, REFERENCE_ID: LN_190e6e_1692259345190_e97a_G2, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]

  • Expected behavior: no ReceiverDisconnectedException
  • spark-eventhubs artifactId and version : com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
  • Spark version
    [Screenshot of the Spark / Databricks runtime version]
@HaowenZhangBD (Author)

Maybe worth mentioning: another environment with the same code change did not hit ReceiverDisconnectedException after running for around one day.
