Job for consumption of Event Hub messages aborts on Databricks (request seqNo less than received seqNo) #670

Open
relayr-huzaifah opened this issue Mar 31, 2023 · 3 comments

Comments

@relayr-huzaifah

Bug Report:

  • Actual behavior

We are trying to capture events from Azure Event Hub and, after some processing, save them to the Datalake using a Databricks Workflow. The workflow works fine in the dev and stg environments, but when we run it on prd (with a different Event Hub, Databricks and Datalake account) with exactly the same configuration and JARs, it fails with the following error:
In partition 7 of <event-hub-name>, with consumer group <consumer-group-name>, request seqNo 210 is less than the received seqNo 198. The earliest seqNo is 210, the last seqNo is 210, and received seqNo 198.
The event hub has 8 partitions, and the namespace contains 8 event hubs in total (including this one), all with a message retention of 7 days.

  • Expected behavior

The data gets processed and sent to the Datalake as it does for other environments.

  • Spark version

Spark version is 2.12.

  • spark-eventhubs artifactId and version

artifactId: azure-eventhubs-spark_2.12
version: 2.3.22

@relayr-huzaifah
Author

The error doesn't make sense: it says that request seqNo 210 is less than the received seqNo 198, even though 210 is clearly not less than 198.
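
To make the comparison explicit, here is a minimal sketch that pulls the four numbers out of the message quoted in the report and checks how they relate. The regular expression and the helper names `parse_seqno_error` and `describe` are assumptions made for illustration, not part of the connector, and the pattern only matches the wording quoted above.

```python
import re

# Pattern for the message text quoted in this issue. The connector's exact
# wording may differ between versions, so treat this as an assumption.
SEQNO_PATTERN = re.compile(
    r"request seqNo (?P<request>\d+) is less than the received seqNo (?P<received>\d+)\. "
    r"The earliest seqNo is (?P<earliest>\d+), the last seqNo is (?P<last>\d+)"
)


def parse_seqno_error(message):
    """Return the sequence numbers in the message as a dict of ints, or None."""
    match = SEQNO_PATTERN.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def describe(numbers):
    """Summarise how the parsed sequence numbers relate to each other."""
    earliest, last = numbers["earliest"], numbers["last"]
    return {
        "request_less_than_received": numbers["request"] < numbers["received"],
        "request_in_reported_range": earliest <= numbers["request"] <= last,
        "received_in_reported_range": earliest <= numbers["received"] <= last,
    }


message = (
    "In partition 7 of <event-hub-name>, with consumer group <consumer-group-name>, "
    "request seqNo 210 is less than the received seqNo 198. "
    "The earliest seqNo is 210, the last seqNo is 210, and received seqNo 198."
)

numbers = parse_seqno_error(message)
summary = describe(numbers)
print(numbers)
print(summary)
```

For this message the claimed relation (request less than received) is false, and the received seqNo 198 lies outside the earliest/last range that the same message reports, while the requested seqNo 210 lies inside it.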

@relayr-huzaifah
Author

After inspecting the messages that the job processes with EventPosition.earliest(), I found that the event hub is returning messages with sequence numbers between 188 and 194. However, messages with sequence numbers lower than 194 were added to the event hub about 2 weeks ago, and the event hub is set to keep messages for only 7 days, so all but one of those messages should have expired by now. The job's error message supports this: it sets sequence number 194 as the starting point, but then complains that it received sequence number 188.

However, the error itself seems strange, because it says that a larger sequence number (194) is considered smaller than a smaller sequence number (188).
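
The retention reasoning above can be written down as a small check. This is only a sketch: the function name `is_past_retention` and the timestamps are hypothetical, and only the "about 2 weeks" age and the 7-day retention come from this report. It says nothing about when the service physically removes expired events.

```python
from datetime import datetime, timedelta, timezone


def is_past_retention(enqueued_at, now, retention=timedelta(days=7)):
    """Return True if an event enqueued at `enqueued_at` is older than `retention` at `now`."""
    return now - enqueued_at > retention


# Hypothetical reference time; any fixed instant works for the comparison.
now = datetime(2023, 4, 14, tzinfo=timezone.utc)
old_event = now - timedelta(days=14)    # "about 2 weeks ago"
recent_event = now - timedelta(days=1)  # a fresh event, for contrast

old_expired = is_past_retention(old_event, now)
recent_expired = is_past_retention(recent_event, now)
print(old_expired, recent_expired)
```

Under these assumptions the two-week-old events fall outside the 7-day window, which matches the expectation that they should no longer be served.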

@yamin-msft
Contributor

Do you have a repro and a spark log of the repro?


2 participants