Job for consumption of Event Hub messages aborts on Databricks (request seqNo less than received seqNo) #670

relayr-huzaifah · 2023-03-31T15:18:28Z

Bug Report:

Actual behavior

We are trying to capture events from Azure Event Hub and, after some processing, save them to the Datalake using a Databricks Workflow. The workflow works fine in the dev and stg environments, but when we run it on prd (with a different Event Hub, Databricks and Datalake account) using exactly the same configuration and JARs, it fails with the following error:
In partition 7 of <event-hub-name>, with consumer group <consumer-group-name>, request seqNo 210 is less than the received seqNo 198. The earliest seqNo is 210, the last seqNo is 210, and received seqNo 198.
The event hub has 8 partitions, and the namespace contains 8 event hubs in total (including this one), all with a message retention of 7 days.

Expected behavior

The data gets processed and sent to the Datalake as it does for other environments.

Spark version

Spark version is 2.12.

spark-eventhubs artifactId and version

artifactId: azure-eventhubs-spark_2.12
version: 2.3.22


relayr-huzaifah · 2023-04-04T09:24:57Z

The error message doesn't make sense: it says that request seqNo 210 is less than the received seqNo 198, although 210 is clearly not less than 198.

relayr-huzaifah · 2023-04-06T09:48:26Z

After looking at the messages the job processes when started with EventPosition.earliest(), we found that the event hub returns messages with sequence numbers between 188 and 194. Note that the messages with sequence numbers lower than 194 were added to the event hub about 2 weeks ago, and the event hub is configured to retain messages for only 7 days, so all but one of those messages should have expired by now. This is consistent with the error message from the job, which takes sequence number 194 as the starting point but then complains that it received sequence number 188.

The error message itself is still strange, because it claims that the larger sequence number (194) is less than the smaller one (188).
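
To spell out the comparison, here is a minimal sketch in plain Python. It is not the connector's code: `PartitionRange` and `describe_mismatch` are hypothetical names made up for illustration, and the only real inputs are the numbers quoted in the error message. It only shows how the three values relate to each other, which suggests that the wording of the message may simply have the comparison reversed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionRange:
    """Sequence-number range reported for a partition (hypothetical helper type)."""
    earliest_seq_no: int
    latest_seq_no: int


def describe_mismatch(requested: int, received: int, rng: PartitionRange) -> str:
    """Describe how a received seqNo relates to the requested one and the reported range."""
    if received == requested:
        return "ok: received the requested event"
    # State the direction of the mismatch explicitly instead of assuming one.
    direction = "less" if received < requested else "greater"
    in_range = rng.earliest_seq_no <= received <= rng.latest_seq_no
    location = "inside" if in_range else "outside"
    return (
        f"received seqNo {received} is {direction} than request seqNo {requested} "
        f"and lies {location} the reported range "
        f"[{rng.earliest_seq_no}, {rng.latest_seq_no}]"
    )


# Values taken from the error message in the report above.
print(describe_mismatch(210, 198, PartitionRange(210, 210)))
```

With the values from the report, the received event (198) is both below the requested sequence number (210) and outside the range the partition claims to hold, which matches the observation that events older than the retention window are still being delivered.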

yamin-msft · 2023-05-22T20:45:21Z

Do you have a repro and a spark log of the repro?
