
EventHub Writer fails due to Throttling of EventHub, configuration settings have no impact. #679

Open · steffenmarschall opened this issue Jun 26, 2023 · 1 comment

@steffenmarschall

Bug Report:

Actual Behavior

We have a fairly large streaming DataFrame (42,000,000 rows) which we want to send to our Azure EventHub. The EventHub is scaled with 15 TUs (throughput units).
However, any run trying to send this data fails due to throttling of the EventHub. The exception that is shown is:

```
StreamingQueryException: [STREAM_FAILED] Query [id = ..., runId = ...] terminated with exception: Job aborted due to stage failure: Task XX in stage 9.0 failed 4 times, most recent failure: Lost task 61.3 in stage 9.0 (TID 1963) (10.179.0.21 executor 7): com.microsoft.azure.eventhubs.ServerBusyException: The request was terminated because the entity is being throttled. Error code : 50002. Sub error : 101. Please wait 4 seconds and try again. To know more visit https://aka.ms/sbResourceMgrExceptions and https://aka.ms/ServiceBusThrottling
```

We tried to lower the sending rate with the following options:

  • maxEventsPerTrigger (e.g. set to 100)
  • eventhubs.threadPoolSize (e.g. set to 1)
  • eventhubs.operationTimeout (e.g. set to 15 minutes)

However, none of these had any measurable impact on the sending rate to the EventHub.
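For reference, this is roughly how we set those options via the connector's EventHubsConf (the connection string and Event Hub name below are placeholders, not our real values):

```scala
import java.time.Duration
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}

// Placeholder connection string; the real one points at the 15-TU namespace.
val connectionString = ConnectionStringBuilder("<EVENT_HUB_CONNECTION_STRING>")
  .setEventHubName("<EVENT_HUB_NAME>")
  .build

val ehConf = EventHubsConf(connectionString)
  .setMaxEventsPerTrigger(100)                  // no measurable impact on the send rate
  .setThreadPoolSize(1)                         // no measurable impact on the send rate
  .setOperationTimeout(Duration.ofMinutes(15))  // no measurable impact either
```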

Additional Info:

We stream from a Delta table; each version usually has ~42,000,000 added rows.
We use the AvailableNow trigger and try to checkpoint; however, the job usually fails before reaching any checkpoint.
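The write itself is shaped roughly like this (paths simplified, `ehConf` as above, and the payload serialized into the `body` column the connector expects):

```scala
import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.sql.streaming.Trigger

val query = spark.readStream
  .format("delta")
  .load("<DELTA_TABLE_PATH>")                       // each version adds ~42,000,000 rows
  .select(to_json(struct(col("*"))).alias("body"))  // connector sends the "body" column
  .writeStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .option("checkpointLocation", "<CHECKPOINT_PATH>")
  .trigger(Trigger.AvailableNow())                  // fails before the first checkpoint
  .start()
```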

Expected behavior

Adjusting these settings should lower or raise the throughput when writing to Azure EventHub.

Please let us know how to configure the EventHub writer so that we can send large amounts of data without failing due to throttling.
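If there is no write-side throttle, would capping the micro-batch size on the source be the recommended approach instead? A sketch of what we have in mind, assuming the Delta source options maxFilesPerTrigger / maxBytesPerTrigger are honored by the AvailableNow trigger:

```scala
// Hypothetical workaround: limit how much data each micro-batch reads from
// the Delta table, so each write burst stays below the EventHub's quota.
val throttledStream = spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", 10)          // tune against the 15-TU ingress limit
  // .option("maxBytesPerTrigger", "100m")   // alternative: size-based cap
  .load("<DELTA_TABLE_PATH>")
```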

Configuration

  • Databricks/Spark version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
  • spark-eventhubs artifactId and version: com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
@YoshicoppensE61

I have the same problem. I do not think we have any config settings that can help limit the output rate in this scenario, which makes EventHub a bit useless as an output sink.
