PySpark job doesn't stop on stopping query #672

Open

ronban opened this issue May 16, 2023 · 0 comments
ronban commented May 16, 2023

Bug Report:

Scenario:

...
from json import dumps
import threading
import time

from pyspark.sql.streaming import StreamingQuery

ehConf = {}
ehConf['eventhubs.useExclusiveReceiver'] = False
# Encrypt the connection string via the connector's JVM helper
ehConf['eventhubs.connectionString'] = spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(
    connectionString)
startingEventPosition = {
    "offset": "-1",        # start from the beginning of the stream
    "seqNo": -1,           # not in use
    "enqueuedTime": None,  # not in use
    "isInclusive": True
}
endingEventPosition = {
    "offset": "@latest",   # read up to the latest available offset
    "seqNo": -1,           # not in use
    "enqueuedTime": None,  # not in use
    "isInclusive": True
}
ehConf["eventhubs.startingPosition"] = dumps(startingEventPosition)
ehConf["eventhubs.endingPosition"] = dumps(endingEventPosition)


qs = spark.readStream.format("eventhubs") \
    .options(**ehConf) \
    .load() \
    .writeStream \
    .foreachBatch(lambda bdf, bid: my_func(bdf, bid, schema, storage_format, coalesced_output)) \
    .option("checkpointLocation", checkpoint) \
    .start()

# Watch the query from a separate thread and stop it once it goes idle
monitor_thread = threading.Thread(target=monitor, args=(qs,))
monitor_thread.start()
monitor_thread.join()

qs.awaitTermination()

def monitor(qs: StreamingQuery):
    time.sleep(10)  # give the query time to start processing
    while True:
        if not qs.status.get('isDataAvailable'):
            print("Stopping Query Stream")
            qs.stop()  # request the streaming query to stop
            break
        else:
            time.sleep(1)


...
sparkSession.stop()
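The monitor thread's stop logic above can be exercised without a Spark cluster by swapping a hypothetical stub in place of StreamingQuery. Everything here (FakeStreamingQuery, the timing parameters) is illustrative and not part of the connector or of the original job; it is a minimal sketch of the poll-then-stop pattern the report relies on:

```python
import threading
import time

# Hypothetical stand-in for pyspark.sql.streaming.StreamingQuery, used only to
# exercise the monitor loop's logic outside of a Spark cluster.
class FakeStreamingQuery:
    def __init__(self):
        self.stopped = False
        self._data_polls = 3  # report data available for the first few polls

    @property
    def status(self):
        available = self._data_polls > 0
        self._data_polls -= 1
        return {"isDataAvailable": available}

    def stop(self):
        self.stopped = True

def monitor(qs, startup_delay=0.01, poll_interval=0.01):
    time.sleep(startup_delay)  # let the query warm up before polling
    while True:
        if not qs.status.get("isDataAvailable"):
            qs.stop()  # no more data: request the query to stop
            break
        time.sleep(poll_interval)

qs = FakeStreamingQuery()
t = threading.Thread(target=monitor, args=(qs,))
t.start()
t.join()
print(qs.stopped)  # → True
```

With a stub, the loop terminates and `stop()` is called as soon as `isDataAvailable` turns false; the issue is that with the Event Hubs source the real query's `stop()` does not end the job the way it does for Kafka.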

  • Actual behavior: For other stream sources, e.g. Kafka, the job stops after qs.stop(); with Event Hubs it does not.
  • Expected behavior: The job should end successfully once the query is stopped.
  • Spark version: Dataproc batches v1.1.2
  • spark-eventhubs artifactId and version: com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22