Need help understanding behaviour of offset. #657

dhanush1708 · 2022-10-20T13:33:57Z

My use case: I have an Event Hub with a data retention period of 1 day. Let's say I created a consumer group "consgroup".
Let's say I started a structured stream reading data from the Event Hub with a start offset value of -1.
Will Spark start reading the data from 1 day ago?
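
For reference, here is a minimal sketch of the setup I am describing. It assumes the option names used in the connector's PySpark documentation (`eventhubs.connectionString`, `eventhubs.consumerGroup`, `eventhubs.startingPosition`); the helper `build_eventhubs_conf` and the placeholder connection string are hypothetical and only there for illustration.

```python
import json


def build_eventhubs_conf(connection_string, consumer_group, offset="-1"):
    """Build the options dict for the Event Hubs source (hypothetical helper).

    An offset of "-1" is the value I pass as the starting position.
    """
    # Assumed JSON shape of the starting position, as I understand it
    # from the connector's PySpark docs.
    starting_position = {
        "offset": offset,
        "seqNo": -1,
        "enqueuedTime": None,
        "isInclusive": True,
    }
    return {
        # Note: recent connector versions may expect this string to be
        # encrypted first; that step is omitted in this sketch.
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
        "eventhubs.startingPosition": json.dumps(starting_position),
    }


conf = build_eventhubs_conf("<connection-string>", "consgroup")

# In the notebook cell the stream would then be created roughly like this
# (needs a live Spark session, so it is left commented out):
# df = spark.readStream.format("eventhubs").options(**conf).load()
```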

Now suppose that after some time I stop the cell execution and run it again with offset value -1. Will it continue from where it left off, or will it read the 1-day-old data again? What should I do if I want it to continue reading from where it left off?

I wanted to start reading from the 1-day-old data, so I used -1 as the offset value. The pipeline did read the 1-day-old data, but it failed after some time, so I want to re-run it such that it resumes from the point where it stopped. When I re-run the cell, should I set the offset to -1 or use the default value (end of stream)? Does the consumer group play any role in storing how far the data has been read?
What do "start of the stream" and "end of the stream" mean in the docs? Could you explain these terms a bit?
