Need help understanding behaviour of offset. #657

Open
dhanush1708 opened this issue Oct 20, 2022 · 0 comments

My use case: I have an Event Hub with a data retention period of 1 day. Let's say I created a consumer group "consgroup".
Let's say I started a Structured Streaming read from the Event Hub with a starting offset value of -1.
Will Spark start reading the data from 1 day ago?
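
For context, here is a minimal sketch of the configuration described above. It assumes the PySpark form of the connector option `eventhubs.startingPosition`, which takes a JSON-encoded position, as shown in the azure-event-hubs-spark docs. The helper name `starting_position_conf` is hypothetical and only used for illustration. The connection string and consumer group settings are left out.

```python
import json


def starting_position_conf(offset="-1"):
    """Build the starting-position entry of an Event Hubs config dict.

    Assumption: the connector reads "eventhubs.startingPosition" as a JSON
    string with these four fields, and offset "-1" means start of the stream.
    """
    position = {
        "offset": offset,       # "-1" requests the start of the stream
        "seqNo": -1,            # not used when an offset is given
        "enqueuedTime": None,   # not used when an offset is given
        "isInclusive": True,
    }
    return {"eventhubs.startingPosition": json.dumps(position)}


# The resulting dict would be merged with the rest of the Event Hubs options
# and passed to spark.readStream.format("eventhubs").options(**eh_conf).
eh_conf = starting_position_conf("-1")
print(eh_conf["eventhubs.startingPosition"])
```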

After some time, I stopped my cell execution and ran it again with offset value -1. Will it continue from where it left off, or will it read the 1-day-old data again? What should I do if I want it to continue reading from where it left off?

I wanted to start reading from the 1-day-old data, so I used -1 as the offset value. The pipeline did read the 1-day-old data, but it failed after some time, so I want to re-run it such that it resumes from where it stopped when it failed. When I re-run the cell, should I set the offset to -1 or use the default value (end of stream)? Does the consumer group play any role in storing how far the data has been read?
What do "start of the stream" and "end of the stream" mean in the docs? Could you explain these terms a bit?
