New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(source): let shared SourceExecutor start from latest instead of specified offset #16626
Conversation
Signed-off-by: xxchan <xxchan22f@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Have you verified when the Offset::End
is resolved? In other words, if you create a Reader
and then immediately pause it (without polling once), will the reader eventually start from the latest offset resolved during creation or when the stream is resumed and first polled?
@@ -24,6 +24,12 @@ pub struct KafkaSplit { | |||
pub(crate) partition: i32, | |||
pub(crate) start_offset: Option<i64>, | |||
pub(crate) stop_offset: Option<i64>, | |||
#[serde(skip)] | |||
/// Used by shared source to hackily seek to the latest offset without fetching start offset first. | |||
/// XXX: But why do we fetch low watermark for latest start offset..? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that theoretically we can also follow this approach for the Kafka sources specified scan.startup.mode='latest'
by users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think so. @tabVersion @shanicky so why do we fetch low watermark instead of passing Offset::End?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that pulsar uses PulsarEnumeratorOffset
in PulsarSplit
, so I think it's perfectly ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, how about writing a new post about the shared source arch.
/// XXX: But why do we fetch low watermark for latest start offset..? | ||
/// | ||
/// When this is `true`, `start_offset` will be ignored. | ||
pub(crate) hack_seek_to_latest: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest changing another name like force_start_latest
. This name sounds like doing some magic 🪄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is hacky and magical.. If possibile, I want to only use start_offset
to do this. In other words, also use Offset::End
for scan.startup.mode='latest'
It's when |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
#16576 (comment)
mentioned by Eric, this is also his original idea in his RFC
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.