-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 3.5: Only traverse ancestors of current snapshot when building changelog scan #10252
base: main
Are you sure you want to change the base?
Conversation
13715b0
to
6ccd2d8
Compare
6ccd2d8
to
debc745
Compare
@flyrain @aokolnychyi please help review |
Gentle ping @flyrain @aokolnychyi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. @manuzhang. It looks good to me overall. Left minor comments.
changelogRecords(null, rightAfterSnap2)); | ||
|
||
assertEquals( | ||
"Should have expected changed rows from snapshot 2 and 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not include change rows from snapshot 2. The result is correct, but the message is bit misleading. How about something like this?
Should have expected changed rows from snapshot 3 only since snapshot 2 is in a different branch.
ImmutableList.of( | ||
row(1, "a", "DELETE", 0, snap3.snapshotId()), | ||
row(-2, "a", "INSERT", 0, snap3.snapshotId())), | ||
changelogRecords(rightAfterSnap2, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add more cases by rollbacking to snapshot 2? We want to test it doesn't pick up the latest snapshot 3 when it is not in the main branch.
if (current.timestampMillis() <= endTimestamp) { | ||
snapshotId = current.snapshotId(); | ||
} else { | ||
for (Snapshot ancestor : SnapshotUtil.currentAncestors(table)) { | ||
if (ancestor.timestampMillis() <= endTimestamp) { | ||
snapshotId = ancestor.snapshotId(); | ||
break; | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SnapshotUtil.currentAncestors(table)
includes the current snapshot as well. We could simplify the logic a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think we can move this method to the class SnapshotUtil
. It could be useful for other scans as well. For example, I'm not sure if time travel query like this has the similar bug. We can double check on that, but it's a blocker for this PR.
SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';
This fixes #10247