New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Handle deletes and updates for count API #2185
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
@@ -275,6 +275,22 @@ impl<'me, K: ArrowReadableKey<'me>, V: ArrowReadableValue<'me>> ArrowBlockfileRe | |||
} | |||
} | |||
|
|||
pub(crate) async fn key_exists(&'me self, prefix: &str, key: K) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer has() or contains()
rust/worker/src/blockstore/types.rs
Outdated
pub(crate) async fn key_exists(&'referred_data self, prefix: &str, key: K) -> bool { | ||
match self { | ||
BlockfileReader::ArrowBlockfileReader(reader) => reader.key_exists(prefix, key).await, | ||
BlockfileReader::MemoryBlockfileReader(reader) => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we implemented this, does it need to be todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! Missed updating here after implementing it
@@ -23,16 +25,21 @@ impl CountRecordsOperator { | |||
pub(crate) struct CountRecordsInput { | |||
record_segment_definition: Segment, | |||
blockfile_provider: BlockfileProvider, | |||
// Note: this vector needs to be in the same order as the log | |||
// for the counting logic to be correct. | |||
log_operation_and_id: Vec<(Operation, String)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused as to why we pull this datastructure out rather than just keep Chunk which encapsulates the same information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah there's no particular reason. I was trying to be cautious of cloning the entire log record by selectively cloning only the id and operation but I noticed that it is behind Arc so will change
let logs = logs.logs(); | ||
self.log_record_count = logs.total_len(); | ||
// TODO: Add logic for merging logs with count from record segment. | ||
let mut operation_and_id: Vec<(Operation, String)> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we just forwarded the Chunk<> to align with other operators and also to minimize copying
} | ||
}; | ||
// Reconcile adds, updates and deletes. | ||
let mut present_id_set: HashSet<String> = HashSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit on naming for readability here: would be nice if we made it distinct that this is present_in_record_segment or something as below we also have a "present set"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to more verbose names and also added some comemnts
c40807c
to
9bd59b6
Compare
rust/worker/src/blockstore/types.rs
Outdated
@@ -262,7 +264,7 @@ impl BlockfileFlusher { | |||
pub(crate) fn id(&self) -> uuid::Uuid { | |||
match self { | |||
// TODO: should memory blockfiles have ids? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this todo
|
||
#[cfg(test)] | ||
mod tests { | ||
use std::{collections::HashMap, str::FromStr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use statements should touch to trigger format/autosort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[IGNORE]
## Description of changes *Summarize the changes made by this PR.* - Improvements & Bug fixes - Count should handle uninit'ed segments - New functionality - None ## Test plan *How are these changes tested?* #2185 should add tests for the count() operator - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes None
Description of changes
Summarize the changes made by this PR.
Updates, deletes and duplicate adds are handled for the count API in this PR.
Test plan
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
None