[ENH]: move materialization into operator #3357
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
UnintializedWriter,
ApplyMaterializedLogsErrorMetadataSegment(#[from] MetadataSegmentError),
#[error("Uninitialized writer")]
UninitializedWriter,
misspelling fixes
c8fca26 to 500bf8a
99780f2 to b25da8b
This stack of pull requests is managed by Graphite.
max_compaction_size: usize,
max_partition_size: usize,
// Populated during the compaction process
cached_segments: Option<Vec<Segment>>,
writers: OnceCell<CompactWriters>,
Nice use of OnceCell
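For readers unfamiliar with the pattern, here is a minimal sketch of a lazily populated `writers` field. It assumes `std::cell::OnceCell`; the PR may well use a different `OnceCell` (for example an async one), and the `CompactWriters` contents, the `Orchestrator` name, and the `init_count` counter are hypothetical stand-ins for illustration only.

```rust
use std::cell::{Cell, OnceCell};

// Hypothetical stand-in for the PR's CompactWriters; the real type holds segment writers.
#[derive(Debug)]
struct CompactWriters {
    segment_ids: Vec<u32>,
}

// Hypothetical owner of the field; only `writers` mirrors the diff above.
struct Orchestrator {
    // Empty until the first call to `writers()`.
    writers: OnceCell<CompactWriters>,
    // Illustration only: counts how many times the writers were actually built.
    init_count: Cell<u32>,
}

impl Orchestrator {
    fn new() -> Self {
        Orchestrator {
            writers: OnceCell::new(),
            init_count: Cell::new(0),
        }
    }

    // Builds the writers on first use and returns the same instance afterwards.
    fn writers(&self) -> &CompactWriters {
        self.writers.get_or_init(|| {
            self.init_count.set(self.init_count.get() + 1);
            CompactWriters {
                segment_ids: vec![1, 2, 3],
            }
        })
    }
}
```

Compared with an `Option` field, the cell can be filled through a shared reference and cannot be overwritten once set, which suits a value populated once during compaction.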
@@ -70,16 +76,16 @@ understand. We can always add more abstraction later if we need it.
enum ExecutionState {
Pending,
Partition,
Write,
MaterializeAndWrite,
Should these be two separate states?
Because of partitioning, materialization and segment writing may be occurring simultaneously for different partitions.
We could add a synchronization point between the two steps, if that's what you mean?
Yes, that is what I am asking. Materialization seems lightweight enough to make it a discrete step, but I am not sure there is any valid reason to do so other than that it feels cleaner.
Understood.
I prefer to leave it as-is for now.
After the third PR in this stack, materialization, segment writing, committing, and flushing can all (theoretically) be running concurrently for different partitions.
So if there's a good reason to have a sync point here (debugging? not sure how helpful it would be though), we should probably also add a sync point for all the other steps.
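The two options discussed in this thread can be sketched roughly as follows. This is an illustration using plain threads, not the PR's operator and dispatcher machinery; `materialize`, `write_segment`, and both driver functions are hypothetical stand-ins, and a "partition" here is just a list of numbers.

```rust
use std::thread;

// Hypothetical stand-in: "materializing" a partition doubles each log entry.
fn materialize(partition: &[u32]) -> Vec<u32> {
    partition.iter().map(|x| x * 2).collect()
}

// Hypothetical stand-in: "writing" a segment returns the sum of its records.
fn write_segment(materialized: &[u32]) -> u32 {
    materialized.iter().sum()
}

// Combined state: each partition materializes and then writes on its own
// thread, so one partition may be writing while another still materializes.
fn materialize_and_write(partitions: Vec<Vec<u32>>) -> Vec<u32> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|p| thread::spawn(move || write_segment(&materialize(&p))))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("partition task panicked"))
        .collect()
}

// Two discrete states: joining every materialization thread acts as the
// sync point, and no write starts until all partitions are materialized.
fn materialize_then_write(partitions: Vec<Vec<u32>>) -> Vec<u32> {
    let materializing: Vec<_> = partitions
        .into_iter()
        .map(|p| thread::spawn(move || materialize(&p)))
        .collect();
    let materialized: Vec<Vec<u32>> = materializing
        .into_iter()
        .map(|h| h.join().expect("materialize task panicked"))
        .collect();

    let writing: Vec<_> = materialized
        .into_iter()
        .map(|m| thread::spawn(move || write_segment(&m)))
        .collect();
    writing
        .into_iter()
        .map(|h| h.join().expect("write task panicked"))
        .collect()
}
```

Both functions return the same per-partition results; the difference is only in scheduling. With the sync point, the slowest materialization delays every write, which is the cost weighed against the cleaner state separation.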
b25da8b to 88c8462
88c8462 to b170c08
Description of changes
Log materialization is now in its own operator. Having materialization in its own operator unlocks two main benefits:
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Tested locally with SciDocs as well.
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?