fix: include in-progress row group when calculating in-memory buffer length #1638

Merged 8 commits on Sep 17, 2023
14 changes: 13 additions & 1 deletion rust/src/writer/record_batch.rs
@@ -333,7 +333,7 @@ impl PartitionWriter {
     /// Returns the current byte length of the in memory buffer.
     /// This may be used by the caller to decide when to finalize the file write.
     pub fn buffer_len(&self) -> usize {
-        self.buffer.len()
+        self.buffer.len() + self.arrow_writer.in_progress_size()
     }
 }

@@ -422,6 +422,18 @@ mod tests {
     use arrow::json::ReaderBuilder;
     use std::path::Path;

+    #[tokio::test]
+    async fn test_buffer_len_includes_unflushed_row_group() {
+        let batch = get_record_batch(None, false);
+        let partition_cols = vec![];
+        let table = create_initialized_table(&partition_cols).await;
+        let mut writer = RecordBatchWriter::for_table(&table).unwrap();
+
+        writer.write(batch).await.unwrap();
+
+        assert!(writer.buffer_len() > 0);
+    }
+
     #[tokio::test]
     async fn test_divide_record_batch_no_partition() {
         let batch = get_record_batch(None, false);
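Why the fix matters to callers can be illustrated with a small, self-contained sketch. It does not use the real writer: `SketchWriter`, `row_group_limit`, `flushed_len`, and `should_finalize` are hypothetical names invented here, and the model assumes (as the PR title suggests) that written data sits in an in-progress row group until that row group is flushed into the in-memory buffer.

```rust
/// Hypothetical stand-in for a partition writer (not the real type).
/// `flushed` models the in-memory buffer of completed row groups;
/// `pending` models the row group that is still in progress.
struct SketchWriter {
    flushed: Vec<u8>,
    pending: Vec<u8>,
    row_group_limit: usize,
}

impl SketchWriter {
    fn new(row_group_limit: usize) -> Self {
        SketchWriter {
            flushed: Vec::new(),
            pending: Vec::new(),
            row_group_limit,
        }
    }

    /// Buffers `bytes` in the in-progress row group and moves the whole
    /// row group into `flushed` once it reaches `row_group_limit`.
    fn write(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
        if self.pending.len() >= self.row_group_limit {
            self.flushed.append(&mut self.pending);
        }
    }

    /// Behaviour before the fix: only completed row groups are counted.
    fn flushed_len(&self) -> usize {
        self.flushed.len()
    }

    /// Behaviour after the fix: the in-progress row group is counted too.
    fn buffer_len(&self) -> usize {
        self.flushed.len() + self.pending.len()
    }
}

/// Hypothetical caller-side policy: finalize the file once the writer
/// reports at least `target_file_size` bytes.
fn should_finalize(writer: &SketchWriter, target_file_size: usize) -> bool {
    writer.buffer_len() >= target_file_size
}
```

In this model, a caller that writes less than one row group of data sees `flushed_len()` stay at zero and would never decide to finalize, while `buffer_len()` reflects the data already accepted. The added test in the diff checks the same property against the real `RecordBatchWriter`.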