fix: ensure the checkpoint decoder is regularly flushed
For checkpoint buffers that cannot fit into the batch size, the
checkpoint will be written with an insufficient number of bytes.

Unfortunately our tests didn't catch this, and it only manifested on
tables with very large transaction logs.

Signed-off-by: R. Tyler Croy <[email protected]>
rtyler committed Oct 24, 2024
1 parent 8ba8e08 commit 61a5648
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions crates/core/src/protocol/checkpoints.rs
@@ -356,11 +356,12 @@ fn parquet_bytes_from_state(
     for j in jsons {
         let buf = serde_json::to_string(&j?).unwrap();
         let _ = decoder.decode(buf.as_bytes())?;
+
+        while let Some(batch) = decoder.flush()? {
+            writer.write(&batch)?;
+        }
         total_actions += 1;
     }
-    while let Some(batch) = decoder.flush()? {
-        writer.write(&batch)?;
-    }

     let _ = writer.close()?;
     debug!("Finished writing checkpoint parquet buffer.");
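
The effect of moving the flush inside the loop can be illustrated with a minimal, self-contained sketch. `ToyDecoder`, `write_flush_at_end`, and `write_flush_each_row` are hypothetical names invented for this illustration and are not part of the crate. The toy decoder assumes that a batching decoder accepts at most `batch_size` rows between flushes and ignores further input until it is flushed, which is one plausible reading of the failure described in the commit message. It does not reproduce the real decoder's API.

```rust
/// Hypothetical stand-in for a batching decoder (an assumption for this
/// sketch): it holds at most `batch_size` rows between flushes and ignores
/// any input offered while it is full.
struct ToyDecoder {
    batch_size: usize,
    pending: Vec<String>,
}

impl ToyDecoder {
    fn new(batch_size: usize) -> Self {
        ToyDecoder { batch_size, pending: Vec::new() }
    }

    /// Returns the number of rows accepted: 1 normally, 0 when the buffer is full.
    fn decode(&mut self, row: &str) -> usize {
        if self.pending.len() >= self.batch_size {
            return 0;
        }
        self.pending.push(row.to_string());
        1
    }

    /// Hands back the buffered rows, or `None` if nothing is buffered.
    fn flush(&mut self) -> Option<Vec<String>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

/// Old shape: decode everything, discard the return value, flush once at the end.
fn write_flush_at_end(rows: &[&str], batch_size: usize) -> usize {
    let mut decoder = ToyDecoder::new(batch_size);
    let mut written = 0;
    for row in rows {
        let _ = decoder.decode(row);
    }
    while let Some(batch) = decoder.flush() {
        written += batch.len();
    }
    written
}

/// New shape: drain the decoder after every decode call.
fn write_flush_each_row(rows: &[&str], batch_size: usize) -> usize {
    let mut decoder = ToyDecoder::new(batch_size);
    let mut written = 0;
    for row in rows {
        let _ = decoder.decode(row);
        while let Some(batch) = decoder.flush() {
            written += batch.len();
        }
    }
    written
}
```

Under these assumptions, five rows with a batch size of 2 yield only 2 written rows in the first variant and all 5 in the second. Because the return value of `decode` is discarded in both variants, flushing after every row is what keeps the buffer from filling up.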