AsyncArrowWriter API to get the total size of a written Parquet file #6530
Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We are writing Parquet files with `AsyncArrowWriter` and then reading them back in with DataFusion's `PartitionedFile`, which requires the file length (as part of `ObjectMeta`). It is not clear how we could get the actual length from the `AsyncArrowWriter`.

Before the call to `AsyncArrowWriter::close`, the "written bytes" will not reflect the size of the file footer. However, looking at the API, `AsyncArrowWriter::close` doesn't seem to return the overall file size: it only returns the `FileMetaData`, which does not contain the overall file size.

Describe the solution you'd like
I would like to be able to get the total file size from the writer.
Here is a test that shows the use case (in https://github.com/apache/arrow-rs/blob/7f2d9ac14b1b5b846feb130f5cbdfd64e6616cb9/parquet/src/arrow/async_writer/mod.rs#L389-L388):
Describe alternatives you've considered
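A possible workaround, offered as an illustration rather than the reporter's approach, is to wrap the output sink in a writer that counts every byte passing through it. Because the footer is written to the sink during `close`, a count read after closing equals the total file size. The sketch below is std-only and synchronous. `CountingWriter` and `example_sizes` are hypothetical names, and the bytes written are placeholders: only the 4-byte `PAR1` magic at each end matches real Parquet layout, and nothing here is output of `AsyncArrowWriter`. Using the idea with `AsyncArrowWriter` would require an async equivalent (for example, a type implementing tokio's `AsyncWrite`), which is assumed and not shown.

```rust
use std::io::{self, Write};

/// Hypothetical helper: wraps any `Write` sink and counts every byte
/// the inner writer accepts.
struct CountingWriter<W: Write> {
    inner: W,
    bytes: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes: 0 }
    }

    /// Total bytes written so far, including anything written on "close".
    fn bytes_written(&self) -> u64 {
        self.bytes
    }

    /// Gives back the wrapped sink.
    fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only the bytes the inner writer actually accepted.
        let n = self.inner.write(buf)?;
        self.bytes += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Hypothetical demo: returns (bytes before the placeholder footer,
/// total bytes after it, length of the underlying buffer).
fn example_sizes() -> io::Result<(u64, u64, usize)> {
    let mut sink = CountingWriter::new(Vec::new());

    // Placeholder for the leading magic and row-group data.
    sink.write_all(b"PAR1")?;
    sink.write_all(&[0u8; 100])?;
    let before_close = sink.bytes_written();

    // Placeholder for the footer, which a real writer emits on close.
    sink.write_all(&[1u8; 20])?;
    sink.write_all(b"PAR1")?;
    sink.flush()?;

    let total = sink.bytes_written();
    let buf = sink.into_inner();
    Ok((before_close, total, buf.len()))
}
```

Calling `example_sizes()` yields 104 bytes before the placeholder footer and 128 after it, and the counted total matches the buffer length. When the sink is an in-memory `Vec<u8>`, its `len()` after closing gives the same total without a wrapper.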
Additional context