Releases: delta-io/delta-rs
python-v0.10.2
What's Changed
New features
- feat: add restore command in python binding by @loleek in #1529
- feat: buffered reading of transaction logs by @eeroel in #1549
Bug fixes
- fix: correct whitespace in delta protocol reader minimum version error message by @polynomialherder in #1576
- fix: just make pyarrow 12 the max by @wjones127 in #1603
- fix: support partial statistics in JSON by @CurtHagenlocher in #1599
- perf: avoid holding GIL in DeltaFileSystemHandler by @wjones127 in #1615
- fix: change map nullable value to false by @cmackenzie1 in #1620
- fix: don't re-encode paths by @wjones127 in #1613
Other
- ci: don't run benchmark in debug mode by @wjones127 in #1566
- chore: update datafusion to 28 and arrow to 43 by @cmackenzie1 in #1571
- chore: move deps to [workspace.dependencies] by @cmackenzie1 in #1575
- fix: remove alpha classifier by @marcelotrevisani in #1578
- refactor: use pa.table.cast in delta_arrow_schema_from_pandas by @ion-elgreco in #1573
- feat: add metadata for operations::write::WriteBuilder by @abhimanyusinghgaur in #1584
- feat: add metadata for deletion vectors by @aersam in #1583
- refactor: clean up arrow schema defs by @polynomialherder in #1590
- fix: update python test by @wjones127 in #1608
- chore: update datafusion to 30, arrow to 45 by @scsmithr in #1606
- chore: bump python version to 0.10.2 by @wjones127 in #1616
- ci: extend azure timeout by @wjones127 in #1622
- ci: fix python release by @wjones127 in #1624
New Contributors
- @polynomialherder made their first contribution in #1576
- @marcelotrevisani made their first contribution in #1578
- @ion-elgreco made their first contribution in #1573
- @aersam made their first contribution in #1583
- @CurtHagenlocher made their first contribution in #1599
- @scsmithr made their first contribution in #1606
- @eeroel made their first contribution in #1549
Full Changelog: python-v0.10.1...python-v0.10.2
rust-v0.14.0
What's Changed
- fix: revert premature merge of an attempted fix for binary column statistics by @rtyler in #1544
- chore: disable incremental builds in CI for saving space by @rtyler in #1545
- chore: address some integration test bloat of disk usage for development by @rtyler in #1552
- chore: increment python version by @wjones127 in #1542
- docs: port docs to mkdocs by @MrPowers in #1548
- ci: install newer rust for macos python release by @wjones127 in #1565
- feat!: bulk delete for vacuum by @Blajda in #1556
- feat: make find_files public by @yjshen in #1560
- ci: don't run benchmark in debug mode by @wjones127 in #1566
- chore: update datafusion to 28 and arrow to 43 by @cmackenzie1 in #1571
- chore: move deps to [workspace.dependencies] by @cmackenzie1 in #1575
- fix: correct whitespace in delta protocol reader minimum version error message by @polynomialherder in #1576
- feat: add restore command in python binding by @loleek in #1529
New Contributors
- @yjshen made their first contribution in #1560
- @polynomialherder made their first contribution in #1576
Full Changelog: rust-v0.13.0...rust-v0.14.0
python-v0.10.1
What's Changed
New features
- feat: handle larger z-order jobs with streaming output and spilling by @wjones127 in #1461
- feat: implement restore operation by @loleek in #1502
- feat!: bulk delete for vacuum by @Blajda in #1556
Fixes
- fix(python): match Field signatures by @guilhem-dvr in #1463
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions by @cmackenzie1 in #1477
- fix: tiny typo in AggregatedStats by @haruband in #1516
- fix: handle nulls in file-level stats by @wjones127 in #1520
Other
- docs: show data catalog options in Python API reference by @omkar-foss in #1532
- chore: fix mypy failure by @wjones127 in #1500
- chore: increment python version by @wjones127 in #1542
- ci: install newer rust for macos python release by @wjones127 in #1565
New Contributors
- @guilhem-dvr made their first contribution in #1463
- @haruband made their first contribution in #1516
- @omkar-foss made their first contribution in #1532
Full Changelog: python-v0.10.0...python-v0.10.1
rust-v0.13.0
Implemented enhancements:
- Add nested struct supports #1518
- Support FixedLenByteArray UUID statistics as a logical scalar #1483
- Exposing create_add in the API #1458
- Update features table on README #1404
- docs(python): show data catalog options in Python API reference #1347
- Add optimization to only list log files starting at a certain name #1252
- Support configuring parquet compression #1235
- parallel processing in Optimize command #1171
Fixed bugs:
- get_add_actions() MAX is not showing complete value #1534
- Can't get stats's minValues in add actions #1515
- Pyarrow is_null filter not working as expected after loading using deltalake #1496
- Can't write to table that uses generated columns #1495
- Json error: Binary is not supported by JSON when writing checkpoint files #1493
- _last_checkpoint size field is incorrect #1468
- Error when Z Ordering a larger dataset #1459
- Timestamp parsing issue #1455
- File options are ignored when writing delta #1444
- Slack Invite Link No Longer Valid #1425
- cleanup_metadata doesn't remove .checkpoint.parquet files #1420
- The test of reading the data from the blob storage located in Azurite container failed #1415
- The test of reading the data from the bucket located in Minio container failed #1408
- Datafusion: unreachable code reached when parsing statistics with missing columns #1374
- vacuum is very slow on Cloudflare R2 #1366
Closed issues:
- Expose Compression Options or WriterProperties for writing to Delta #1469
- Support out-of-core Z-order using DataFusion #1460
- Expose Z-order in Python #1442
Merged pull requests:
- chore: fix the latest clippy warnings with the newer rustc's #1536 (rtyler)
- docs: show data catalog options in Python API reference #1532 (omkar-foss)
- fix: handle nulls in file-level stats #1520 (wjones127)
- feat: add nested struct supports #1519 (haruband)
- fix: tiny typo in AggregatedStats #1516 (haruband)
- refactor: unify with_predicate for delete ops #1512 (Blajda)
- chore: remove deprecated table functions #1511 (roeap)
- chore: update datafusion and related crates #1504 (roeap)
- feat: implement restore operation #1502 (loleek)
- chore: fix mypy failure #1500 (wjones127)
- fix: avoid writing statistics for binary columns to fix JSON error #1498 (ChewingGlass)
- feat(rust): expose WriterProperties method on RecordBatchWriter and DeltaWriter #1497 (theelderbeever)
- feat: add UUID statistics handling #1484 (atefsaw)
- feat: expose create_add to the public #1482 (atefsaw)
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions #1477 (cmackenzie1)
- fix(python): match Field signatures #1463 (guilhem-dvr)
- feat: handle larger z-order jobs with streaming output and spilling #1461 (wjones127)
- chore: increment python version #1449 (wjones127)
- chore: upgrade to arrow 40 and datafusion 26 #1448 (rtyler)
- feat(python): expose z-order in Python #1443 (wjones127)
- ci: prune CI/CD pipelines #1433 (roeap)
- refactor: remove LoadCheckpointError and ApplyLogError #1432 (roeap)
- feat: update writers to include compression method in file name #1431 (Blajda)
- refactor: move checkpoint and errors into separate module #1430 (roeap)
- feat: add z-order optimize #1429 (wjones127)
- fix: casting when data to be written does not match table schema #1427 (Blajda)
- docs: update README.adoc to fix expired Slack link #1426 (dennyglee)
- chore: remove no-longer-necessary build.rs for Rust bindings #1424 (rtyler)
- chore: remove the delta-checkpoint lambda which I have moved to a new repo #1423 (rtyler)
- refactor: rewrite redundant_async_block #1422 (cmackenzie1)
- fix: update cleanup regex to include checkpoint.parquet files #1421 (cmackenzie1)
- docs: update features table in README #1414 (ognis1205)
- fix: get_prune_stats returns homogenous ArrayRef #1413 (cmackenzie1)
- feat: explicit python exceptions #1409 (roeap)
- feat: implement update operation #1390 (Blajda)
- feat: allow concurrent file compaction #1383 (wjones127)
python-v0.10.0: Z-order, faster optimize and vacuum
What's Changed
- feat(python): expose z-order in Python by @wjones127 in #1443
- feat: add z-order optimize by @wjones127 in #1429
- feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown by @ognis1205 in #1349
- feat: add datafusion storage catalog by @roeap in #1381
- feat: allow concurrent file compaction by @wjones127 in #1383
- feat: vacuum with concurrent requests by @wjones127 in #1382
- feat: more efficient parquet writer and more statistics by @wjones127 in #1397
- feat: explicit python exceptions by @roeap in #1409
- feat: update writers to include compression method in file name by @Blajda in #1431
- fix: include stats for all columns (#1223) by @mrjoe7 in #1342
- fix: add py.typed marker by @SchutteJan in #1350
- fix: allow user defined config keys by @roeap in #1365
- fix: add conversion for string for Field::TimestampMicros (#1372) by @cmackenzie1 in #1373
- perf: improve record batch partitioning by @roeap in #1396
- chore: type-check friendlier exports by @roeap in #1407
New Contributors
- @SchutteJan made their first contribution in #1350
- @cmackenzie1 made their first contribution in #1373
- @rahulj51 made their first contribution in #1377
Full Changelog: python-v0.9.0...python-v0.10.0
rust-v0.12.0
Boy howdy there's some great looking performance improvements in this…
rust-v0.11.0
Implemented enhancements:
- Implement simple delete case #832
Merged pull requests:
- chore: update Rust package version #1346 (rtyler)
- fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
- feat: delete operation #1176 (Blajda)
- feat: add wasbs to known schemes #1345 (iajoiner)
- test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
- feat: write command improvements #1267 (roeap)
- feat: added support for Databricks Unity Catalog #1331 (nohajc)
- fix: double url encode of partition key #1324 (mrjoe7)
python-v0.9.0
What's Changed
New features
- added support for Databricks Unity Catalog by @nohajc in #1331
- add optimize command in python binding by @loleek in #1313
- optimistic transaction protocol by @roeap in #632
- use new conflict checker in Python by @wjones127 in #1275
- Add Max Partitions Arg to Write by @ColeMurray in #1242
- Write support for additional Arrow datatypes by @chitralverma in #1044
- add package version by @wjones127 in #1243
- improve err msg on use of non-partitioned column by @marijncv in #1221
- update incremental after operations by @wjones127 in #1337
Fixes
- fix: double url encode of partition key by @mrjoe7 in #1324
- fix: documentation typo fix by @benrutter in #1332
- fix: allow special characters in storage prefix by @wjones127 in #1311
- fix: Fixed Documentation for get_add_actions function by @JHibbard in #1253
- fix: use native-tls for python deltalake releases by @wjones127 in #1244
- refactor: Simplify the Store Backend Configuration code by @mrjoe7 in #1265
New Contributors
- @ColeMurray made their first contribution in #1242
- @JHibbard made their first contribution in #1253
- @chitralverma made their first contribution in #1044
- @mrjoe7 made their first contribution in #1265
- @loleek made their first contribution in #1313
- @nohajc made their first contribution in #1331
Full Changelog: python-v0.8.1...python-v0.9.0
rust-v0.10.0
Implemented enhancements:
- Support Optimize on non-append-only tables #1125
Fixed bugs:
- DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
- Datafusion: SQL projection returns wrong column for partitioned data #1292
- Unable to query partitioned tables #1291
Merged pull requests:
- chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
- fix: handle local paths on windows #1322 (roeap)
- fix: scan partitioned tables with datafusion #1303 (roeap)
- fix: allow special characters in storage prefix #1311 (wjones127)
- feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
- Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
- Enable the json feature for the parquet crate #1300 (rtyler)
rust-v0.9.0
Implemented enhancements:
- hdfs support #300
- Add decimal primitive type to document #1280
- Improve error message when filtering on non-existant partition columns #1218
Fixed bugs:
- Datafusion table provider: issues with timestamp types #441
- Not matching column names when creating a RecordBatch from MapArray #1257
- All stores created using DeltaObjectStore::new have an identical object_store_url #1188
Merged pull requests:
- Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
- chore: df / arrow changes after update #1288 (roeap)
- feat: read schema from parquet files in datafusion scans #1266 (roeap)
- HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
- Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
- Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
- Simplify the Store Backend Configuration code #1265 (mrjoe7)
- feat: optimistic transaction protocol #632 (roeap)
- Write support for additional Arrow datatypes #1044 (chitralverma)
- Unique delta object store url #1212 (gruuya)
- improve err msg on use of non-partitioned column #1221 (marijncv)