Releases: delta-io/delta-rs
python-v0.20.0
New features
- feat(python, rust): `add feature` operation by @ion-elgreco in #2712
- feat: add support for `pyarrow.ExtensionType` by @fecet in #2885
- feat: improve AWS credential loading between S3 and DynamoDb code paths by @rtyler in #2887
Bug Fixes
- fix: pin the build-dependencies for Python to a slightly older vendored openssl by @rtyler in #2856
- fix: escaped columns in dataskippingstatscolumns by @ion-elgreco in #2855
- fix: set put mode to overwrite in mount backend by @ion-elgreco in #2861
- fix(rust): scan schema fix for predicate by @sherlockbeard in #2869
- fix(python, rust): use require files by @ion-elgreco in #2809
- fix: re-enable optional old casting behavior in merge by @ion-elgreco in #2853
- fix: conditionally disable enable_io on non-unix based systems by @hntd187 in #2884
- fix: stats is optional in add action by @jkylling in #2841
- fix: prepare the next 🦀 release with fixed version ranges by @rtyler in #2875
- fix: pin broken dependencies and changes in 0.19.1 by @rtyler in #2878
Other Changes
- chore: pin the Rust baseline version to 1.80 by @rtyler in #2842
- chore: attempt to ignore all dependabot checks for arrow and datafusion by @rtyler in #2870
- docs: fix typo in delta-lake-dagster by @jessy1092 in #2883
- chore: exclude parquet from dependabot as well by @rtyler in #2874
- chore: rearrange github actions a bit by @rtyler in #2868
- chore: cleanup codecov defaults by @rtyler in #2876
- chore(deps): update sqlparser requirement from 0.50 to 0.51 by @dependabot in #2881
- chore: set max_retries in CommitProperties by @helanto in #2826
New Contributors
- @helanto made their first contribution in #2826
- @jessy1092 made their first contribution in #2883
- @fecet made their first contribution in #2885
Full Changelog: python-v0.19.2...python-v0.20.0
python-v0.19.2: objectstore conditional put
New features
- perf: conditional put for default log store (e.g. azure, gcs, minio, cloudflare) by @ion-elgreco in #2813
- feat: make `Add::get_json_stats` public by @gruuya in #2822
- refactor(python): add pymergebuilder by @ion-elgreco in #2823
- feat: public method to get partitions for DeltaTable (#2671) by @omkar-foss in #2816
- feat(rust): add operationMetrics to WRITE by @gavinmead in #2838
Bug Fixes
- fix: enable feature flags to which deltalake-core build tokio with enable_io by @rtyler in #2803
- fix(rust): set token provider explicitly by @ion-elgreco in #2817
- fix(python, rust): allow `in` pushdowns in early_filter by @ion-elgreco in #2807
- fix: use table config target file size, expose target_file_size in python by @ion-elgreco in #2811
Other Changes
- chore(python): raise not implemented in from_data_catalog by @ion-elgreco in #2799
- docs: added WriterProperties documentation by @sherlockbeard in #2804
- chore(python): remove deprecated or duplicated functions by @ion-elgreco in #2801
- test(python): fix optimize call in benchmark by @ion-elgreco in #2812
- docs: fix documentation about max_spill_size by @junhl in #2835
- refactor(python): post_commit_hook_properties derive by @ion-elgreco in #2824
- docs: fix docstring of set_table_properties by @astrojuanlu in #2820
- chore: enable codecov reporting by @rtyler in #2836
- chore(aws): use backon to replace backoff by @Xuanwo in #2840
- chore: update python by @ion-elgreco in #2845
- docs: concurrent writes permission missing by @poguez in #2846
New Contributors
- @junhl made their first contribution in #2835
- @Xuanwo made their first contribution in #2840
- @gavinmead made their first contribution in #2838
- @poguez made their first contribution in #2846
Full Changelog: python-v0.19.1...python-v0.19.2
python-v0.19.1: separate IO runtime
New features
- feat: configurable IO runtime by @ion-elgreco in #2789
- feat(python, rust): added statistics_truncate_length in WriterProperties by @sherlockbeard in #2784
- feat(python, rust): add ColumnProperties And rework in python WriterProperties by @sherlockbeard in #2793
Bug Fixes
- fix: pin maturin version by @ion-elgreco in #2778
- fix(rust): `max_spill_size` default value by @mrjsj in #2795
- fix: trim trailing slash in url storage options (#2656) by @omkar-foss in #2775
Other Changes
- chore: update the changelog with the 0.19.0 release by @rtyler in #2774
- style: more consistent imports by @roeap in #2786
- chore: remove some `file_actions` call sites by @roeap in #2787
- chore(deps): update sqlparser requirement from 0.49 to 0.50 by @dependabot in #2792
New Contributors
Full Changelog: python-v0.19.0...python-v0.19.1
python-v0.19.0: complete CDF support, add column operation, faster MERGE
Breaking changes!
The default writer engine has changed to `rust`. Replace your `partition_filters` with a SQL `predicate` instead. The PyArrow engine is now deprecated and will be removed in v1.0.
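Migrating from `partition_filters` to a SQL predicate is mostly mechanical. The sketch below is a minimal illustration, assuming the legacy filters are `(column, operator, value)` tuples with string or numeric values; `partition_filters_to_predicate` and `_sql_literal` are hypothetical helpers, not part of deltalake, and column names are emitted unquoted (columns with spaces or special characters would need quoting).

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical helper, not part of deltalake: converts legacy
# (column, operator, value) partition_filters into one SQL predicate string.
PartitionFilter = Tuple[str, str, Any]


def _sql_literal(value: Any) -> str:
    """Render a str or number as a SQL literal; strings get single quotes."""
    if isinstance(value, str):
        # Standard SQL escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def partition_filters_to_predicate(filters: Sequence[PartitionFilter]) -> str:
    """AND together the filter tuples; column names are not quoted."""
    clauses: List[str] = []
    for column, op, value in filters:
        op = op.strip().lower()
        if op in ("in", "not in"):
            values = ", ".join(_sql_literal(v) for v in value)
            clauses.append(f"{column} {op.upper()} ({values})")
        elif op in ("=", "!=", "<", "<=", ">", ">="):
            clauses.append(f"{column} {op} {_sql_literal(value)}")
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return " AND ".join(clauses)


if __name__ == "__main__":
    predicate = partition_filters_to_predicate(
        [("year", "=", "2024"), ("month", "in", ["01", "02"])]
    )
    print(predicate)  # year = '2024' AND month IN ('01', '02')
    # The string would then be passed along the lines of:
    # write_deltalake(table_uri, data, mode="overwrite", predicate=predicate)
```

With the rust engine, an overwrite that previously targeted partitions through `partition_filters` would instead pass the resulting string as `predicate=`, so only rows matching it are replaced.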
Highlights
- CDF support in write_deltalake, delete, and merge operations
- Expired logs cleanup during post-commit. Can be disabled with `delta.enableExpiredLogCleanup = false`
- Improved MERGE performance by using min/max values of the predicate's non-partition columns for prefiltering
- `ADD column` operation
- Speed up log parsing
Performance improvements
- perf: apply projection when reading checkpoint parquet by @alexwilcoxson-rel in #2717
- perf: grab file size in rust by @ion-elgreco in #2734
- feat: improve merge performance by using predicate non-partition columns min/max for prefiltering by @JonasDev1 in #2513
- perf: early stop if all values in arr are null by @ion-elgreco in #2764
New features
- feat(python, rust): cdc write-support for `delete` operation by @ion-elgreco in #2721
- feat(python, rust): cdc write-support for `overwrite` and `replacewhere` writes by @ion-elgreco in #2722
- feat: introduce CDC generation for merge operations by @rtyler in #2747
- feat: use logical plan in delete, delta planner refactoring by @ion-elgreco in #2725
- feat: use logical plan in update, refactor/simplify CDCTracker by @ion-elgreco in #2727
- feat(python, rust): arrow large/view types passthrough, rust default engine by @ion-elgreco in #2738
- feat(python, rust): cleanup expired logs post-commit hook by @ion-elgreco in #2459
- feat(python, rust): `add column` operation by @ion-elgreco in #2562
- feat(python): handle PyCapsule interface objects in write_deltalake by @kylebarron in #2534
- feat(rust): fix size_in_bytes in last_checkpoint_ to i64 by @sherlockbeard in #2649
- feat(rust,python): cast each parquet file to delta schema by @HawaiianSpork in #2615
- feat: support userMetadata in CommitInfo by @jkylling in #2670
- feat(python, rust): add projection in CDF reads by @ion-elgreco in #2704
- feat(python): add DeltaTable.is_deltatable static method (#2662) by @omkar-foss in #2715
- feat: improved test fixtures by @roeap in #2749
- feat: fail fast on forked process by @Tom-Newton in #2765
- feat: restore the TryFrom for DeltaTablePartition by @rtyler in #2767
- feat: more economic data skipping with datafusion by @roeap in #2772
Bug Fixes
- fix(rust): inconsistent order of partitioning columns (#2494) by @aditanase in #2614
- fix(rust,python): checkpoint with column nullable false by @sherlockbeard in #2680
- fix: update delta kernel version by @jeppe742 in #2685
- fix(python): empty dataset fix for "pyarrow" engine by @sherlockbeard in #2689
- fix: ensure DataFusion SessionState Parquet options are applied to DeltaScan by @alexwilcoxson-rel in #2702
- fix(python, rust): use url encoder when encoding partition values by @ion-elgreco in #2705
- fix(python, rust): use input schema to get correct schema in cdf reads by @ion-elgreco in #2723
- fix: change arrow map root name to follow with parquet root name by @sclmn in #2538
- fix: schema adapter doesn't map partial batches correctly by @alexwilcoxson-rel in #2735
- fix: optimize Spark written tables by @rtyler in #1650
- fix(python, rust): cdc in writer not creating inserts by @ion-elgreco in #2751
- fix(python, rust): don't flatten fields during cdf read by @ion-elgreco in #2763
- fix: column parsing to include nested columns and enclosing char by @gtrawinski in #2737
Other Changes
- chore: missed one macos runner reference in actions by @rtyler in #2645
- chore: add a reproduction case for merge failures with struct by @rtyler in #2644
- ci: update CODEOWNERS by @hntd187 in #2650
- chore: increase subcrate versions by @rtyler in #2648
- docs: fix bullets on hdfs docs by @Kimahriman in #2653
- docs: improve navigation fixes by @avriiil in #2660
- docs: add integration docs for s3 backend by @avriiil in #2658
- chore: bump ruff to 0.5.2 by @fpgmaas in #2673
- chore: enable `RUF` ruleset for `ruff` by @fpgmaas in #2677
- chore: pin `ruff` and `mypy` versions in the `lint` stage in the CI pipeline by @fpgmaas in #2679
- chore: update README.md by @veronewra in #2684
- chore: create separate action to setup python and rust in the cicd pipeline by @fpgmaas in #2687
- chore: add test coverage command to `Makefile` by @fpgmaas in #2688
- chore: improve contributing.md by @fpgmaas in #2672
- chore: remove stale code for conditional import of `Literal` by @fpgmaas in #2676
- chore: remove references to black from the project by @fpgmaas in #2674
- chore: refactor `write_deltalake` in `writer.py` by @fpgmaas in #2695
- chore: upgrade to datafusion 40 by @rtyler in #2661
- chore: prepare python release 0.18.3 by @ion-elgreco in #2707
- chore: enabling actions for merge groups by @rtyler in #2718
- chore(deps): update sqlparser requirement from 0.47 to 0.49 by @dependabot in #2714
- chore: try an alternative docker compose invocation syntax by @rtyler in #2724
- chore(deps): update which requirement from 4 to 6 by @dependabot in #2730
- chore: update changelog and versions for next release by @rtyler in #2740
- chore: add to code_owner crates by @ion-elgreco in #2741
- chore: update delta_kernel to 0.3.0 by @alexwilcoxson-rel in #2742
- docs: fix broken link in docs by @astrojuanlu in #2746
- chore: upgrade to datafusion 41 by @rtyler in #2761
- chore: prepare the next notable release of 0.19.0 by @rtyler in #2768
- chore: fix a bunch of clippy lints and re-enable tests by @rtyler in #2773
New Contributors
- @aditanase made their first contribution in #2614
- @fpgmaas made their first contribution in #2673
- @kylebarron made their first contribution in #2534
- @veronewra made their first contribution in #2684
- @jeppe742 made their first contribution in #2685
- @sclmn made their first contribution in #2538
- @astrojuanlu made their first contribution in #2746
- @gtrawinski made their first contribution in #2737
Full Changelog: python-v0.18.2...python-v0.19.0
python-v0.18.2: HDFS support
New features
- feat(#2597): allow pyarrow.dataset.Expression in filters kwarg by @giacomorebecchi in #2600
- feat(rust, python): add HDFS support via hdfs-native package by @Kimahriman in #2612
- feat: report DataFusion metrics for DeltaScan by @alexwilcoxson-rel in #2617
Bug Fixes
- fix: enable parquet pushdown for DeltaScan via TableProvider impl for DeltaTable (rebase) by @rtyler in #2637
- fix(rust, python): fix writing empty structs when creating checkpoint by @sherlockbeard in #2627
- fix(python): fixed large_dtype to schema convert by @sherlockbeard in #2635
- fix(rust, python): fix merge schema with overwrite by @sherlockbeard in #2623
- fix(python): constrain multipart upload size to fixed length by @abhiaagarwal in #2606
- fix: update changelog by @rtyler in #2599
Other Changes
- chore: migrate to pyo3 Bounds API by @abhiaagarwal in #2596
- chore(deps): update dashmap requirement from 5 to 6 by @dependabot in #2641
- chore: remove macos builders from pull request flow by @rtyler in #2638
- docs: add Daft writer by @avriiil in #2594
- chore: fix documentation generation with a pin of griffe by @rtyler in #2636
- chore: bump python 0.18.2 by @ion-elgreco in #2621
- chore: implement regression test for push down panic by @rtyler in #2604
- docs: fix typo by @avriiil in #2603
- test: reintroduce azurite SAS integration tests by @giacomorebecchi in #2598
New Contributors
- @giacomorebecchi made their first contribution in #2598
- @Kimahriman made their first contribution in #2612
- @sherlockbeard made their first contribution in #2623
Full Changelog: python-v0.18.1...python-v0.18.2
python-v0.18.1
New features
- feat: add custom dynamodb endpoint configuration by @hnaoto in #2575
- chore: bump to datafusion 39, arrow 52, pyo3 0.21 by @abhiaagarwal in #2581
Bug Fixes
- chore: bump macOS runners, maybe resolve import error by @ion-elgreco in #2588
Other Changes
- docs: improve S3 access docs by @avriiil in #2589
- chore: expose `files_by_partition` to public api by @edmondop in #2533
New Contributors
- @abhiaagarwal made their first contribution in #2581
- @edmondop made their first contribution in #2533
Full Changelog: python-v0.18.0...python-v0.18.1
python-v0.18.0: CDC for update operation, added `set table properties` operation
New features
- feat: adopt kernel schema types by @roeap in #2495
- feat: add stats to convert-to-delta operation by @gruuya in #2491
- feat(python, rust): add `set table properties` operation by @ion-elgreco in #2264
- feat: implement transaction identifiers - continued by @roeap in #2539
- feat: introduce CDC write-side support for the Update operations by @rtyler in #2486
Bug Fixes
- fix(rust, python): fixed differences in storage options between log and object stores by @mightyshazam in #2500
- fix: enable field_with_name to support nested fields with '.' delimiter by @alexwilcoxson-rel in #2519
- fix(python): release GIL on most operations by @adriangb in #2512
- fix: clippy warnings by @imor in #2548
- fix: remove deprecated overwrite_schema configuration which has incorrect behavior by @rtyler in #2554
- fix: update deltalake crate examples for crate layout and TimestampNtz by @jhoekx in #2559
- fix: consistently use raise_if_key_not_exists in CreateBuilder by @vegarsti in #2569
- fix: cast support fields nested in lists and maps by @HawaiianSpork in #2541
Other Changes
- docs: fix typo by @avriiil in #2508
- chore: tidying up builds without datafusion feature and clippy by @rtyler in #2516
- chore: fixing some clips by @rtyler in #2521
- fix: msrv in workspace by @roeap in #2524
- feat(rust): make PartitionWriter public by @adriangb in #2525
- docs: improve daft integration docs by @avriiil in #2496
- chore: bump python 0.17.5 by @ion-elgreco in #2531
- chore(deps): update itertools requirement from 0.12 to 0.13 by @dependabot in #2526
- docs: dask write syntax fix by @avriiil in #2543
- docs: pull delta from conda not pip by @avriiil in #2535
- docs: clarify locking mechanism requirement for S3 by @inigohidalgo in #2558
- chore(deps): update sqlparser requirement from 0.46 to 0.47 by @dependabot in #2563
- docs: dt.delete add context + api docs link by @avriiil in #2560
New Contributors
- @imor made their first contribution in #2548
- @inigohidalgo made their first contribution in #2558
- @vegarsti made their first contribution in #2565
- @HawaiianSpork made their first contribution in #2541
Full Changelog: python-v0.17.4...python-v0.18.0
python-v0.17.4: stats collection according config
New features
- feat(python): add parameter to DeltaTable.to_pyarrow_dataset() by @adriangb in #2465
- feat(python, rust): respect column stats collection configurations by @ion-elgreco in #2428
Bug Fixes
- fix(rust): implement abort commit for S3DynamoDBLogStore by @PeterKeDer in #2452
- fix(python, rust): use new schema for stats parsing instead of old by @ion-elgreco in #2480
- fix: check to see if the file exists before attempting to rename by @rtyler in #2482
- fix(rust): unable to read delta table when table contains both null and non-null add stats by @yjshen in #2476
- fix(python, rust): region lookup wasn't working correctly for dynamo by @mightyshazam in #2488
- fix: return unsupported error for merging schemas in the presence of partition columns by @emcake in #2469
- fix(python): reuse state in `to_pyarrow_dataset` by @ion-elgreco in #2485
Other Changes
- chore(deps): update sqlparser requirement from 0.44 to 0.46 by @dependabot in #2483
- test: add test for concurrent checkpoint during table load by @alexwilcoxson-rel in #2151
Full Changelog: python-v0.17.3...python-v0.17.4
python-v0.17.3: CDF read support
New features
- feat(rust): advance state in post commit by @ion-elgreco in #2396
- feat: cdf reader for delta tables by @hntd187 in #2048
- feat(python, rust): add OBJECT_STORE_CONCURRENCY_LIMIT setting for ObjectStoreFactory by @zZKato in #2458
Bug Fixes
Other Changes
- chore(rust): bump arrow v51 and datafusion v37.1 by @lasantosr in #2395
New Contributors
Full Changelog: python-v0.17.2...python-v0.17.3
rust-v0.17.3
rust-v0.17.3 (2024-05-01)
Implemented enhancements:
- Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
- How to get a DataFrame in Rust? #2404
- Allow checkpoint creation when partition column is "timestampNtz" #2381
- is there a way to make writing timestamp_ntz optional #2339
- Update arrow dependency #2328
- Release GIL in deltalake.write_deltalake #2234
- Unable to retrieve custom metadata from tables in rust #2153
- Refactor commit interface to be a Builder #2131
Fixed bugs:
- Handle rate limiting during write contention #2451
- regression: delta.logRetentionDuration doesn't seem to be respected #2447
- Issue writing to mounted storage in AKS using delta-rs library #2445
- TableMerger - when_matched_delete() fails when Column names contain special characters #2438
- Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
- Merge on predicate throws error on date column: Unable to convert expression to string #2420
- Writing Tables with Append mode errors if the schema metadata is different #2419
- Logstore issues on AWS Lambda #2410
- Datafusion timestamp type doesn't respect delta lake schema #2408
- Compacting produces smaller row groups than expected #2386
- ValueError: Partition value cannot be parsed from string. #2380
- Very slow s3 connection after 0.16.1 #2377
- Merge update+insert truncates a delta table if the table is big enough #2362
- Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
- Create empty table failed on rust engine #2354
- Getting error message when running in lambda: message: "Too many open files" #2353
- Temporary files filling up _delta_log folder - increasing table load time #2351
- compact fails with merged schemas #2347
- Cannot merge into table partitioned by date type column on 0.16.3 #2344
- Merge breaks using logical datatype decimal128 #2343
- Decimal types are not checked against max precision/scale at table creation #2331
- Merge update+insert truncates a delta table #2320
- Extract `add.stats_parsed` with wrong type #2312
- Process fails without error message when executing merge #2310
- delta_rs doesn't seem to respect the row group size #2309
- Auth error when running inside VS Code #2306
- Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
- Schema evolution not coercing with Large arrow types #2298
- Panic in `deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}}` #2290
- Checkpoint does not preserve reader and writer features for the table protocol. #2288
- Z-Order with larger dataset resulting in memory error #2284
- Successful writes return error when using concurrent writers #2279
- Rust writer should raise when decimal types are incompatible (currently writes and puts table in invalid state) #2275
- Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
- DeltaTable is not resilient to corrupted checkpoint state #2258
- Inconsistent units of time #2256
- Partition column comparison is an assertion rather than if block with raise exception #2242
- Unable to merge column names starting from numbers #2230
- Merging to a table with multiple distinct partitions in parallel fails #2227
- cleanup_metadata not respecting custom `logRetentionDuration` #2180
- Merge predicate fails with a field with a space #2167
- When_matched_update causes records to be lost with explicit predicate #2158
- Merge execution time grows exponentially with the number of columns #2107
- _internal.DeltaError when merging #2084