Python Polars 1.17.0
🚀 Performance improvements
- Add fast paths for series.arg_sort and dataframe.sort (#19872)
- Much faster `Series` construction from subclasses of standard Python types (#20166)
- Utilize the RangedUniqueKernel for Enum/Categorical (#20150)
- Reduce memory copy when scanning from Python objects (#20142)
- Construct `Series` for bytes/binary data 10x faster when dtype not explicitly set (#20157)
- Don't instantiate validity mask when unneeded in Parquet (#20149)
✨ Enhancements
- Retry with reloaded credentials on cloud error (#20185)
- Support reading Enum dtype from csv (#20188)
- Improve dtype inference and load for `DataFrame` cols constructed from Python Enum values (#20180)
- Allow sorting of lists and arrays (#20169)
- Add `maintain_order` parameter to joins (#20026)
- Allow for `to_datetime`/`strftime` to automatically parse dates with single-digit hour/minute/second (#20144)
- Issue warning when using `to_struct()` without a list of field names (#20158)
- Experimental cloud write support (#20129)
- Add lazy support for `pl.select` (#20091)
- Enable view arrow export in `write_delta` (#20092)
🐞 Bug fixes
- Don't trigger length check in array construction (#20205)
- Allow row encoding for 32-bit architectures (e.g. WASM) (#20186)
- Properly project unordered column in parquet prefiltered (#20189)
- Csv stop simd cache if eol char is hit (#20199)
- Estimated size for object (#20191)
- Respect parallel argument in parquet (#20187)
- Only validate UTF-8 for selected items when all below len 128 (#20183)
- Serialize categories of Enum in arrow metadata (#20181)
- Don't use RLE encoding for Parquet Boolean (#20172)
- Invalid `bitwise_xor` for ScalarColumn (#20140)
- Series construct with large nested `u64` (#20167)
- Add temporal feature gate in `is_elementwise_top_level` (#20177)
- Column name mismatch or not found in Parquet scan with filter (#20178)
- Raise if apply returns different types (#20168)
- Deal with masked out list elements (#20161)
- Fix index out of bounds in uniform_hist_count (#20133)
- Implement `arg_sort` for Null series (#20135)
- Handle slice pushdown in PythonUDF GroupBy (#20132)
- Check shape for `*_horizontal` functions (#20130)
- Properly coerce types in lists (#20126)
- Incorrect aggregation of empty groups after slice (#20127)
- `DataFrame.get_column` after `drop_in_place` (#20120)
- Subtraction with underflow on empty FixedSizeBinaryArray (#20109)
- Materialize smallest dyn ints to use feature gate for i8/i16 (#20108)
- Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series (#20077)
- Only slice after sort when slice is smaller than frame length (#20084)
- Preserve Series name in __rpow__ operation (#20072)
- Allow nested `is_in()` in `when()/then()` for full-streaming (#20052)
📖 Documentation
- Add more Rust examples to User Guide (#20194)
- Expand plotting docs (#19719)
- Fix Rust examples in user guide (#20075)
- Update `by` param description for rolling_*_by functions (#19715)
- Correct supported compression formats (#20085)
- Specify strictness in cast (#20067)
📦 Build system
- Upgrade `sqlparser-rs` from version `0.49` to `0.52` (#20110)
- Bump `memmap2` to version `0.9` (#20105)
- Bump `object_store` to version `0.11` (#20102)
- Bump `fs4` to version `0.12` (#20101)
- Bump `thiserror` to version `2` (#20097)
- Bump `atoi_simd` to version `0.16` (#20098)
- Bump `chrono-tz` to `0.10` (#20094)
- Update Rust dependency `ndarray` to `0.16` (#20093)
- Bump Rust toolchain to `nightly-2024-11-28` (#20064)
🛠️ Other improvements
- Deprecate ddof parameter for correlation coefficient (#20197)
- Move Bitwise aggregations to FunctionExpr (#20193)
- Add ragged lines test (#20182)
- Set delta version check higher (#20153)
- Fix typo in assertion in datatype copy test (#20121)
- Move horizontal methods to polars-ops (#20134)
- Remove useless SeriesTrait::get implementations (#20136)
- Add a bunch more automated row encoding sortedness tests (#20056)
Thank you to all our contributors for making this release possible!
@DzenanJupic, @MarcoGorelli, @YichiZhang0613, @alexander-beedie, @coastalwhite, @dependabot, @dependabot[bot], @flowlight0, @henryharbeck, @iharthi, @ion-elgreco, @jqnatividad, @lukapeschke, @lukemanley, @mcrumiller, @nameexhaustion, @ptiza, @ritchie46, @siddharth-vi, @stijnherfst, @stinodego and @wsyxbcl