-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust): Run join type coercion with correct schemas active #19625
Conversation
Note that this induces a behaviour change for some join expressions. For example: import polars as pl
left = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Float64)})
right = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Int64)})
q_col_keys = left.join(right, left_on=pl.col("a"), right_on=pl.col("a"))
q_col_exprs = left.join(right, left_on=pl.col("a")*2, right_on=pl.col("a")*2)
# Old behaviour
q_col_keys.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right
q_col_exprs.collect()
# shape: (3, 3)
# ┌─────┬─────────┬─────┐
# │ a ┆ a_right ┆ b │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ i64 ┆ i8 │
# ╞═════╪═════════╪═════╡
# │ 1.0 ┆ 1 ┆ 1 │
# │ 2.0 ┆ 2 ┆ 2 │
# │ 3.0 ┆ 3 ┆ 3 │
# └─────┴─────────┴─────┘
# New behaviour
q_col_keys.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right
q_col_exprs.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right So we're more consistent. I think it should now also be possible to complain during schema validation, rather than compute, that the join keys have mismatching types but I couldn't spot the right place to do so. |
3091599
to
c11334b
Compare
Not sure the best way to test this. |
We previously ran type coercion for the left (respectively right) key expressions with the schema of the output join node active. This is incorrect since the keys may not even refer to column names in the output (if the expressions do not refer to columns). Instead, the left (right) key expressions must be coerced with the left (right) inputs active. With this done, we now no longer need to run type coercion on the resulting Join node, and instead must just add it to the arena. - Fixes pola-rs#19597
c11334b
to
b169445
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #19625 +/- ##
=======================================
Coverage 79.91% 79.92%
=======================================
Files 1536 1536
Lines 211659 211676 +17
Branches 2445 2445
=======================================
+ Hits 169156 169177 +21
+ Misses 41948 41944 -4
Partials 555 555 ☔ View full report in Codecov by Sentry. |
Pushed something that does this. |
74d018c
to
31e06ca
Compare
Previously we would only fail when we got to the compute stage, now we complain if datatypes don't match after type coercion.
31e06ca
to
dc6982c
Compare
Maybe test on the error raised in the example you give in this PR? |
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank. Good observation that we need to resolve on the inputs. That's a tricky one.
We previously ran type coercion for the left (respectively right) key expressions with the schema of the output join node active. This is incorrect since the keys may not even refer to column names in the output (if the expressions do not refer to columns). Instead, the left (right) key expressions must be coerced with the left (right) inputs active.
With this done, we now no longer need to run type coercion on the resulting Join node, and instead must just add it to the arena.