Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Run join type coercion with correct schemas active #19625

Merged
merged 3 commits into from
Nov 4, 2024

Conversation

wence-
Copy link
Collaborator

@wence- wence- commented Nov 4, 2024

We previously ran type coercion for the left (respectively right) key expressions with the schema of the output join node active. This is incorrect since the keys may not even refer to column names in the output (if the expressions do not refer to columns). Instead, the left (right) key expressions must be coerced with the left (right) inputs active.

With this done, we now no longer need to run type coercion on the resulting Join node, and instead must just add it to the arena.

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Nov 4, 2024
@wence-
Copy link
Collaborator Author

wence- commented Nov 4, 2024

Note that this induces a behaviour change for some join expressions.

For example:

import polars as pl
left = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Float64)})
right = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Int64)})

q_col_keys = left.join(right, left_on=pl.col("a"), right_on=pl.col("a"))

q_col_exprs = left.join(right, left_on=pl.col("a")*2, right_on=pl.col("a")*2)

# Old behaviour
q_col_keys.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right

q_col_exprs.collect()

# shape: (3, 3)
# ┌─────┬─────────┬─────┐
# │ a   ┆ a_right ┆ b   │
# │ --- ┆ ---     ┆ --- │
# │ f64 ┆ i64     ┆ i8  │
# ╞═════╪═════════╪═════╡
# │ 1.0 ┆ 1       ┆ 1   │
# │ 2.0 ┆ 2       ┆ 2   │
# │ 3.0 ┆ 3       ┆ 3   │
# └─────┴─────────┴─────┘

# New behaviour

q_col_keys.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right

q_col_exprs.collect() # => ComputeError: datatypes of join keys don't match - `a`: f64 on left does not match `a`: i64 on right

So we're more consistent.

I think it should now also be possible to complain during schema validation, rather than compute, that the join keys have mismatching types but I couldn't spot the right place to do so.

@wence-
Copy link
Collaborator Author

wence- commented Nov 4, 2024

Not sure the best way to test this.

We previously ran type coercion for the left (respectively right) key
expressions with the schema of the output join node active. This is
incorrect since the keys may not even refer to column names in the
output (if the expressions do not refer to columns). Instead, the
left (right) key expressions must be coerced with the left (right)
inputs active.

With this done, we now no longer need to run type coercion on the
resulting Join node, and instead must just add it to the arena.

- Fixes pola-rs#19597
Copy link

codecov bot commented Nov 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.92%. Comparing base (e52a598) to head (d320a3a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #19625   +/-   ##
=======================================
  Coverage   79.91%   79.92%           
=======================================
  Files        1536     1536           
  Lines      211659   211676   +17     
  Branches     2445     2445           
=======================================
+ Hits       169156   169177   +21     
+ Misses      41948    41944    -4     
  Partials      555      555           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wence-
Copy link
Collaborator Author

wence- commented Nov 4, 2024

I think it should now also be possible to complain during schema validation, rather than compute, that the join keys have mismatching types but I couldn't spot the right place to do so.

Pushed something that does this.

Previously we would only fail when we got to the compute stage, now we
complain if datatypes don't match after type coercion.
@ritchie46
Copy link
Member

Not sure the best way to test this.

Maybe test on the error raised in the example you give in this PR?

@wence-
Copy link
Collaborator Author

wence- commented Nov 4, 2024

Not sure the best way to test this.

Maybe test on the error raised in the example you give in this PR?

Done.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank. Good observation that we need to resolve on the inputs. That's a tricky one.

@ritchie46 ritchie46 merged commit 45a9313 into pola-rs:main Nov 4, 2024
24 checks passed
@wence- wence- deleted the wence/fix/19597 branch November 4, 2024 12:36
tylerriccio33 pushed a commit to tylerriccio33/polars that referenced this pull request Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
2 participants