feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown #1349
Signed-off-by: Shingo OKAWA <[email protected]>
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
https://github.com/delta-io/delta-rs/actions/runs/4926020116/jobs/8800992032?pr=1349#step:8:18
https://github.com/delta-io/delta-rs/actions/runs/4943095151/jobs/8837254340?pr=1349#step:5:545
I think the above error is not relevant to this PR.
Code example:
import deltalake
dt = deltalake.DeltaTable("../../rust/tests/data/COVID-19_NYT")
# "date > 2021-02-20" OR "state in ["Alabama", "Wyoming"]"
df = dt.to_pandas(filters=[[("date", ">", "2021-02-20")], [("state", "in", ["Alabama", "Wyoming"])]])
# "date > 2021-02-20" AND "state in ["Alabama", "Wyoming"]"
df = dt.to_pandas(filters=[("date", ">", "2021-02-20"), ("state", "in", ["Alabama", "Wyoming"])])
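For readers unfamiliar with the DNF convention used in the example above, here is a minimal pure-Python sketch of how the two shapes are read: a flat list of tuples is a single AND group, and a list of lists is an OR of AND groups. The helper names `normalize_dnf` and `row_matches`, the operator subset, and the sample rows are all hypothetical and exist only for illustration; this is not the code in this PR, and it evaluates rows in Python rather than pushing filters down to the scan.

```python
import operator

# Comparison operators accepted by this sketch (an assumed subset).
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, choices: value in choices,
    "not in": lambda value, choices: value not in choices,
}


def normalize_dnf(filters):
    """Return filters as a list of conjunctions (list of lists of tuples)."""
    if not filters:
        return []
    # A flat list of tuples is a single conjunction (AND).
    if isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(conjunction) for conjunction in filters]


def row_matches(row, filters):
    """True if the row satisfies at least one conjunction (OR of ANDs)."""
    dnf = normalize_dnf(filters)
    if not dnf:
        return True  # no filters: keep every row
    return any(
        all(_OPS[op](row[column], value) for column, op, value in conjunction)
        for conjunction in dnf
    )


# Made-up sample rows; dates are ISO strings, so string comparison orders them.
rows = [
    {"date": "2021-02-19", "state": "Alabama"},
    {"date": "2021-02-21", "state": "Texas"},
    {"date": "2021-02-21", "state": "Wyoming"},
    {"date": "2021-02-19", "state": "Texas"},
]
date_pred = ("date", ">", "2021-02-20")
state_pred = ("state", "in", ["Alabama", "Wyoming"])

# List of lists: date > 2021-02-20 OR state in [...]
or_rows = [r for r in rows if row_matches(r, [[date_pred], [state_pred]])]
# Flat list: date > 2021-02-20 AND state in [...]
and_rows = [r for r in rows if row_matches(r, [date_pred, state_pred])]
```

Normalizing both shapes to a list of lists up front keeps the evaluation to one any/all expression, which is one way to get a consistent input type.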
I'd still like you to add a test, at least for to_pyarrow_table(), exercising the new argument. Ideally it should include a case for a conjunction, one for a disjunction, and one for a single predicate.
Sorry, I had forgotten to mention you, but I believe I have now resolved all the issues. Would it be possible for you to review the PR again when you have a moment? I would really appreciate it. Thank you!
Thanks, those tests look great!
Description
This pull request adds support for filtering to the to_pandas method of DeltaTable. The new filters argument allows users to specify filtering criteria in a format that is compatible with pyarrow.compute.Expression.
- Adds the filters argument to DeltaTable.to_pandas.
- Adds the _filters_to_expression function to the table module, which is based on this implementation, but with improved type consistency.
Based on the existing conventional unit tests, I did not add any additional test cases for this feature. Instead, I tested the feature in my local development environment.
Related Issue(s)
Documentation
The DNF filter format.