feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown #1349

Merged
merged 4 commits into from
May 12, 2023

Contributor

@ognis1205 ognis1205 commented May 9, 2023

Description

This pull request adds filter support to the to_pandas method of DeltaTable. The new filters argument lets users
specify filtering criteria in DNF (disjunctive normal form), which is converted to a pyarrow.compute.Expression.

  • Adding the filters argument to DeltaTable.to_pandas.
  • Adding the _filters_to_expression function to the table module, which is based on this implementation, but with improved type consistency.
  • Following the existing unit-test conventions, I initially added no test cases and tested the feature only in my local development environment.
  • Adding a unit test for the feature.
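
To make the filter shapes concrete, here is a minimal sketch of how DNF filters can be normalized and read as a boolean expression. It is not the code from this PR: normalize_dnf and render_dnf are hypothetical helper names, and the output is a plain string rather than a pyarrow.compute.Expression, so the sketch runs without PyArrow. It assumes the convention used in the example in this thread, where a flat list of tuples is one AND group and a list of lists is an OR of AND groups.

```python
from typing import Any, List, Tuple, Union

# Hypothetical type aliases for illustration only.
Predicate = Tuple[str, str, Any]   # (column, operator, value)
Conjunction = List[Predicate]      # predicates joined with AND
DNF = List[Conjunction]            # conjunctions joined with OR


def normalize_dnf(filters: Union[Conjunction, DNF]) -> DNF:
    """Return filters as a list of conjunctions (a list of lists of tuples)."""
    if not filters:
        raise ValueError("filters must not be empty")
    # A flat list of tuples is a single AND group; wrap it in an outer list.
    if isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(group) for group in filters]


def render_dnf(filters: Union[Conjunction, DNF]) -> str:
    """Render DNF filters as a readable boolean expression string."""
    groups = []
    for conjunction in normalize_dnf(filters):
        terms = [f"({col} {op} {value!r})" for col, op, value in conjunction]
        groups.append(" AND ".join(terms))
    return " OR ".join(f"({group})" for group in groups)


if __name__ == "__main__":
    print(render_dnf([("date", ">", "2021-02-20"), ("state", "in", ["Alabama", "Wyoming"])]))
    print(render_dnf([[("date", ">", "2021-02-20")], [("state", "in", ["Alabama", "Wyoming"])]]))
```

A real implementation would build expression objects instead of strings, but the normalization step (flat list versus list of lists) is the part that decides whether predicates are AND-ed or OR-ed.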

Related Issue(s)

Documentation

The DNF filter format.

@github-actions github-actions bot added the binding/python Issues for the Python package label May 9, 2023

github-actions bot commented May 9, 2023

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ognis1205 ognis1205 changed the title from "Feat/add filters argument 1316" to "feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown (#1316)" May 9, 2023
@ognis1205 ognis1205 changed the title from "feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown (#1316)" to "feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown" May 9, 2023
Contributor Author

ognis1205 commented May 9, 2023

https://github.com/delta-io/delta-rs/actions/runs/4926020116/jobs/8800992032?pr=1349#step:8:18

I will implement a fallback for pyarrow.parquet.core.filters_to_expression to support PyArrow 7.0.0.
I will add a _filters_to_expression method.

@ognis1205 ognis1205 marked this pull request as draft May 9, 2023 17:32
Contributor Author

ognis1205 commented May 11, 2023

https://github.com/delta-io/delta-rs/actions/runs/4943095151/jobs/8837254340?pr=1349#step:5:545

I think the above error is unrelated to this PR.

Code example:

import deltalake

dt = deltalake.DeltaTable("../../rust/tests/data/COVID-19_NYT")

# A list of lists is OR-ed together: date > "2021-02-20" OR state in ["Alabama", "Wyoming"]
df = dt.to_pandas(filters=[[("date", ">", "2021-02-20")], [("state", "in", ["Alabama", "Wyoming"])]])

# A flat list of tuples is AND-ed together: date > "2021-02-20" AND state in ["Alabama", "Wyoming"]
df = dt.to_pandas(filters=[("date", ">", "2021-02-20"), ("state", "in", ["Alabama", "Wyoming"])])

Collaborator

@wjones127 wjones127 left a comment

I'd still like you to add a test, at least for to_pyarrow_table(), that exercises the new argument. Ideally it should include one case with a conjunction, one with a disjunction, and one with a single predicate.
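
As an illustration of the three cases requested here, the following is a minimal sketch and not the tests that were added to the PR. It assumes a pure-Python stand-in for the table read: ROWS is invented sample data, and row_matches and select are hypothetical helpers that apply DNF filters to dictionaries. Real tests would call to_pyarrow_table() on a Delta table instead.

```python
import operator
from typing import Any, Dict, List

# Operators supported by this sketch (an assumption, not the library's list).
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}

# Invented sample rows; ISO date strings compare correctly as plain strings.
ROWS: List[Dict[str, Any]] = [
    {"date": "2021-02-19", "state": "Alabama"},
    {"date": "2021-02-21", "state": "Alabama"},
    {"date": "2021-02-21", "state": "Texas"},
    {"date": "2021-02-19", "state": "Texas"},
]


def row_matches(row: Dict[str, Any], filters: list) -> bool:
    """True if the row satisfies any AND group of the DNF filters."""
    # A flat list of tuples is treated as a single AND group.
    groups = [filters] if isinstance(filters[0], tuple) else filters
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in groups
    )


def select(filters: list) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for reading a table with filters applied."""
    return [row for row in ROWS if row_matches(row, filters)]


def test_single_predicate():
    assert len(select([("date", ">", "2021-02-20")])) == 2


def test_conjunction():
    result = select([("date", ">", "2021-02-20"), ("state", "in", ["Alabama", "Wyoming"])])
    assert result == [{"date": "2021-02-21", "state": "Alabama"}]


def test_disjunction():
    result = select([[("date", ">", "2021-02-20")], [("state", "in", ["Alabama", "Wyoming"])]])
    assert len(result) == 3
```

The disjunction case returns more rows than either predicate alone would under a conjunction, which is the behavior the three cases are meant to pin down.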

@ognis1205 ognis1205 marked this pull request as draft May 11, 2023 06:37
@ognis1205 ognis1205 marked this pull request as ready for review May 11, 2023 07:45
@ognis1205 ognis1205 requested a review from wjones127 May 11, 2023 07:46
@ognis1205
Contributor Author

@wjones127

Sorry, I forgot to mention you earlier. I believe I have now resolved all the issues. Could you review the PR again when you have a moment? I would really appreciate it. Thank you!

Collaborator

@wjones127 wjones127 left a comment

Thanks, those tests look great!

@wjones127 wjones127 merged commit 4c8e381 into delta-io:main May 12, 2023
@ognis1205 ognis1205 deleted the feat/add_filters_argument_1316 branch May 12, 2023 03:05