Add Max Partitions Arg to Write #1242

Merged (4 commits) on Mar 26, 2023

@ColeMurray (Contributor) commented on Mar 25, 2023

Description

Add an optional max_partitions argument to write_deltalake that is passed through to PyArrow's dataset.write_dataset.

Related Issue(s)

This resolves an issue where users cannot write more than 1024 partitions and have no way to override the limit:

  File "/lib/python3.8/site-packages/deltalake/writer.py", line 293, in write_deltalake
    ds.write_dataset(
  File "/lib/python3.8/site-packages/pyarrow/dataset.py", line 900, in write_dataset
    _filesystemdataset_write(
  File "pyarrow/_dataset.pyx", line 2479, in pyarrow._dataset._filesystemdataset_write
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Fragment would be written into 1246 partitions. This exceeds the maximum of 1024

Documentation

https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html

@github-actions bot added the binding/python label (Issues for the Python package) on Mar 25, 2023
@wjones127 (Collaborator) previously approved these changes on Mar 26, 2023 and left a comment

Thanks for this contribution, @ColeMurray. Could you add a unit test? It could simply set max_partitions=1 and verify that writing the standard sample dataset fails.

python/deltalake/writer.py (2 outdated review comments, resolved)
@wjones127 (Collaborator) left a comment

Looks great. Thanks @ColeMurray!

@wjones127 merged commit 0277e09 into delta-io:main on Mar 26, 2023
@ColeMurray deleted the add_max_partitions_arg branch on March 27, 2023 02:45