Delta table expect a "__index_level_0__" column #1698
Comments
This is because pandas has a lovely index... So, the main issue is that on the first write the index column stayed in the data while writing to Parquet. What PyArrow version are you using? I can see
I am using PyArrow 12.0.0
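Same issue here. @ion-elgreco's suggestion fixed it:
import pyarrow as pa
from deltalake import write_deltalake

# Drop the pandas index so no "__index_level_0__" column is written
data = pa.Table.from_pandas(data, preserve_index=False)
write_deltalake(
    table_or_uri=s3_endpoint,
    data=data,
    mode='append',
    storage_options=storage_options
)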
Thanks a lot @ion-elgreco and @titowoche30, it worked! Afterwards I had an issue on the first write related to the data schema, but I fixed my data schema in
Environment
Delta-rs version: 0.10.2
Binding: Python 3.9.17
Environment: local
Bug
What happened:
I am trying to pull new data (which contains text) from a delta table in my bucket A, apply some transformations to it (removing urls, removing hashtags, …) and finally load transformed data into a delta table in my bucket B.
The first time I ran this pipeline, it worked perfectly fine. Then I inserted new data in my delta table (bucket A). The second time, it failed and displayed the following error:
Apparently, a column named "__index_level_0__" is required but it is not a column defined by me.
What you expected to happen:
I expected my transformed data to be stored in my delta table (bucket B) without a problem.
How to reproduce it:
Here is my Python script to reproduce it:
More details: