When_matched_update causes records to be lost with explicit predicate #2158
Comments
Thanks a lot!
Hmm, actually I've just tested the newly released 0.15.2 and the bug described here is still present IMO.
Yup, can confirm. Edit: But passing the cost check as the predicate of when_matched_update instead seems to work:
table.merge(
    pa.Table.from_pandas(new_df),
    predicate="t.instance_id = s.instance_id",
    source_alias="s",
    target_alias="t",
).when_matched_update({"cost": "s.cost"}, predicate="t.cost is null").execute()
@sebdiem do you have some time to cross-check the behavior with Spark Delta?
Yes, will try to do that tomorrow.
The correct behavior is from 0.14. Staring at this, I think the root cause is that predicate pushdown was enabled, which pushes a filter of
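A toy model of the suspected failure mode, in plain Python: if the `cost is null` part of the merge predicate is pushed down as a filter on the target scan, and the touched file is then rewritten from only the scanned rows, the rows that fail the filter disappear. This is only a sketch of the hypothesis above under those assumptions, not the actual delta-rs code path; `rewrite_file`, its `pushdown` flag, and the sample rows are hypothetical.

```python
def rewrite_file(file_rows, source, pushdown):
    """Rewrite one target file after a merge (toy model, not delta-rs code).

    With pushdown=True the `cost is null` filter is applied while scanning
    the target, so rows that already have a cost never reach the rewrite.
    """
    source_cost = {row["id"]: row["cost"] for row in source}

    if pushdown:
        # Assumed bug: the filter is applied at scan time.
        scanned = [row for row in file_rows if row["cost"] is None]
    else:
        # Assumed correct behavior: every row of the file is scanned.
        scanned = list(file_rows)

    rewritten = []
    for row in scanned:
        if row["cost"] is None and row["id"] in source_cost:
            # Matched row: take the cost from the source.
            rewritten.append({**row, "cost": source_cost[row["id"]]})
        else:
            # Unmatched row: copy it unchanged into the new file.
            rewritten.append(dict(row))
    return rewritten


file_rows = [{"id": "A", "cost": 10.15}, {"id": "B", "cost": None}]
source = [{"id": "A", "cost": 12.15}, {"id": "B", "cost": 11.15}]

without_pushdown = rewrite_file(file_rows, source, pushdown=False)
with_pushdown = rewrite_file(file_rows, source, pushdown=True)
print(without_pushdown)
print(with_pushdown)
```

In this model the row for `A` survives only when the filter is not applied at scan time, which matches the symptom reported in this issue.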
Oddly, I can't reproduce the issue when using Rust:
async fn test_merge_pushdowns() {
    // See #2158
    let schema = vec![
        StructField::new(
            "id".to_string(),
            DataType::Primitive(PrimitiveType::String),
            true,
        ),
        StructField::new(
            "cost".to_string(),
            DataType::Primitive(PrimitiveType::Float),
            true,
        ),
        StructField::new(
            "month".to_string(),
            DataType::Primitive(PrimitiveType::String),
            true,
        ),
    ];
    let arrow_schema = Arc::new(ArrowSchema::new(vec![
        Field::new("id", ArrowDataType::Utf8, true),
        Field::new("cost", ArrowDataType::Float32, true),
        Field::new("month", ArrowDataType::Utf8, true),
    ]));

    // Create an empty in-memory table with the schema above.
    let table = DeltaOps::new_in_memory()
        .create()
        .with_columns(schema)
        .await
        .unwrap();
    let ctx = SessionContext::new();

    // Initial data: A already has a cost, B has a null cost.
    let batch = RecordBatch::try_new(
        Arc::clone(&arrow_schema),
        vec![
            Arc::new(arrow::array::StringArray::from(vec!["A", "B"])),
            Arc::new(arrow::array::Float32Array::from(vec![Some(10.15), None])),
            Arc::new(arrow::array::StringArray::from(vec![
                "2023-07-04",
                "2023-07-04",
            ])),
        ],
    )
    .unwrap();
    let table = DeltaOps(table)
        .write(vec![batch.clone()])
        .with_save_mode(SaveMode::Append)
        .await
        .unwrap();
    assert_eq!(table.version(), 1);
    assert_eq!(table.get_files_count(), 1);

    // Source data: new costs for both A and B.
    let batch = RecordBatch::try_new(
        Arc::clone(&arrow_schema),
        vec![
            Arc::new(arrow::array::StringArray::from(vec!["A", "B"])),
            Arc::new(arrow::array::Float32Array::from(vec![Some(12.15), Some(11.15)])),
            Arc::new(arrow::array::StringArray::from(vec![
                "2023-07-04",
                "2023-07-04",
            ])),
        ],
    )
    .unwrap();
    let source = ctx.read_batch(batch).unwrap();

    // Only rows whose cost is null should be updated.
    let (table, _metrics) = DeltaOps(table)
        .merge(source, "target.id = source.id and target.cost is null")
        .with_source_alias("source")
        .with_target_alias("target")
        .when_matched_update(|insert| {
            insert
                .update("id", "target.id")
                .update("cost", "source.cost")
                .update("month", "target.month")
        })
        .unwrap()
        .await
        .unwrap();

    // A keeps its original cost; B receives the cost from the source.
    let expected = vec![
        "+----+-------+------------+",
        "| id | cost  | month      |",
        "+----+-------+------------+",
        "| A  | 10.15 | 2023-07-04 |",
        "| B  | 11.15 | 2023-07-04 |",
        "+----+-------+------------+",
    ];
    let actual = get_data(&table).await;
    assert_batches_sorted_eq!(&expected, &actual);
}
@Blajda weirdly enough, using this predicate works:
@Blajda I ran your Rust test but it doesn't pass on my side:
# Description
Fix broken test case with partitions - fixes #2158

Co-authored-by: ion-elgreco
Environment
Delta-rs version: 0.15.0
Binding: python
Environment:
Bug
What happened:
I have a table with several columns, among them instance_id and cost. I want to update rows for which cost is null based on data contained in a pandas dataframe.
For this purpose I use a merge operation like this:
In 0.14.0 it worked perfectly fine: rows for which cost was already set stayed untouched while the others were updated.
In 0.15.0, however, the behavior is different: the rows with cost set were deleted and the others were updated.
What you expected to happen:
Same behavior as in 0.14.0
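The expected (0.14.0) semantics can be sketched in plain Python, without deltalake: only target rows that satisfy the full merge predicate are updated, and every other target row is kept unchanged. The function `merge_update_cost` and the sample rows are hypothetical and only model the behavior described above; they say nothing about how delta-rs implements merge.

```python
def merge_update_cost(target, source):
    """Model of a merge on `t.instance_id = s.instance_id and t.cost is null`
    with a single when-matched update of `cost` from the source.

    Rows are plain dicts; the function name and data are illustrative only.
    """
    source_by_id = {row["instance_id"]: row for row in source}
    result = []
    for row in target:
        match = source_by_id.get(row["instance_id"])
        if match is not None and row["cost"] is None:
            # Matched and cost is null: take the cost from the source row.
            result.append({**row, "cost": match["cost"]})
        else:
            # Not matched by the predicate: the row must survive untouched.
            result.append(dict(row))
    return result


target = [
    {"instance_id": "A", "cost": 10.15},
    {"instance_id": "B", "cost": None},
]
source = [
    {"instance_id": "A", "cost": 12.15},
    {"instance_id": "B", "cost": 11.15},
]

merged = merge_update_cost(target, source)
print(merged)
```

The merged result has the same number of rows as the target: a when-matched update on its own should never remove rows.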
How to reproduce it:
Here is a minimal python script with the following dependencies:
More details: