Replies: 8 comments 1 reply
-
The only thing I would add to what Tyler said is that we will probably need to wrap the various low-level write primitive actions with transaction APIs, for example:

let mut txn = table.new_transaction();
txn.add_file(vec!["a.parquet", "b.parquet", "c.parquet"]);
txn.remove_file(...);
match txn.commit() {
    Ok(()) => {}
    Err(e) => {
        // handle commit conflicts and other errors
    }
}

We could also use https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/DeltaLog.scala for some inspiration.
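To make the shape of that wrapper a bit more concrete, here is a minimal in-memory sketch. Everything in it is hypothetical (`Table`, `Transaction`, `CommitError`, and the idea of passing the table into `commit`); it is not the deltalake crate's API, just one way the staging-then-commit flow with conflict detection could look.

```rust
// Hypothetical sketch only: these types are illustrative, not the deltalake API.
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum CommitError {
    /// Another writer committed after this transaction was started.
    VersionConflict { expected: u64, found: u64 },
}

/// Stand-in for a table: a version counter plus the set of live files.
#[derive(Debug, Default)]
pub struct Table {
    pub version: u64,
    pub files: BTreeSet<String>,
}

/// Staged changes, remembered together with the version they were based on.
pub struct Transaction {
    read_version: u64,
    adds: Vec<String>,
    removes: Vec<String>,
}

impl Table {
    pub fn new_transaction(&self) -> Transaction {
        Transaction { read_version: self.version, adds: Vec::new(), removes: Vec::new() }
    }
}

impl Transaction {
    pub fn add_file(&mut self, paths: Vec<&str>) {
        self.adds.extend(paths.into_iter().map(String::from));
    }

    pub fn remove_file(&mut self, path: &str) {
        self.removes.push(path.to_string());
    }

    /// Apply the staged changes only if nobody else has committed in between.
    pub fn commit(self, table: &mut Table) -> Result<u64, CommitError> {
        if table.version != self.read_version {
            return Err(CommitError::VersionConflict {
                expected: self.read_version,
                found: table.version,
            });
        }
        for path in &self.removes {
            table.files.remove(path);
        }
        for path in self.adds {
            table.files.insert(path);
        }
        table.version += 1;
        Ok(table.version)
    }
}
```

A transaction started before another one commits would get `CommitError::VersionConflict` back, which is the branch the `Err(e)` arm above is meant to handle.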
-
Some notes from our open development meeting
-
I have started an early draft to work through some of the transaction API concepts and get more familiar with ... Feel free to check out the WIP. I'll start laying sketches on top of this soon. I'm using the StorageBackend API for now for writing logs. This may not be possible in the end; I'm just using it as a convenience.
-
@xianwill now has #84 proposed, and I have no idea how to close this discussion. 😒
-
There are some additional follow-ups to this writer discussion beyond #84 that warrant further conversation. This particular one probably doesn't need a ton of discussion cuz I definitely get the idea and agree with the goal. Either way, we still need to design and implement the action holder approach suggested by @nevi-me and corroborated by @houqp. For my own tastes, I think I prefer keeping a factory function outside of the transaction itself for creating actions. The transaction itself does not really know any of the details that should be included in the action, and therefore cannot really limit the number of parameters required for creating the action from internal state (ineffectual helper - spirit-wise somewhat similar to @rtyler's points about the ...).

let mut txn = table.create_transaction(...);
let my_add = Actions::new_add(...);
txn.add_action(my_add);
...

So basically, hang factory functions off of the Actions module and create an API on the DeltaTransaction to append these to an encapsulated vec of actions that should be committed.
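As a rough illustration of that split, the sketch below keeps the factory functions on a separate `Actions` type and has the transaction hold nothing but an encapsulated `Vec` of actions. The `Action` enum, its fields (`path`, `size`), and the `actions()` accessor are assumptions made for the example; the real action schema comes from the Delta protocol and carries more fields than this.

```rust
// Hypothetical sketch: names follow the comment above, not a published API.

/// A simplified action; the field set here is illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Add { path: String, size: u64 },
    Remove { path: String },
}

/// Factory functions live outside the transaction, which knows nothing
/// about how an individual action is built.
pub struct Actions;

impl Actions {
    pub fn new_add(path: &str, size: u64) -> Action {
        Action::Add { path: path.to_string(), size }
    }

    pub fn new_remove(path: &str) -> Action {
        Action::Remove { path: path.to_string() }
    }
}

/// The transaction only collects actions that should be committed together.
#[derive(Debug, Default)]
pub struct DeltaTransaction {
    actions: Vec<Action>,
}

impl DeltaTransaction {
    pub fn add_action(&mut self, action: Action) {
        self.actions.push(action);
    }

    /// Read-only view of what has been staged so far.
    pub fn actions(&self) -> &[Action] {
        &self.actions
    }
}
```

Keeping construction in `Actions` means new action kinds can be added without touching `DeltaTransaction`, which only needs to know how to append and later serialize them.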
-
Another item to discuss is the transaction log storage backend for S3. We need "fail if file exists already" semantics to run the optimistic concurrency loop with an S3 storage backend. I'm thinking a DynamoDB guard should work well, but in these tumultuous times (lol) of rusoto maintenance mode, I feel a little gun-shy about going all in on the implementation.
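For reference, this is roughly what "fail if file exists already" buys us, sketched against the local filesystem where `OpenOptions::create_new` gives an atomic create-or-fail. The function names `put_if_absent` and `commit_with_retry` are made up for the example, and none of this touches S3 or DynamoDB; an S3 backend would need some external guard to get the same guarantee.

```rust
// Sketch against the local filesystem only; function names are hypothetical.
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Write the commit file for `version`, failing with `AlreadyExists`
/// if another writer has already claimed that version.
pub fn put_if_absent(log_dir: &Path, version: u64, body: &[u8]) -> io::Result<()> {
    // Delta log entries are named with a zero-padded 20-digit version.
    let path = log_dir.join(format!("{:020}.json", version));
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // atomic "create or fail" on local filesystems
        .open(path)?;
    file.write_all(body)
}

/// Optimistic loop: on a version clash, move to the next version and retry.
/// A real writer would first re-read the winning commits and check for
/// logical conflicts before retrying; that step is omitted here.
pub fn commit_with_retry(
    log_dir: &Path,
    mut version: u64,
    body: &[u8],
    max_attempts: u32,
) -> io::Result<u64> {
    for _ in 0..max_attempts {
        match put_if_absent(log_dir, version, body) {
            Ok(()) => return Ok(version),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => version += 1,
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(io::ErrorKind::Other, "too many commit conflicts"))
}
```

The whole loop depends on the create step being atomic, which is why a plain S3 put is not enough on its own.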
-
Given that this is already implemented, I recommend that we move the S3 log storage discussion to a new thread. I am guessing that locking the discussion is the equivalent of closing an issue?
-
Locking this discussion since we do have some preliminary code merged at this point in #248 |
-
In a video discussion, @xianwill and @houqp were talking through the gist of what a low-level transaction API might look like for the Rust crate.
I wanted to open that discussion up a bit further, but also try out this new GitHub feature 😄
Below is my interpretation of what they were saying with some personal opinion thrown in:
The underlying parquet crate doesn't yet support writes, but at a low level the deltalake crate should provide write primitives for the transaction log. For example, if I already have a parquet file created by something else, I should be able to provide a path/URI (not sure which) in a deltatable.add_file type API, which creates a new transaction and gives me the transaction details back. @houqp mentioned reviewing the Delta protocol for some of the specific semantics that need to be implemented here.
Perhaps one approach to continue this discussion would be for @xianwill to share some sketches of what these low-level API additions might look like.