docs: how delta lake transactions work #2089

Merged: 3 commits, Jan 19, 2024

MrPowers
Contributor

@nkarpov and I collaborated on this Delta Lake transactions post.

It's meant to give the basics on how transactions work and why they're a huge advantage of Delta Lakes.

@rtyler is giving a talk on transactions/concurrency soon. We're trying to set the stage with some foundational content first.


Delta Lake supports transactions, which provide the reliability guarantees that production data systems need.
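To make the idea concrete, here is a minimal sketch of how a Delta-style log can turn a write into one atomic commit. It is not the delta-rs implementation: it assumes a local filesystem that supports hard links and a heavily simplified action format, and the helpers `commit_path` and `try_commit` are hypothetical. Each commit is a zero-padded, numbered JSON file under `_delta_log/`, and a writer may publish version N only if no other writer has already created that file. Here the put-if-absent step is `os.link`, which raises `FileExistsError` when the target already exists.

```python
import json
import os
import tempfile

# Simplified model of a Delta-style transaction log (hypothetical helpers,
# not the delta-rs code). Each commit is _delta_log/<20-digit version>.json
# containing one JSON action per line.


def commit_path(table_dir: str, version: int) -> str:
    return os.path.join(table_dir, "_delta_log", f"{version:020d}.json")


def try_commit(table_dir: str, version: int, actions: list[dict]) -> bool:
    """Publish `actions` as `version`; return False if that version is already taken."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Write the whole commit to a private temp file first, so no reader can
    # ever observe a half-written commit file.
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the bytes to stable storage
        # Put-if-absent: os.link raises FileExistsError if the target exists,
        # so at most one writer can ever create a given version.
        os.link(tmp_path, commit_path(table_dir, version))
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp_path)  # a published commit keeps its own hard link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        print(try_commit(table, 0, [{"add": {"path": "part-000.parquet"}}]))  # True
        # A second writer racing for version 0 loses and must retry at version 1.
        print(try_commit(table, 0, [{"add": {"path": "part-001.parquet"}}]))  # False
        print(try_commit(table, 1, [{"add": {"path": "part-001.parquet"}}]))  # True
```

A writer that gets `False` back has lost the race for that version; under optimistic concurrency it would re-read the newer commits, check whether they conflict with its own changes, and retry at the next version. Storage systems that cannot provide put-if-absent natively need extra coordination to get the same guarantee.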

Data lakes don’t provide transactions, and this can cause nasty bugs and a bad user experience. Let’s look at a couple of scenarios where the lack of transactions causes a poor user experience:
Collaborator

Maybe call these vanilla data lakes? In my opinion, Delta Lake is just a data lake with a metadata layer.

Contributor Author

Updated this. Delta Lakes def aren't data lakes 😱

See the Lakehouse paper. We usually call these "Lakehouse storage systems" or "open table formats".


Durability means that all transactions that are successfully completed will always remain persisted, even if there are service outages or program crashes.
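One way to see why committed data survives crashes: in a log-based table format, the table state is not held only in the writer's memory. Any reader, including one started after a crash, rebuilds the state by replaying the commit files in version order, so durability reduces to the storage layer keeping those files once they are written. The sketch below is a simplified model under stated assumptions (hypothetical `live_files` helper, local filesystem, only `add`/`remove` actions, no checkpoints) and is not how delta-rs reads tables. It also shows that a data file from a writer that crashed before committing is never referenced by the log, so it stays invisible.

```python
import json
import os
import re
import tempfile

# Simplified model: only numbered commit files, only "add"/"remove" actions,
# no checkpoints. Not the delta-rs reader.
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def live_files(table_dir: str) -> set[str]:
    """Rebuild the set of data files in the table by replaying committed versions."""
    log_dir = os.path.join(table_dir, "_delta_log")
    # Zero-padded names sort lexicographically in version order.
    commit_names = sorted(n for n in os.listdir(log_dir) if COMMIT_FILE.match(n))
    files: set[str] = set()
    for name in commit_names:
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        log = os.path.join(table, "_delta_log")
        os.makedirs(log)
        commits = [
            [{"add": {"path": "part-000.parquet"}}],
            [{"remove": {"path": "part-000.parquet"}}, {"add": {"path": "part-001.parquet"}}],
        ]
        for version, actions in enumerate(commits):
            with open(os.path.join(log, f"{version:020d}.json"), "w") as f:
                f.write("".join(json.dumps(a) + "\n" for a in actions))
        # Data file from a writer that crashed before committing: never in the log.
        open(os.path.join(table, "part-002.parquet"), "w").close()
        # A fresh reader (e.g. after a restart) sees only committed state.
        print(sorted(live_files(table)))  # ['part-001.parquet']
```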

Suppose you have a Delta table that’s persisted in Azure blog storage. The Delta table transactions that are committed will always remain available, even in these circumstances:

blog - blob

Contributor Author

good catch, lol. Thanks for reviewing!!

MrPowers requested a review from ion-elgreco on January 19, 2024 at 11:29
Collaborator

ion-elgreco left a comment

Thanks @MrPowers!

ion-elgreco merged commit 5eda27c into delta-io:main on Jan 19, 2024
23 checks passed
RobinLin666 pushed a commit to RobinLin666/delta-rs that referenced this pull request on Feb 2, 2024
Co-authored-by: Ion Koutsouris <[email protected]>
3 participants