-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for Microsoft Fabric / OneLake #4573
Conversation
added additional onelake urls to object_store/src/azure/mod.rs for both https and abfss this is required to fix Issue 1418 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR appears to add support for a mix of different URLs:
- https://account.blob.fabric.microsoft.com
- https://account.dfs.fabric.microsoft.com
- abfs://account.dfs.core.windows.net
- abfs://account.dsf.fabric.microsoft.com
Could we possibly get tests for all of these, and ideally a link to some docs that explains why there are 4 different URI schemes and what the differences between them are?
@tustvold Unit Tests added to validate the URL schemes that were introduced to enable support for OneLake Fabric URLs. https://account.blob.fabric.microsoft.com/ OneLake supports both Blob and DFS endpoints, as certain tools opt for utilizing Blob APIs by executing a string replacement, substituting '.dfs' with '.blob'.Also as OneLake supports the ABFS Driver, it also provides support for endpoints using the ABFS format. Link to Documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly struggling to follow how this is supposed to work, and I don't have a Microsoft fabric, let alone a onelake endpoint setup to test this. Reading the linked doc it would suggest a fundamentally different URI scheme from what this crate expects, gone is the notion of a container and account name, instead replaced with some workspace notion?
Is there some way to interact with onelake using the normal blob storage APIs? Is this documented somewhere? Ideally we would be able to do something similar to how we support ADLS Gen 2, i.e. just treat it as normal blob storage as that is all this crate and its users are concerned with.
Ultimately it would be my dream if Azure just provided a standard blob storage API that worked across all its various products, without strange esoterica like ADLS Gen 2 returning directories (?!!) with no way to turn this off that is compatible with other blob endpoints. The closer we can get to such an API the happier I will be 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thank you for sticking with this, I'm starting to follow how this is supposed to work. The entire onelake workspace is effectively mapped as a "container", with the top-level layers of the hierarchy not being real parts of blob storage, and only responding to HEAD requests. This does feel rather perverse though, so perhaps I am misunderstanding something...
I've left some comments assuming the above is the correct interpretation, but feel free to correct me if I am mistaken. I have limited bandwidth to review stuff so I apologise if I'm asking stupid questions.
@@ -0,0 +1,114 @@ | |||
use async_trait::async_trait; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably remove this before merge, but thank you for creating it to show how you are expecting to interact with this system
.build() | ||
.unwrap()); | ||
|
||
let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bit that I found rather surprising, I would have naively thought that the corollary for a container would be the files within an item, i.e. the user would just specify SampleCustomerList.csv
in the path, with everything else provided at construction time of the builder.
This does appear to be what the docs suggest, so 🤷 To confirm if I were to perform a List of 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/
it would return the full path, i.e. 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv
and not SampleCustomerList.csv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct ,when Performing a List on 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/, ObjectMeta returned will contain the full path
ObjectMeta { location: Path { raw: "17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv" }, last_modified: 2023-08-10T18:31:52Z, size: 11, e_tag: Some("0x8DB99D01129EA9D") }
I took the liberty of pushing some minor changes to prepare this for merge |
@tustvold - When will these changes to the crate be published? I noticed that the last build was released on June 6th https://crates.io/crates/object_store/versions |
I intend to cut a RC today or tomorrow, with a view to publishing a release later this week |
Thanks - I have another PR on delta-rs that needs to be verified with the latest changes to object store crate dependency referenced in delta-rs, This is required to fix delta-io/delta-rs#1418. |
@tustvold - Any update on the release for changes to object_store crate |
The RC was cut on Tuesday, Apache mandates at least a 3 day voting period, and so the release should be published today |
No description provided.