Request for hashing with xxhash64 in a merge operation (SQL context)
Hi,
I am using the Python port of deltalake together with Polars, and I currently need to hash some columns during a Merge operation with the xxhash64 algorithm (via the update parameter). However, the SQL context accepts only certain hashing functions. Tracing the Rust code, I found the list of accepted functions after realising it is based on the datafusion expression API (derived from the Rust library imports).
Is there any way, or are there plans, to expand this list with other hashing algorithms? Or to register a UDF in the Python API that I can use in the SQL context of a Merge?
I also saw that the following Rust crate is imported in the project: https://crates.io/crates/twox-hash. It contains an implementation of the algorithm in my request.
I wrote a Polars plugin for that a while ago.
Hi, thanks for the quick reply.
I can precompute the hashes, that's no problem, but since I am performing a Merge into a Delta table and need to hash the rows in the destination table, it would be more efficient to compute them in the update predicate of the merge, rather than reading the target table twice (once to compute the hashes, and again to merge).
I was just curious whether there is a way to create a UDF somehow, or to expand the list of available hashing algorithms.
Registering UDFs is likely possible. You could take a look at datafusion-python and see whether you can replicate the registration functionality in our code base; I'm open to a PR for this.
From what I saw, the Python implementation does not use datafusion directly; in your codebase, datafusion is used at the Rust level, before the bindings to Python.
Thanks for handling my request.
Use Case
Hashing with xxhash64- or xxh3-type algorithms: https://github.com/Cyan4973/xxHash