[BUG] DataBricks SHA hash not being cast as binary #219

Open
UselessAlias opened this issue Dec 14, 2023 · 2 comments
Labels: databricks (Issues specific to Databricks)


UselessAlias commented Dec 14, 2023

Describe the bug
The Databricks SHA hashing implementation does not cast the hashed column to a binary type, so the column is left as a string.
Environment

dbt version: 1.6.0
automate_dv version: 0.10.1
Database/Platform: Databricks

To Reproduce
Steps to reproduce the behavior:
Run the staging macro on Databricks with the SHA hashing algorithm selected. The resulting hash column will be a string rather than binary.

Expected behavior
The hash keys should be binary, as expected by the Data Vault (DV) methodology.

Additional context
The code appears to already contain a cast-to-binary implementation for Databricks, which is used for the MD5 hashing algorithm. It just needs to be applied to the SHA version as well.
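
To make the string-versus-binary difference concrete, here is a minimal Python sketch. It is not automate_dv's macro code; it only illustrates the two representations of the same hash. It assumes the SHA option means SHA-256, and the business key value and function names are hypothetical.

```python
import hashlib


def hash_key_hex(business_key: str) -> str:
    """SHA-256 as an uppercase hex string (analogous to a STRING hash column)."""
    return hashlib.sha256(business_key.encode("utf-8")).hexdigest().upper()


def hash_key_binary(business_key: str) -> bytes:
    """SHA-256 as raw bytes (analogous to a BINARY hash column)."""
    return hashlib.sha256(business_key.encode("utf-8")).digest()


key = "CUSTOMER-1001"  # hypothetical business key, for illustration only
as_hex = hash_key_hex(key)
as_bin = hash_key_binary(key)

print(len(as_hex))                      # 64 characters
print(len(as_bin))                      # 32 bytes
print(bytes.fromhex(as_hex) == as_bin)  # True: same hash, different encoding
```

Both values carry the same hash, but the hex string takes twice as many bytes as the raw digest, which is one reason binary hash keys are usually preferred for storage and joins.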


AB#5354

UselessAlias added the bug (Something isn't working) label Dec 14, 2023
DVAlexHiggs (Member) commented

Hi! Thanks for this. We are aware of hashing issues and are planning a fix across the board. Our original design decisions on this weren't the best!

DVAlexHiggs added the databricks (Issues specific to Databricks) label Feb 21, 2024
DVAlexHiggs (Member) commented Feb 27, 2024

Hi @UselessAlias, just to let you know, our next release will introduce opt-in support (so we don't break things for everyone!) for what we're calling "Native hashing", which addresses this. For those interested, it will also provide the same resolution for BigQuery, which is also handicapped by strings at the moment. Thanks for your patience.

DVAlexHiggs removed the bug (Something isn't working) label May 15, 2024