Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema.hash / AST.hash should not use JSON.stringify #2719

Open
schickling opened this issue May 9, 2024 · 1 comment
Open

Schema.hash / AST.hash should not use JSON.stringify #2719

schickling opened this issue May 9, 2024 · 1 comment
Labels
enhancement New feature or request Needs Proposal This issue needs a plan that clarifies the finer details of how it could be implemented. schema

Comments

@schickling
Copy link

schickling commented May 9, 2024

What is the problem this feature would solve?

Currently Schema.hash / AST.hash is implemented using Hash.string(JSON.stringify(ast, null, 2)) which is inefficient and for larger schemas turns into a bottleneck or can fail all together with RangeError: Invalid string length

CleanShot 2024-05-09 at 16 24 09@2x

What is the feature you are proposing to solve the problem?

Ideally the hashing mechanism was implemented on a case by case basis on the AST level itself instead of relying on stringifying the AST.

Improving this implementation could also help to make hashes more stable across different Effect schema versions.

What alternatives have you considered?

No response

@schickling schickling added enhancement New feature or request schema labels May 9, 2024
@gcanti gcanti added the Needs Proposal This issue needs a plan that clarifies the finer details of how it could be implemented. label May 9, 2024
@mikearnaldi
Copy link
Member

mikearnaldi commented May 9, 2024

Hashing should use the same strategy we use to hash structs but we should probably add, in addition to the plain hash, something like a crc32. Reason being that hash isn't supposed to give hints about equality it is only supposed to bucket and optimise equality checks. If we instead need a high-probability indication of difference we need a checksum.

I can do some research on crc when I get back, for plain hash this should quite literally follow the same code we have for data classes / etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Needs Proposal This issue needs a plan that clarifies the finer details of how it could be implemented. schema
Projects
None yet
Development

No branches or pull requests

3 participants