An implementation of sdhash, the algorithm for calculating similarity digests, rewritten in pure Go 🐹

sdhash

sdhash is a tool that processes binary data and produces similarity digests using Bloom filters. Two binary files that share common parts produce similar digests. sdhash can compare two similarity digests to produce a score from 0 to 100. A score close to 0 means that the two files are very different; a score of 100 means that the two files are equal.
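
To make the idea concrete, here is a deliberately simplified sketch of a Bloom-filter-based similarity score. It is not the sdhash algorithm: the 256-bit filter, the fixed 8-byte chunks, the single FNV hash, and the scoring formula are all assumptions chosen for illustration, and every name in it (`filter`, `digest`, `score`) is hypothetical. It only shows why inputs that share content end up with overlapping filter bits.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

// filter is a toy 256-bit Bloom filter. The size and the single hash
// function are illustrative choices, not sdhash's real parameters.
type filter [4]uint64

// add hashes one chunk with FNV-1a and sets the corresponding bit.
func (f *filter) add(chunk []byte) {
	h := fnv.New32a()
	h.Write(chunk)
	bit := h.Sum32() % 256
	f[bit/64] |= 1 << (bit % 64)
}

// digest splits data into fixed 8-byte chunks and adds each one to a filter.
// Trailing bytes that do not fill a whole chunk are ignored.
func digest(data []byte) filter {
	var f filter
	for i := 0; i+8 <= len(data); i += 8 {
		f.add(data[i : i+8])
	}
	return f
}

// score returns the share of set bits common to both filters, scaled to 0..100.
// Two empty filters score 0.
func score(a, b filter) int {
	common, total := 0, 0
	for i := range a {
		common += bits.OnesCount64(a[i] & b[i])
		total += bits.OnesCount64(a[i] | b[i])
	}
	if total == 0 {
		return 0
	}
	return 100 * common / total
}

func main() {
	a := digest([]byte("the quick brown fox jumps over the lazy dog....."))
	b := digest([]byte("the quick brown fox jumps over the lazy cat....."))
	fmt.Println("a vs a:", score(a, a))
	fmt.Println("a vs b:", score(a, b))
}
```

Because this toy version uses fixed chunk boundaries, inserting a single byte near the start of one input would shift every later chunk and destroy the overlap, so it should not be read as a description of how sdhash selects the data it hashes.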

Features

  • calculate similarity digests of many files in a short time
  • compare a large number of digests using precalculated indexes
  • comparisons can also be made while the digests are being computed
  • same results as the original sdhash with similar performance, but entirely rewritten in Go

Getting started

The sdhash package is available as binaries and as a library.

Binaries

The binaries for all platforms are available on the Releases page.

Library

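  1. Install the sdhash package with the command below
$ go get -u github.com/eciavatta/sdhash
  2. Import it in your code and start playing around
package main

import (
	"fmt"
	"log"

	"github.com/eciavatta/sdhash"
)

func main() {
	// Create a factory for the first file; stop if the file cannot be read.
	factoryA, err := sdhash.CreateSdbfFromFilename("a.bin")
	if err != nil {
		log.Fatal(err)
	}
	sdbfA := factoryA.Compute()

	// Do the same for the second file.
	factoryB, err := sdhash.CreateSdbfFromFilename("b.bin")
	if err != nil {
		log.Fatal(err)
	}
	sdbfB := factoryB.Compute()

	// Print both digests, then their similarity score (0 to 100).
	fmt.Println(sdbfA.String())
	fmt.Println(sdbfB.String())
	fmt.Println(sdbfA.Compare(sdbfB))
}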
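
When more than two files are involved, the same `Compare` call can be applied to every pair. The following is a minimal sketch of that loop, not part of the library: `scorer`, `pair`, `similarPairs`, and the threshold of 50 are hypothetical, and `fakeDigest` is a stand-in type so that the block runs without any files. It assumes that `Compare` returns an `int` score, as the 0 to 100 range suggests, and it requires Go 1.18 or later for generics.

```go
package main

import "fmt"

// scorer is any digest type that can score itself against another value of
// the same type. It is modelled on the Compare call in the example above;
// the int return type is an assumption.
type scorer[T any] interface {
	Compare(other T) int
}

// pair records the indexes of two digests and their similarity score.
type pair struct{ i, j, score int }

// similarPairs compares every unordered pair once and keeps those whose
// score is at least threshold.
func similarPairs[T scorer[T]](digests []T, threshold int) []pair {
	var out []pair
	for i := 0; i < len(digests); i++ {
		for j := i + 1; j < len(digests); j++ {
			if s := digests[i].Compare(digests[j]); s >= threshold {
				out = append(out, pair{i, j, s})
			}
		}
	}
	return out
}

// fakeDigest is a hypothetical stand-in so the sketch runs without the
// library: equal values score 100, different values score 0.
type fakeDigest string

func (d fakeDigest) Compare(other fakeDigest) int {
	if d == other {
		return 100
	}
	return 0
}

func main() {
	ds := []fakeDigest{"x", "y", "x"}
	fmt.Println(similarPairs(ds, 50)) // prints [{0 2 100}]
}
```

This brute-force loop performs n(n-1)/2 comparisons, so it only suits small sets; it does not use the precalculated indexes mentioned under Features.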

Documentation

The library documentation is published at pkg.go.dev/github.com/eciavatta/sdhash. How sdhash works is described in this paper, and here you can find a tutorial of the original version of sdhash.

License

sdhash was originally created by Vassil Roussev and Candice Quates and is licensed under the Apache-2.0 License. The Go implementation was written by Emiliano Ciavatta and is also licensed under the Apache-2.0 License.
