
[feat] Add binary & scalar embedding quantization support to Transformers.js #681

Closed
jonathanpv opened this issue Apr 4, 2024 · 7 comments · Fixed by #691 · May be fixed by #683
Labels
enhancement New feature or request

Comments

@jonathanpv (Contributor) commented Apr 4, 2024

Feature request

Add binary & scalar quantization support

We should extract the embedding-quantization algorithm from the sentence-transformers PR below and add it to transformers.js, so the feature-extraction pipeline can support binary vector search.

We could either add a quantize-output (or binary-output) option to the pipeline, or expose a helper method that quantizes a tensor, so the solution can be reused in other parts of the codebase.

UKPLab/sentence-transformers#2549
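For the scalar half of the request, here is a minimal sketch of int8 quantization in JavaScript. The function name and the per-dimension calibration ranges are my own assumptions, loosely mirroring how sentence-transformers' `int8` precision works:

```js
// Hypothetical sketch: scalar (int8) quantization, assuming per-dimension
// [min, max] ranges estimated from a calibration set (as sentence-transformers does).
function scalarQuantize(embeddings, minVals, maxVals) {
    return embeddings.map((vec) => {
        const out = new Int8Array(vec.length);
        for (let d = 0; d < vec.length; ++d) {
            const span = maxVals[d] - minVals[d] || 1; // guard against a zero range
            // Map [min, max] linearly onto the 256 int8 buckets [-128, 127]
            const q = Math.round(((vec[d] - minVals[d]) / span) * 255) - 128;
            out[d] = Math.max(-128, Math.min(127, q));
        }
        return out;
    });
}

const quantized = scalarQuantize(
    [[0.0, 0.5, 1.0]],
    [0, 0, 0],
    [1, 1, 1],
);
console.log(quantized[0]); // Int8Array [-128, 0, 127]
```

This keeps 4x less memory than float32 while preserving much more of the ranking than binary does, at the cost of per-dimension calibration statistics.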

Motivation

Given the performance gains of binary vector embeddings, additional quantization helper methods would be useful for client-side vector search, reducing the memory footprint.

Your contribution

I plan on making a PR, but note that I am still new to the open-source world and am learning the transformers.js repo as fast as I can.

@xenova may be able to parse the PR above and add the methods, along with test cases in the style of the repo, more quickly than me.

However, I am working on a proof of concept that uses vector embeddings in a side project, so I can contribute that work (repo or README) when I finish it.

@jonathanpv jonathanpv added the enhancement New feature or request label Apr 4, 2024
@jonathanpv (Contributor Author)

Seems like a numpy equivalent in JS doesn't exist.

I wonder how transformers.js does its math calculations.

@jonathanpv (Contributor Author) commented Apr 4, 2024

Have we considered adding numjs to transformers.js so the translation could be more 1:1, or do we prefer pulling in math functions on an as-needed basis?

@xenova (Owner) commented Apr 4, 2024

Hi there 👋 I have already worked on this a bit, and it might be a useful addition to the feature-extraction pipeline.

Here's some example code which shows how you can achieve this in javascript:

```js
import { pipeline, Tensor } from "@xenova/transformers";

function hamming_distance(arr1, arr2) {
    if (arr1.length !== arr2.length) {
        throw new Error("Typed arrays must have the same length");
    }

    let distance = 0;

    // Iterate over each byte in the typed arrays
    for (let i = 0; i < arr1.length; ++i) {
        // XOR the bytes to find differing bits
        let xorResult = arr1[i] ^ arr2[i];

        // Count set bits in the XOR result using Brian Kernighan's Algorithm
        while (xorResult) {
            ++distance;
            xorResult &= xorResult - 1;
        }
    }

    return distance;
}

function quantize_embeddings(tensor, precision) {
    if (tensor.dims.length !== 2) {
        throw new Error("The tensor must have 2 dimensions");
    }
    if (tensor.dims.at(-1) % 8 !== 0) {
        throw new Error("The last dimension of the tensor must be a multiple of 8");
    }
    if (!['binary', 'ubinary'].includes(precision)) {
        throw new Error("The precision must be either 'binary' or 'ubinary'");
    }
    // Create a typed array to store the packed bits
    const inputData = tensor.data;

    const signed = precision === 'binary';
    const cls = signed ? Int8Array : Uint8Array;
    const dtype = signed ? 'int8' : 'uint8';
    const outputData = new cls(inputData.length / 8);

    // Iterate over each number in the array
    for (let i = 0; i < inputData.length; ++i) {
        // Determine if the number is greater than 0
        const bit = inputData[i] > 0 ? 1 : 0;

        // Calculate the index in the typed array and the position within the byte
        const arrayIndex = Math.floor(i / 8);
        const bitPosition = i % 8;

        // Pack the bit into the typed array
        outputData[arrayIndex] |= bit << (7 - bitPosition);
        if (signed && bitPosition === 0) {
            outputData[arrayIndex] -= 128;
        }
    }

    return new Tensor(dtype, outputData, [tensor.dims[0], tensor.dims[1] / 8]);
}

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    quantized: false,
});

const texts = ['hello', 'hi', 'banana'];
const output = await embedder(texts, {
    normalize: true,
    pooling: 'mean',
});
const embeddings = quantize_embeddings(output, 'ubinary').tolist();

const pairs = [[0, 1], [0, 2], [1, 2]];
for (const [i, j] of pairs) {
    console.log(`${texts[i]} <-> ${texts[j]}`, '|', hamming_distance(embeddings[i], embeddings[j]));
}
```

outputs:

```
hello <-> hi | 86
hello <-> banana | 165
hi <-> banana | 163
```

indicating higher similarity between hello and hi than between hello and banana, for example.
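If a bounded similarity score is preferable to a raw distance, the Hamming distance above can be normalized by the total bit count. The helper name here is my own:

```js
// Hypothetical helper: convert a Hamming distance over packed bytes into a
// similarity score in [0, 1], where byteLength is the packed embedding length.
function hammingSimilarity(distance, byteLength) {
    const totalBits = byteLength * 8;
    return 1 - distance / totalBits;
}

// For the 384-dim MiniLM embeddings above (48 packed bytes):
console.log(hammingSimilarity(86, 48).toFixed(3));  // "0.776" (hello <-> hi)
console.log(hammingSimilarity(165, 48).toFixed(3)); // "0.570" (hello <-> banana)
```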

@jonathanpv (Contributor Author)

@xenova @ashvardanian

So when I was testing the code, I went ahead and made a front end to play around with it.

I noticed the embeddings ended up being int8 and was wondering if that is as designed in the quantize_embeddings function, despite choosing ubinary or binary.

I'm curious how the algorithm works under the hood. I tried learning it from reading the code but couldn't understand it.

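To unpack the algorithm in question: each dimension is thresholded at zero, and eight of those bits are packed MSB-first into one byte. A hand trace on a hypothetical 8-dimensional vector:

```js
// Walk through quantize_embeddings' inner loop for one 8-dim embedding
// (the values here are made up for illustration).
const values = [0.3, -0.1, 0.7, -0.2, 0.0, 0.5, -0.4, 0.9];
let byte = 0;
for (let i = 0; i < 8; ++i) {
    const bit = values[i] > 0 ? 1 : 0; // sign test: 1, 0, 1, 0, 0, 1, 0, 1
    byte |= bit << (7 - i);            // MSB-first: index 0 lands at bit position 7
}
console.log(byte.toString(2).padStart(8, '0')); // "10100101"
console.log(byte); // 165 (ubinary); binary subtracts 128 per byte, giving 37
```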

@jonathanpv (Contributor Author)

I can separate out the commits so it's easier to parse; I just realized how large the diff was, lol, 5k lines.

@ashvardanian

@jonathanpv, I find it reasonable that the outputs are 8-bit integers, as long as the cosine distance between them makes sense. Just make sure to use a proper library to compute those, as NumPy and SciPy don't support mixed precision and will overflow. SimSIMD should work fine and is available for JS as well 😉
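The overflow concern can be seen in a direct JavaScript port. A sketch (not SimSIMD's implementation): JS Numbers are float64, so the accumulator below cannot overflow, but the same loop with a fixed-width int8/int16 accumulator would.

```js
// Sketch: cosine similarity between int8-quantized embeddings. In languages
// with fixed-width integers, the dot product of two int8 vectors must be
// accumulated in a wider type, or it overflows for long embeddings.
function cosineInt8(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const a = new Int8Array([127, -128, 64]);
console.log(cosineInt8(a, a)); // ≈ 1 for identical vectors
```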

@jonathanpv (Contributor Author)

> @jonathanpv, I find it reasonable that the outputs are 8-bit integers, as long as the cosine distance between them makes sense. Just make sure to use a proper library to compute those, as NumPy and SciPy don't support mixed precision and will overflow. SimSIMD should work fine and is available for JS as well 😉

Oh wow, I didn't realize simsimd also had these functions; I can see why you implemented them. I'll try this out after I finish this other project. It will be nice to benchmark in the browser context.
