Half-precision float vector metrics #4122

Merged
merged 15 commits into qdrant:dev on May 13, 2024

@TheQuantumFractal commented Apr 26, 2024

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

I built SIMD implementations, with tests, for Neon, AVX2, and SSE2, covering Euclidean, Manhattan, and dot similarity. Note that native float16 SIMD arithmetic is not available on most ISAs: ARM processors implementing the half-precision extension (optional from ARMv8.2-A onward) can handle it, and recent Intel hardware adds it through the AVX512-FP16 extension, but most machines limited to AVX2 or SSE2 cannot do f16 arithmetic directly.

F16C is an x86 instruction-set extension, supported on most modern x86 machines, that converts between half- and single-precision floating-point formats. To run the metrics on AVX2 or SSE2, f16 vectors therefore have to be widened to f32 and then processed with the existing f32 SIMD code; that is how my implementations work. I also wrote a separate C / assembly file for Neon f16 SIMD operations, since Rust does not currently support ARM f16 SIMD intrinsics.
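A minimal portable sketch of the "widen to f32, then compute in f32" approach described above. It assumes f16 values are stored as raw IEEE 754 binary16 bit patterns (`u16`) so that no external crate is needed, and all function names are hypothetical rather than the PR's API. Every binary16 value is exactly representable in f32, so the widening step itself loses no precision.

```rust
/// Exact power of two built from its f32 bit pattern (valid for -126..=127).
fn pow2(e: i32) -> f32 {
    f32::from_bits(((e + 127) as u32) << 23)
}

/// Scalar widening of an IEEE 754 binary16 bit pattern to f32 (F16C does this in hardware).
fn f16_to_f32(h: u16) -> f32 {
    let negative = (h & 0x8000) != 0;
    let exp = ((h >> 10) & 0x1f) as i32;
    let mant = (h & 0x03ff) as f32;
    let magnitude = match exp {
        0 => mant * pow2(-24), // zero or subnormal: mant * 2^-24
        31 => {
            if mant == 0.0 { f32::INFINITY } else { f32::NAN }
        }
        _ => (1.0 + mant / 1024.0) * pow2(exp - 15), // normal: 1.mant * 2^(exp - 15)
    };
    if negative { -magnitude } else { magnitude }
}

/// Iterate over a half-precision slice as f32 values.
fn widen(v: &[u16]) -> impl Iterator<Item = f32> + '_ {
    v.iter().map(|&h| f16_to_f32(h))
}

/// Dot product of two f16 vectors, accumulated in f32.
fn dot_f16(a: &[u16], b: &[u16]) -> f32 {
    widen(a).zip(widen(b)).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance, accumulated in f32.
fn squared_euclid_f16(a: &[u16], b: &[u16]) -> f32 {
    widen(a).zip(widen(b)).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Manhattan (L1) distance, accumulated in f32.
fn manhattan_f16(a: &[u16], b: &[u16]) -> f32 {
    widen(a).zip(widen(b)).map(|(x, y)| (x - y).abs()).sum()
}
```

A SIMD path keeps the same structure; only the widening and the f32 arithmetic move into vector instructions.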

The AVX2 / SSE2 code was tested on an Intel(R) Xeon(R) CPU, and the Neon code on an Apple M1 Pro.

As for cosine similarity, the current cosine preprocessing step accepts float32 DenseVectors and simply normalizes them. Float16 vectors can be normalized the same way: compute the vector's squared norm as its dot product with itself using the dot-similarity SIMD implementation, then scale the vector by the inverse of the norm. After preprocessing, the cosine metric itself uses the same SIMD dot-similarity implementation.
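A minimal sketch of that preprocessing, assuming the f16 values have already been widened to f32 (for example with F16C on x86). The `dot` helper is a hypothetical stand-in for the SIMD dot-similarity kernel, and the other names are illustrative as well.

```rust
/// Stand-in for the SIMD dot-similarity kernel.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine preprocessing: scale `v` to unit length using its dot product with itself.
fn preprocess_cosine(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt();
    if norm == 0.0 {
        return v.to_vec(); // a zero vector cannot be normalized; leave it unchanged
    }
    v.iter().map(|x| x / norm).collect()
}

/// After preprocessing, cosine similarity reduces to the dot product.
fn cosine_similarity(a_normalized: &[f32], b_normalized: &[f32]) -> f32 {
    dot(a_normalized, b_normalized)
}
```

If the normalized vector is stored back as f16, the narrowing conversion rounds each component, so the stored norm is only approximately 1.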

/claim #4110

algora-pbc bot commented Apr 26, 2024

💵 To receive payouts, sign up on Algora, link your GitHub account and connect with Stripe/Alipay.

@TheQuantumFractal changed the base branch from master to dev on April 26, 2024, 17:25
Member

I don't feel too confident about including ASM files in the project as-is.
@TheQuantumFractal could you please explain why you decided to do it this way instead of linking C directly?

FYI, in the quantization repo we have an example: https://github.com/qdrant/quantization/tree/master/quantization/cpp

If you think ASM is strictly necessary, could you please include instructions on how to generate it from C?

Author

Yes, there isn't really a reason to include ASM files; linking C directly is a better solution. Should I set up the linking for Neon in qdrant/lib/segment, then?

Member

That would help, thanks!

Contributor

Please use this example, because it solves cross-compilation issues (for instance, building a binary for an ARM target on an x64 host).

Author

Added build.rs to link the C file.
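A hedged sketch of the target-selection logic such a build script needs when cross-compiling. The key detail is that a build script runs on the host, so `cfg!(target_arch = ...)` inside `build.rs` describes the host machine, whereas Cargo exposes the architecture being compiled for through the `CARGO_CFG_TARGET_ARCH` environment variable. The helper names, the `neon_f16` cfg name, and the `-march=armv8.2-a+fp16` flag choice are assumptions for illustration, not the PR's actual build script; the `cc` crate calls appear only as comments, with the file path and library name left as placeholders.

```rust
use std::env;

/// Hypothetical: extra C compiler flags for the Neon f16 kernel, chosen by target arch.
/// An empty list means the Neon f16 kernel should not be built for this target.
fn neon_f16_cflags(target_arch: &str) -> Vec<&'static str> {
    match target_arch {
        // Neon f16 arithmetic intrinsics require the ARMv8.2-A FP16 extension.
        "aarch64" => vec!["-march=armv8.2-a+fp16"],
        _ => Vec::new(),
    }
}

/// Read the *target* architecture Cargo passes to build scripts.
/// (cfg!(target_arch = ...) here would describe the host running build.rs.)
fn cargo_target_arch() -> String {
    env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default()
}

/// Body of a hypothetical build.rs `main`.
fn configure_build() {
    let flags = neon_f16_cflags(&cargo_target_arch());
    if !flags.is_empty() {
        // With the `cc` crate this would be roughly:
        //   let mut build = cc::Build::new();
        //   build.file("<path to the Neon f16 C file>");
        //   for flag in &flags { build.flag(flag); }
        //   build.compile("<library name>");
        println!("cargo:rustc-cfg=neon_f16"); // hypothetical cfg name
    }
    println!("cargo:rerun-if-changed=build.rs");
}
```

Keying the decision on `CARGO_CFG_TARGET_ARCH` is what lets an x64 host produce an ARM binary with the Neon kernel included.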

Contributor

@IvanPleshkov left a comment

Also, it would be nice to add f16 scoring benchmarks. You can add them here, where byte scoring is defined:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/benches/metrics.rs

#[cfg(target_arch = "x86_64")]
{
    if is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("fma")
Contributor

Check if f16c is supported

Member

Do you have a suggestion for how to make this check?

Contributor

is_x86_feature_detected!("f16c")

Author

Resolved these.
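A minimal sketch of an AVX + FMA + F16C kernel guarded by the runtime check discussed above. It assumes f16 data stored as raw binary16 bits (`u16`) and uses hypothetical function names. Eight halves at a time are widened with `_mm256_cvtph_ps` and accumulated in f32 with `_mm256_fmadd_ps`; the tail is zero-padded instead of handled by a scalar loop, which keeps the example short. The dispatcher returns `None` when a feature is missing, so a caller can fall back to another path.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Dot product over f16 (raw binary16 bits): widen with F16C, accumulate with FMA.
///
/// # Safety
/// The CPU must support AVX, FMA and F16C.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma,f16c")]
unsafe fn avx_dot_half_sketch(a: &[u16], b: &[u16]) -> f32 {
    let n = a.len();
    let mut acc = _mm256_setzero_ps();
    let mut i = 0;
    while i + 8 <= n {
        // Load 8 halves (128 bits) per operand and widen them to 8 f32 lanes.
        let va = _mm256_cvtph_ps(_mm_loadu_si128(a.as_ptr().add(i) as *const __m128i));
        let vb = _mm256_cvtph_ps(_mm_loadu_si128(b.as_ptr().add(i) as *const __m128i));
        acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb
        i += 8;
    }
    if i < n {
        // Zero-pad the tail to a full 8-lane chunk; 0.0 * 0.0 adds nothing.
        let (mut pa, mut pb) = ([0u16; 8], [0u16; 8]);
        pa[..n - i].copy_from_slice(&a[i..]);
        pb[..n - i].copy_from_slice(&b[i..]);
        let va = _mm256_cvtph_ps(_mm_loadu_si128(pa.as_ptr() as *const __m128i));
        let vb = _mm256_cvtph_ps(_mm_loadu_si128(pb.as_ptr() as *const __m128i));
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum()
}

/// Runtime dispatch; None when AVX/FMA/F16C are unavailable so the caller can fall back.
pub fn dot_half_sketch(a: &[u16], b: &[u16]) -> Option<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
            && is_x86_feature_detected!("fma")
            && is_x86_feature_detected!("f16c")
        {
            // SAFETY: every target feature the kernel enables was detected at runtime.
            return Some(unsafe { avx_dot_half_sketch(a, b) });
        }
    }
    None
}
```

Without the `f16c` check, the kernel could be called on a CPU that has AVX and FMA but lacks the conversion instruction, which is undefined behavior.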

}
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
Contributor

Check if f16c is supported

fn similarity(v1: &[VectorElementTypeHalf], v2: &[VectorElementTypeHalf]) -> ScoreType {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
Contributor

Check if f16c is supported


#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
    if is_x86_feature_detected!("sse") && v1.len() >= MIN_DIM_SIZE_SIMD {
Contributor

Check if f16c is supported
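For the SSE path, a similar hedged sketch: `_mm_cvtph_ps` widens the low four halves of a 128-bit register, and the Manhattan distance takes each absolute difference by clearing the f32 sign bit with `_mm_andnot_ps`. The raw-`u16` storage, the function names, and the x86_64-only gating are assumptions; the runtime check requires F16C in addition to SSE2.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Manhattan (L1) distance over f16 (raw binary16 bits) using SSE2 + F16C.
///
/// # Safety
/// The CPU must support SSE2 and F16C.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2,f16c")]
unsafe fn sse_manhattan_half_sketch(a: &[u16], b: &[u16]) -> f32 {
    let n = a.len();
    let sign_mask = _mm_set1_ps(-0.0); // only the sign bit set in each lane
    let mut acc = _mm_setzero_ps();
    let mut i = 0;
    while i < n {
        // Copy up to 4 halves into a zero-padded buffer; padded lanes give |0 - 0| = 0.
        let take = (n - i).min(4);
        let (mut pa, mut pb) = ([0u16; 4], [0u16; 4]);
        pa[..take].copy_from_slice(&a[i..i + take]);
        pb[..take].copy_from_slice(&b[i..i + take]);
        // _mm_loadl_epi64 loads 64 bits (4 halves); _mm_cvtph_ps widens them to 4 f32 lanes.
        let va = _mm_cvtph_ps(_mm_loadl_epi64(pa.as_ptr() as *const __m128i));
        let vb = _mm_cvtph_ps(_mm_loadl_epi64(pb.as_ptr() as *const __m128i));
        // |va - vb|: clear the sign bit of the difference.
        acc = _mm_add_ps(acc, _mm_andnot_ps(sign_mask, _mm_sub_ps(va, vb)));
        i += take;
    }
    let mut lanes = [0f32; 4];
    _mm_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum()
}

/// Runtime dispatch; None when SSE2/F16C are unavailable so the caller can fall back.
pub fn manhattan_half_sketch(a: &[u16], b: &[u16]) -> Option<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") && is_x86_feature_detected!("f16c") {
            // SAFETY: both target features the kernel enables were detected at runtime.
            return Some(unsafe { sse_manhattan_half_sketch(a, b) });
        }
    }
    None
}
```

Copying every chunk through a small buffer keeps the sketch short; a production kernel would load directly from the slices in the main loop and pad only the tail.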

fn similarity(v1: &[VectorElementTypeHalf], v2: &[VectorElementTypeHalf]) -> ScoreType {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
Contributor

Check if f16c is supported

float16x8_t sum2 = vdupq_n_f16(0.0f);
float16x8_t sum3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Contributor

Why do you define the loop index here instead of inside the for statement (for (int i, ...))?

Author

Moved the loop index into the for statement.

float16x8_t sub3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
float16x8_t sub4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Contributor

Why not declare it inside the for statement?

float16x8_t sum2 = vdupq_n_f16(0.0f);
float16x8_t sum3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Contributor

Why not declare it inside the for statement?

float32_t tmp = 0.0f;
for (i=0; i < (blockSize % 32); i++) {
    tmp = (*pSrcA - *pSrcB);
    manhattanDistance += tmp > 0 ? tmp : -tmp;
Contributor

Why not abs instead?

Author

Now using the ARM f16 abs operation.

#[target_feature(enable = "avx")]
#[target_feature(enable = "fma")]
#[target_feature(enable = "f16c")]
pub(crate) unsafe fn euclid_similarity_avx(
Contributor

One comment for all names: euclid_similarity_avx already exists. Please rename euclid_similarity_avx to avx_euclid_similarity_half, following the naming used for the byte type:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/src/spaces/metric_uint/avx2/euclid.rs#L9

Contributor

Please do this for all SIMD functions.

Author

Done

@IvanPleshkov
Contributor

@TheQuantumFractal please rebase onto the latest dev; CI is red.

@generall
Member

Hey @TheQuantumFractal thanks a lot for the contribution! We will take it from here and finish the integration as a separate PR.

@generall merged commit c230a48 into qdrant:dev on May 13, 2024
16 of 17 checks passed
@TheQuantumFractal
Author

Sounds good! Happy to help :)

This was referenced on May 16, 2024
@generall mentioned this pull request on May 16, 2024
3 participants