Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor bias in split selection? #228

Open
jlmelville opened this issue Oct 21, 2023 · 1 comment
Open

Minor bias in split selection? #228

jlmelville opened this issue Oct 21, 2023 · 1 comment

Comments

@jlmelville
Copy link
Collaborator

In euclidean_random_projection_split, this part of the code is picking two random points to form the hyperplane:

left_index = tau_rand_int(rng_state) % indices.shape[0]
right_index = tau_rand_int(rng_state) % indices.shape[0]
right_index += left_index == right_index
right_index = right_index % indices.shape[0]

right_index has an ever so slight bias to being chosen as 0, because when left_index == indices[-1] and right_index is also sampled as indices[-1], it will be "overflowed" to 0.

If right_index was selected as:

    right_index = tau_rand_int(rng_state) % (indices.shape[0] - 1)

then the bias is removed and there is no need for the right_index = right_index % indices.shape[0] -- I think?

This also affects the angular and sparse variants, but I assume this doesn't really matter. I am mainly asking to make sure I didn't miss something.

@lmcinnes
Copy link
Owner

No, I think you are correct. This should probably be fine for large indices, but when you get low in the tree perhaps it could actually come into play? Thanks for pointing this out; I'll have to see if I can run some small experiments and see if it is worth trying to fix this up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants