nested loop sequential or parallelization runtime #188

Open

lyelibi opened this issue Jun 3, 2022 · 1 comment

lyelibi commented Jun 3, 2022

I have recently started using PyNNDescent to create nearest neighbor graphs. It is much faster and more stable than scikit-learn's nearest neighbor graph alternative.
The problem I am trying to solve is one of scale: I generate a data set and compute a nearest neighbor graph for every pair of its columns, which means that for a 500 x 50 matrix (the example below) I have to compute 50*(50-1)/2 = 1225 nearest neighbor graphs, i.e. 1225 calls to pynndescent. This takes 9 seconds on my machine, which is very limiting because I would like to do this for significantly larger data sets (e.g. N = 1000 columns, which would mean 499,500 graphs, as opposed to 50).

Are there ways to optimize this beyond a plain for loop? I flattened the naive nested loop into a single loop over column pairs, but Python threading and multiprocessing haven't produced better results than the solution below.

import itertools
import numpy as np
from pynndescent import NNDescent

data = np.random.normal(0, 1, (500, 50))

# All 50 * 49 / 2 = 1225 column pairs.
idx = list(itertools.combinations(range(data.shape[1]), 2))
count = len(idx)

s = np.empty(count, dtype=np.float16)
for i in range(count):
    # Nearest neighbor graph on one column pair; sum the neighbor distances.
    s[i] = NNDescent(data[:, idx[i]], metric='euclidean', n_neighbors=5).neighbor_graph[1].sum()
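
A process-based variant of the same loop, sketched here with joblib (the specific library and worker setup are assumptions on my part, since the threading/multiprocessing attempts mentioned above are not shown):

import itertools
import numpy as np
from joblib import Parallel, delayed
from pynndescent import NNDescent

def pair_score(data, pair):
    # Sum of neighbor distances for the graph built on one column pair.
    return NNDescent(data[:, pair], metric='euclidean',
                     n_neighbors=5).neighbor_graph[1].sum()

data = np.random.normal(0, 1, (500, 50))
pairs = list(itertools.combinations(range(data.shape[1]), 2))

# n_jobs=-1 uses all cores; each worker still pays numba JIT compilation
# and full per-call index construction cost, which is consistent with the
# report above that multiprocessing did not beat the sequential loop.
s = np.array(Parallel(n_jobs=-1)(delayed(pair_score)(data, p) for p in pairs),
             dtype=np.float16)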

lyelibi changed the title from "nested loop sequential or parrelization runtime" to "nested loop sequential or parallelization runtime" on Jun 3, 2022

lmcinnes (Owner) commented Jun 4, 2022

Off the top of my head I don't see any obvious way to improve this significantly. There is simply a lot of computational work to be done, and I'm not sure there are ways around that. I think you would have to look into deeper algorithmic approaches to cut down on the work.
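
One generic lever worth noting (an editorial sketch, not a suggestion from the thread) is to cheapen each individual call through NNDescent's construction parameters; for tiny 500 x 2 inputs the defaults may do more work than a 5-neighbor graph requires, though the accuracy trade-off would need to be checked:

import numpy as np
from pynndescent import NNDescent

data = np.random.normal(0, 1, (500, 2))  # one column pair from the example

# n_trees, n_iters and max_candidates are real NNDescent construction
# parameters; the specific values here are illustrative guesses.
graph = NNDescent(
    data,
    metric='euclidean',
    n_neighbors=5,
    n_trees=4,          # fewer random-projection trees for initialization
    n_iters=4,          # fewer NN-descent refinement iterations
    max_candidates=20,  # smaller candidate pool per iteration
).neighbor_graph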
