Slice error using mac M1-max ARM #218

Open
thegodone opened this issue May 10, 2023 · 6 comments

thegodone commented May 10, 2023

I tried the code on a large dataset (200k x 2.5k) using the latest version, v0.5.10. Whether the dataset is dense or sparse, I get an error.

My code:

```python
index = pynndescent.NNDescent(crs_test, metric='cosine')
```
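
For context, a hedged sketch of the surrounding setup; the construction of crs_test is an assumption (only its rough shape and the sparse case are known from the description above):

```python
import scipy.sparse as sp
import pynndescent

# Hypothetical stand-in for the real ~200k x 2.5k sparse dataset.
crs_test = sp.random(200_000, 2_500, density=0.01, format="csr", dtype="float32")
index = pynndescent.NNDescent(crs_test, metric="cosine")
```
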
The call runs for 10–20 seconds and then fails with this error:


ValueError Traceback (most recent call last)
File :1

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
793 print(ts(), "Building RP forest with", str(n_trees), "trees")
794 self._rp_forest = make_forest(
795 data,
796 n_neighbors,
(...)
802 self._angular_trees,
803 )
--> 804 leaf_array = rptree_leaf_array(self._rp_forest)
805 else:
806 self._rp_forest = None

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1097, in rptree_leaf_array(rp_forest)
1095 def rptree_leaf_array(rp_forest):
1096 if len(rp_forest) > 0:
-> 1097 return np.vstack(rptree_leaf_array_parallel(rp_forest))
1098 else:
1099 return np.array([[-1]])

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089 result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
1090 joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
1091 )
1092 return result

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())

File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
769 return self._value
770 else:
--> 771 raise self._value

File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
618 def __call__(self, *args, **kwargs):
619 try:
--> 620 return self.func(*args, **kwargs)
621 except KeyboardInterrupt as e:
622 # We capture the KeyboardInterrupt and reraise it as
623 # something different, as multiprocessing does not
624 # interrupt processing for a KeyboardInterrupt
625 raise WorkerInterrupt() from e

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in BatchedCalls.__call__(self)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in <listcomp>(.0)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]

ValueError: cannot assign slice from input of different size

thegodone (Author)

I found the problem: I did not pass the distance.

jamestwebber (Collaborator) commented Jun 15, 2023

I just got this same error on an x86 machine (n2d-highmem-8 GCP VM) and I'm unclear on what you needed to do to fix this. In any case I think this is a bug, as additional arguments shouldn't be necessary.

edit: Of course, as soon as I comment it starts mysteriously working... it was failing consistently before. I wonder if I had a bad version cached or something.

lmcinnes (Owner)

I agree this is odd, and I'll try to keep a lookout for a reproducer.

lmcinnes reopened this Jun 15, 2023
jamestwebber (Collaborator)

I think I have a reproducer, but I'm not sure how to share it. It seems completely data-specific: I got this error with np.sqrt(X) but not with X (and I don't think it's a dtype issue).
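
A minimal sketch of the comparison being described; X below is a random stand-in for the real (unshared) data, so both calls will normally succeed here, and the point is only the X vs. np.sqrt(X) pattern:

```python
import numpy as np
import pynndescent

# Random stand-in; the failure appears to be data-specific, so arbitrary data
# will usually not trigger it.
X = np.random.random((5000, 100)).astype(np.float32)

pynndescent.NNDescent(X)           # fine on the original values
pynndescent.NNDescent(np.sqrt(X))  # on the problematic data this raised
                                   # "ValueError: cannot assign slice from input of different size"
```
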

jamestwebber (Collaborator) commented Jun 16, 2023

I have a sporadic reproducer with a fairly small array (1.8 MB on disk, saved as a NumPy npz). It seems like this problem was introduced in a recent update. My suspicion is that it comes from an edge case in how the rows are split across n_jobs, i.e. an uneven split.

pynndescent_bug_np.npz.zip

edit: The above array seems to fail consistently only when passed through sqrt, but right now I don't want to figure out why that is 🙃
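
A rough sketch of how the attached reproducer could be exercised; unzipping the attachment first and taking the first array in the npz are assumptions, since the key name isn't given above:

```python
import numpy as np
import pynndescent

npz = np.load("pynndescent_bug_np.npz")  # after unzipping pynndescent_bug_np.npz.zip
X = npz[npz.files[0]]                    # first stored array; the key name is a guess

# The failure is sporadic (the random projection trees are randomized), so try
# a few times; per the edit above, np.sqrt(X) seems to fail consistently.
for attempt in range(10):
    try:
        pynndescent.NNDescent(np.sqrt(X))
        print(f"attempt {attempt}: ok")
    except ValueError as err:
        print(f"attempt {attempt}: {err}")
```
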

lmcinnes (Owner) commented Aug 1, 2023

So it was quirky. Some code had been added to bail out when the tree splitting was not working well, to avoid excess depth. Unfortunately that meant that, in rare cases, the size of a leaf could exceed the leaf_size that was set. This made things not match up when building the leaf arrays at the end, because those arrays expected leaves no larger than leaf_size. Now there is a max_leaf_size, and we expand things in those rare cases. In theory this could blow up terribly for bad data by consuming ungodly amounts of memory, but that's a very rare case indeed, and I'm not sure there is any way to fix it anyway. The best answer in that case is simply to increase the leaf_size in the NNDescent params.
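
A small sketch of the mismatch described above and of the suggested workaround; the first part is illustrative, not pynndescent's own code, and the leaf_size value and random data are placeholders:

```python
import numpy as np
import pynndescent

# Illustration: the leaf array is preallocated for leaves of at most leaf_size
# points, so a leaf that grew past that size cannot be written into its row.
leaf_size = 30
leaf_array = np.full((1, leaf_size), -1, dtype=np.int64)
oversized_leaf = np.arange(33)
try:
    leaf_array[0, : len(oversized_leaf)] = oversized_leaf
except ValueError as err:
    # NumPy words this differently from the numba message in the traceback
    # above, but the root cause (a size mismatch on slice assignment) is the same.
    print(err)

# Workaround for the rare bad-data case: give the leaves more room. leaf_size
# is a real NNDescent parameter (see the signature in the traceback); 60 is an
# arbitrary example value and `data` is a random stand-in.
data = np.random.random((2000, 50)).astype(np.float32)
index = pynndescent.NNDescent(data, metric="cosine", leaf_size=60)
```
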
