cp.unique runs forever #8326

Open
essoca opened this issue May 16, 2024 · 2 comments · May be fixed by #8328
Labels
cat:performance Performance in terms of speed or memory consumption contribution welcome prio:high

Comments

@essoca

essoca commented May 16, 2024

Description

While measuring the performance of cp.unique (following #8307), I noticed something very unpleasant: with axis=0 it doesn't return for large arrays.

I expected something comparable to the JAX numbers:

import numpy as np
import jax.numpy as jnp

N, M = 1_000_000, 10
arr = np.random.randint(0, 2, (N, M), dtype=np.uint8)
gpu_array = jnp.asarray(arr)

>>> %timeit jnp.unique(gpu_array, axis=0).block_until_ready()
28.9 ms ± 598 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

To Reproduce

First, a small array:

import cupy as cp
from cupyx.profiler import benchmark

N, M = 32, 10
arr = cp.random.randint(0, 2, (N, M), dtype=cp.uint8)

>>> benchmark(cp.unique, (arr,), {'axis': 0}, n_repeat=100)
unique              :    CPU:  9660.600 us   +/- 70.012 (min:  9548.863 / max:  9925.441) us     GPU-0:  9665.085 us   +/- 70.181 (min:  9553.280 / max:  9930.688) us

Next, a bigger array, benchmarking another function (cp.sum) first to check that it returns:

N, M = 1_000_000, 10
arr = cp.random.randint(0, 2, (N, M), dtype=cp.uint8)

>>> benchmark(cp.sum, (arr,), {'axis': 0}, n_repeat=100)
sum                 :    CPU:    17.986 us   +/- 16.365 (min:    11.146 / max:   112.660) us     GPU-0: 19225.186 us   +/- 29.842 (min: 19187.712 / max: 19329.023) us

A single run of cp.unique at this size does not finish (it was still running after an hour):

>>> benchmark(cp.unique, (arr,), {'axis': 0}, n_repeat=1)
...
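To narrow down where the run time blows up without waiting on a call that may never return, one option is to time the call at increasing N and stop at the first size that exceeds a time budget. The sketch below is only an illustration: `probe_scaling`, `make_rows` and `budget_s` are hypothetical names, and it uses NumPy so that it runs without a GPU. For CuPy, the same loop would presumably need the device to be synchronized before the clock is read (or `cupyx.profiler.benchmark`, as used above, in place of the manual timer).

```python
import time

import numpy as np


def probe_scaling(fn, make_input, sizes, budget_s=5.0):
    """Time fn(make_input(n)) for each n; stop after the first run over budget_s."""
    results = []
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        fn(data)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed))
        if elapsed > budget_s:
            # Larger sizes would only take longer, so give up here.
            break
    return results


def make_rows(n, m=10, seed=0):
    """Random binary uint8 matrix with the same shape pattern as in the report."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n, m), dtype=np.uint8)


# Grow N by a factor of 4 per step, starting from the small case above.
sizes = [32 * 4**k for k in range(5)]
timings = probe_scaling(lambda a: np.unique(a, axis=0), make_rows, sizes)
for n, t in timings:
    print(f"N={n:>6d}  {t * 1e3:9.3f} ms")
```

Comparing consecutive timings gives a rough idea of how the cost grows with N, which is more informative than a single run that hangs.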

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-6.5.0-1023-oem-x86_64-with-glibc2.35
Python Version               : 3.10.14
CuPy Version                 : 13.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : None
Cython Build Version         : 0.29.37
Cython Runtime Version       : None
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 12040
CUDA Driver Version          : 12040
CUDA Runtime Version         : 12040 (linked to CuPy) / 12040 (locally installed)
cuBLAS Version               : (available)
cuFFT Version                : 11201
cuRAND Version               : 10305
cuSOLVER Version             : (11, 6, 1)
cuSPARSE Version             : (available)
NVRTC Version                : (12, 4)
Thrust Version               : 200302
CUB Build Version            : 200200
Jitify Build Version         : <unknown>
cuDNN Build Version          : 8907
cuDNN Version                : 8907
NCCL Build Version           : 22105
NCCL Runtime Version         : 22105
cuTENSOR Version             : 20001
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA RTX A500 Laptop GPU
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:03:00.0

Additional Information

No response

@essoca essoca added the cat:bug Bugs label May 16, 2024
@kmaehashi
Member

kmaehashi commented May 17, 2024

Thanks for the feedback @essoca, confirmed on my side as well. Support for the axis argument in cupy.unique is relatively new (#6886, cc/ @andfoy), and it looks like there is room for improvement, especially when the ndarray is long along the specified axis.
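For reference, one common way to compute unique rows is to sort the rows lexicographically and keep each row that differs from its predecessor, which costs one sort plus one elementwise pass. The sketch below shows that technique in NumPy so it can run anywhere. It is not CuPy's current implementation, nor necessarily what #8328 does, and `unique_rows_sorted` is a hypothetical helper name. CuPy does provide `cupy.lexsort`, so the same steps may translate to the GPU, but that was not tested here.

```python
import numpy as np


def unique_rows_sorted(a):
    """Return the unique rows of a 2-D array in lexicographic order."""
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    if a.shape[0] == 0:
        return a.copy()
    # np.lexsort uses the LAST key as the primary one, so the columns are
    # passed in reverse order to make column 0 the most significant.
    order = np.lexsort(a.T[::-1])
    s = a[order]
    # A sorted row is kept when it differs from the previous row in any column.
    keep = np.empty(s.shape[0], dtype=bool)
    keep[0] = True
    keep[1:] = np.any(s[1:] != s[:-1], axis=1)
    return s[keep]


rng = np.random.default_rng(0)
arr = rng.integers(0, 2, size=(1000, 10), dtype=np.uint8)
rows = unique_rows_sorted(arr)
print(rows.shape)
```

For the binary uint8 input from the report, the result should match `np.unique(arr, axis=0)`, which makes it easy to cross-check any faster variant against the reference.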

@kmaehashi kmaehashi added contribution welcome cat:performance Performance in terms of speed or memory consumption prio:high and removed cat:bug Bugs labels May 17, 2024
@andfoy
Contributor

andfoy commented May 17, 2024

Thanks for the report! As @kmaehashi mentioned, this operation has a ton of room for improvement. I'll take a look at potential optimizations.

@andfoy andfoy linked a pull request May 17, 2024 that will close this issue