[Bug] change h_input_length_buf_ before synchronization #1584

mengmeexix · 2024-05-11T07:25:38Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.

Describe the bug

Here, the value of h_input_length_buf_ is modified on the CPU side, but a GPU operator is still using this value for its calculation. Do we not need to synchronize before the CPU side modification to ensure that the GPU operator has used up the data?

Reproduction

python ./benchmark/profile_throughput_audio.py llama model

Environment

sys.platform: linux
Python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100 80GB PCIe
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.66
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.16.2+cu121
LMDeploy: 0.3.0+
transformers: 4.38.2
gradio: 3.50.2
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.1.0

Error traceback

No response

lzhangzz · 2024-05-11T08:25:23Z

Do we not need to synchronize before the CPU side modification to ensure that the GPU operator has used up the data?

Which GPU operator are you referring to?

mengmeexix · 2024-05-13T02:51:40Z

Do we not need to synchronize before the CPU side modification to ensure that the GPU operator has used up the data?

Which GPU operator are you referring to?

Thank you for your reply. I just checked the code, and it seems that lmdeploy itself does not use this data on the GPU. We added a kernel of our own that reads it, so in our case the data can only be modified on the CPU after that kernel has finished executing.
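
For anyone hitting the same situation with a custom kernel, here is a minimal sketch of the ordering described above. It is not lmdeploy code: `sum_lengths`, `sum_then_modify`, and `h_lengths` (a stand-in for h_input_length_buf_) are hypothetical names, and it assumes the kernel reads the host buffer directly through mapped pinned memory. The host records an event after the kernel launch and waits on it before touching the buffer.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical stand-in for the custom kernel: it reads the lengths
// straight from mapped host memory and writes their sum to *out.
__global__ void sum_lengths(const int* lengths, int n, int* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int total = 0;
        for (int i = 0; i < n; ++i) {
            total += lengths[i];
        }
        *out = total;
    }
}

// Launches the kernel, waits for it, and only then overwrites the host
// buffer. Returns the sum the kernel observed (1 + 2 + ... + n).
int sum_then_modify(int n)
{
    int* h_lengths = nullptr;  // stand-in for h_input_length_buf_ (assumption)
    int* d_lengths = nullptr;  // device-side alias of the same memory
    int* d_out     = nullptr;
    cudaStream_t stream;
    cudaEvent_t  kernel_done;

    CUDA_CHECK(cudaHostAlloc(reinterpret_cast<void**>(&h_lengths),
                             n * sizeof(int), cudaHostAllocMapped));
    CUDA_CHECK(cudaHostGetDevicePointer(reinterpret_cast<void**>(&d_lengths),
                                        h_lengths, 0));
    CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_out), sizeof(int)));
    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaEventCreateWithFlags(&kernel_done, cudaEventDisableTiming));

    for (int i = 0; i < n; ++i) {
        h_lengths[i] = i + 1;
    }

    // The launch is asynchronous: control returns to the host immediately.
    sum_lengths<<<1, 1, 0, stream>>>(d_lengths, n, d_out);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(kernel_done, stream));

    // Wait for the kernel to finish reading before the CPU modifies the buffer.
    CUDA_CHECK(cudaEventSynchronize(kernel_done));
    for (int i = 0; i < n; ++i) {
        h_lengths[i] = 0;  // safe now: the kernel no longer reads this memory
    }

    int result = 0;
    CUDA_CHECK(cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaEventDestroy(kernel_done));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFreeHost(h_lengths));
    return result;
}
```

Waiting on an event blocks only until the work recorded before it on that stream has finished, which is usually cheaper than a full `cudaDeviceSynchronize()`. If the kernel instead read a device copy made with `cudaMemcpyAsync`, the event could be recorded right after the copy rather than after the kernel.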
