
No CUDA toolset found #119

Open
c469591 opened this issue Jan 15, 2024 · 5 comments
Labels
question Further information is requested

Comments

@c469591

c469591 commented Jan 15, 2024

Question Details

Hello, I encountered an error while running CMake. My system is Windows 10 with Python 3.11 and an NVIDIA 3060, and CUDA is correctly installed. Below is the error output.

(llm) I:\llm>cmake -S . -B build -DLLAMA_CUBLAS=ON
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
-- cuBLAS found
CMake Error at C:/Program Files/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:529 (message):
  No CUDA toolset found.
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  C:/Program Files/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  C:/Program Files/CMake/share/cmake-3.28/Modules/CMakeDetermineCUDACompiler.cmake:135 (CMAKE_DETERMINE_COMPILER_ID)
  CMakeLists.txt:258 (enable_language)


-- Configuring incomplete, errors occurred!

Additional Context

  • Operating System: Windows 10
  • Python Version: 3.11
  • GPU: NVIDIA 3060
  • Repository cloned today
  • CMake: latest stable version, installed today

@c469591 c469591 added the question Further information is requested label Jan 15, 2024
@aoguai

aoguai commented Jan 19, 2024

I encountered the same issue as well.

Environment Information:

  • Operating System: Windows 10
  • Python Version: 3.10
  • CUDA Version: 12.3
  • NVIDIA GPU: NVIDIA 3050
  • CMake Version: 3.28.0-rc1

Error during PowerInfer Setup:

  1. Using CMake:

    • Cloned PowerInfer repository:

      git clone https://github.com/bobozi-cmd/PowerInfer
      cd PowerInfer
    • Installed dependencies:

      pip install -r requirements.txt
    • Ran CMake configuration:

      cmake -S . -B build -DLLAMA_CUBLAS=ON
    • Error Encountered:

       CMake Error at D:/yy/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:529 (message):
         No CUDA toolset found.
       Call Stack (most recent call first):
         D:/yy/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
         D:/yy/CMake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
         D:/yy/CMake/share/cmake-3.28/Modules/CMakeDetermineCUDACompiler.cmake:135 (CMAKE_DETERMINE_COMPILER_ID)
         CMakeLists.txt:258 (enable_language)
       
       
       -- Configuring incomplete, errors occurred!
      
      
  2. Using w64devkit:

    • Downloaded the latest Fortran version of w64devkit.

    • Executed w64devkit:

      w64devkit.exe
    • Navigated to PowerInfer folder:

      cd PowerInfer
    • Attempted to build using make.

    • Error Encountered:

       In file included from ggml.h:217,
                        from ggml-impl.h:3,
                        from ggml.c:4:
       atomic_windows.h: In function '__msvc_xchg_i8':
       atomic_windows.h:103:12: error: implicit declaration of function '_InterlockedExchange8'; did you mean '_InterlockedExchange'? [-Werror=implicit-function-declaration]
         103 |     return _InterlockedExchange8(addr, val);
             |            ^~~~~~~~~~~~~~~~~~~~~
             |            _InterlockedExchange
       atomic_windows.h: In function '__msvc_xchg_i16':
       atomic_windows.h:107:12: error: implicit declaration of function '_InterlockedExchange16'; did you mean '_InterlockedExchange'? [-Werror=implicit-function-declaration]
         107 |     return _InterlockedExchange16(addr, val);
             |            ^~~~~~~~~~~~~~~~~~~~~~
             |            _InterlockedExchange
       atomic_windows.h: In function '__msvc_xchg_i32':
       atomic_windows.h:111:33: warning: passing argument 1 of '_InterlockedExchange' from incompatible pointer type [-Wincompatible-pointer-types]
         111 |     return _InterlockedExchange(addr, val);
             |                                 ^~~~
             |                                 |
             |                                 volatile int *
       In file included from D:/yy/w64devkit/x86_64-w64-mingw32/include/winnt.h:27,
                        from D:/yy/w64devkit/x86_64-w64-mingw32/include/minwindef.h:163,
                        from D:/yy/w64devkit/x86_64-w64-mingw32/include/windef.h:9,
                        from D:/yy/w64devkit/x86_64-w64-mingw32/include/windows.h:69,
                        from atomic_windows.h:29:
       D:/yy/w64devkit/x86_64-w64-mingw32/include/psdk_inc/intrin-impl.h:1714:50: note: expected 'volatile long int *' but argument is of type 'volatile int *'
        1714 | __LONG32 _InterlockedExchange(__LONG32 volatile *Target, __LONG32 Value) {
             |                                                  ^
       atomic_windows.h: In function '__msvc_cmpxchg_i8':
       atomic_windows.h:186:12: error: implicit declaration of function '_InterlockedCompareExchange8'; did you mean '_InterlockedCompareExchange'? [-Werror=implicit-function-declaration]
         186 |     return _InterlockedCompareExchange8((__int8 volatile*)addr, newval, oldval);
             |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
             |            _InterlockedCompareExchange
       atomic_windows.h: In function '__msvc_cmpxchg_i32':
       atomic_windows.h:194:40: warning: passing argument 1 of '_InterlockedCompareExchange' from incompatible pointer type [-Wincompatible-pointer-types]
         194 |     return _InterlockedCompareExchange((__int32 volatile*)addr, newval, oldval);
             |                                        ^~~~~~~~~~~~~~~~~~~~~~~
             |                                        |
             |                                        volatile int *
       D:/yy/w64devkit/x86_64-w64-mingw32/include/psdk_inc/intrin-impl.h:1659:57: note: expected 'volatile long int *' but argument is of type 'volatile int *'
        1659 | __LONG32 _InterlockedCompareExchange(__LONG32 volatile *Destination, __LONG32 ExChange, __LONG32 Comperand) {
             |                                                         ^
       atomic_windows.h: In function '__msvc_xadd_i8':
       atomic_windows.h:279:12: error: implicit declaration of function '_InterlockedExchangeAdd8'; did you mean '_InterlockedExchangeAdd'? [-Werror=implicit-function-declaration]
         279 |     return _InterlockedExchangeAdd8(addr, val);
             |            ^~~~~~~~~~~~~~~~~~~~~~~~
             |            _InterlockedExchangeAdd
       atomic_windows.h: In function '__msvc_xadd_i16':
       atomic_windows.h:283:12: error: implicit declaration of function '_InterlockedExchangeAdd16'; did you mean '_InterlockedExchangeAdd'? [-Werror=implicit-function-declaration]
         283 |     return _InterlockedExchangeAdd16(addr, val);
             |            ^~~~~~~~~~~~~~~~~~~~~~~~~
             |            _InterlockedExchangeAdd
       atomic_windows.h: In function '__msvc_xadd_i32':
       atomic_windows.h:287:36: warning: passing argument 1 of '_InterlockedExchangeAdd' from incompatible pointer type [-Wincompatible-pointer-types]
         287 |     return _InterlockedExchangeAdd(addr, val);
             |                                    ^~~~
             |                                    |
             |                                    volatile int *
       D:/yy/w64devkit/x86_64-w64-mingw32/include/psdk_inc/intrin-impl.h:1648:53: note: expected 'volatile long int *' but argument is of type 'volatile int *'
        1648 | __LONG32 _InterlockedExchangeAdd(__LONG32 volatile *Addend, __LONG32 Value) {
             |                                                     ^
       In function 'ggml_op_name',
           inlined from 'ggml_get_n_tasks' at ggml.c:16954:17:
       ggml.c:2004:24: warning: array subscript 70 is above array bounds of 'const char *[69]' [-Warray-bounds=]
        2004 |     return GGML_OP_NAME[op];
             |            ~~~~~~~~~~~~^~~~
       ggml.c: In function 'ggml_get_n_tasks':
       ggml.c:1586:21: note: while referencing 'GGML_OP_NAME'
        1586 | static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
             |                     ^~~~~~~~~~~~
       In function 'ggml_compute_forward_add_f32',
           inlined from 'ggml_compute_forward_add' at ggml.c:7262:17:
       ggml.c:6995:40: warning: 'ft' may be used uninitialized [-Wmaybe-uninitialized]
        6995 |                         dst_ptr[i] = ft[i] >= 0.0f ? src0_ptr[i] + src1_ptr[i] : 0;
             |                                        ^
       ggml.c: In function 'ggml_compute_forward_add':
       ggml.c:6960:12: note: 'ft' was declared here
        6960 |     float *ft;
             |            ^~
       cc1.exe: some warnings being treated as errors
       make: *** [Makefile:533: ggml.o] Error 1
      

@aoguai

aoguai commented Jan 19, 2024

I believe I have found a solution to the issue.

You can refer to the following Stack Overflow post for more details on the CUDA compilation issue on Windows with CMake error "No CUDA Toolset":
c++ - CUDA compile problems on Windows, CMake error: no CUDA toolset found - Stack Overflow

This problem usually occurs because the Visual Studio Integration is missing when installing CUDA. Here's what I did:

  1. Navigate to the installation directory of your CUDA, for example:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\extras\visual_studio_integration\MSBuildExtensions

  2. Find these four files:

    • CUDA 11.7.props
    • CUDA 11.7.targets
    • CUDA 11.7.xml
    • Nvda.Build.CudaTasks.v11.7.dll
  3. Copy and replace them in the corresponding paths under Visual Studio:

    • C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations
    • C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Microsoft\VC\v170\BuildCustomizations

Make sure to adjust the paths to your CUDA installation and Visual Studio directories, and remember to create backups.

After these steps, the issue should be resolved.
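
The copy step above can be sketched as a small script. This is a minimal sketch, not part of the project; the function name is mine, and the paths in the example are assumptions that must be adjusted to your CUDA and Visual Studio versions (remember to back up any files you overwrite):

```python
import shutil
from pathlib import Path


def copy_cuda_integration(cuda_extras: str, vs_build_customizations: str):
    """Copy the CUDA MSBuild integration files (CUDA *.props, *.targets, *.xml,
    and Nvda.Build.CudaTasks.*.dll) into Visual Studio's BuildCustomizations
    folder. Returns the sorted names of the files copied."""
    src = Path(cuda_extras)
    dst = Path(vs_build_customizations)
    copied = []
    for pattern in ("*.props", "*.targets", "*.xml", "Nvda.Build.CudaTasks.*.dll"):
        for f in src.glob(pattern):
            shutil.copy2(f, dst / f.name)  # overwrites any existing file
            copied.append(f.name)
    return sorted(set(copied))


# Example (adjust CUDA version and VS edition to your machine):
# copy_cuda_integration(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\extras\visual_studio_integration\MSBuildExtensions",
#     r"C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations",
# )
```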


Date and Time: 2024-01-19 21:41 (Edited)

Environment: Windows

Hardware Configuration:

  • GPU: RTX 3050 Ti 8GB
  • Selected Model: ReluLLaMA-7B-PowerInfer-GGUF

Run Output:

llm_load_gpu_split: offloaded 0.00 MiB of FFN weights to GPU
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  256.00 MB
llama_build_graph: non-view tensors processed: 548/836
llama_build_graph: ****************************************************************
llama_build_graph: not all non-view tensors have been processed with a callback
llama_build_graph: this can indicate an inefficiency in the graph implementation
llama_build_graph: build with LLAMA_OFFLOAD_DEBUG for more info
llama_build_graph: ref: https://github.com/ggerganov/llama.cpp/pull/3837
llama_build_graph: ****************************************************************
llama_new_context_with_model: compute buffer total size = 6.91 MB
llama_new_context_with_model: VRAM scratch buffer: 5.34 MB
llama_new_context_with_model: total VRAM used: 3269.75 MB (model: 3264.41 MB, context: 5.34 MB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 32, n_predict = 128, n_keep = 0


Once upon a time there lived three brothers: Hodja, Sinan and Ali. It is told that these three men were very wise and clever, but the only one who was wiser than them all was their father.
Their father was so wise that he could tell what people would do before they did it. This knowledge made him famous all over the world. People came to him from every corner of the earth asking for his advice and guidance. Every day, when these three brothers went to school, they were always very hungry because they had nothing to eat at home.
One night, their father gave each boy a walnut.
llama_print_timings:        load time =   14126.55 ms
llama_print_timings:      sample time =      35.82 ms /   128 runs   (    0.28 ms per token,  3573.42 tokens per second)
llama_print_timings: prompt eval time =   10247.01 ms /     5 tokens ( 2049.40 ms per token,     0.49 tokens per second)
llama_print_timings:        eval time =   88055.01 ms /   127 runs   (  693.35 ms per token,     1.44 tokens per second)
llama_print_timings:       total time =  100799.35 ms
Log end

It works great for me.

@hodlen
Collaborator

hodlen commented Jan 22, 2024

Thanks @aoguai for your informative reply!

We also encountered this issue in dev and managed to fix it by removing all whitespace from every CUDA environment variable, e.g. replacing C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\... with C:\ProgramFiles\NVIDIAGPUComputingToolkit\CUDA\.... If your CUDA toolkit is properly installed and you still struggle with this issue, please give it a try!
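
A quick way to spot the offending variables is to scan the environment for CUDA-related entries whose values contain spaces. This is a hedged sketch of my own (the function name and the heuristic of matching "CUDA" in the variable name are assumptions; actual variable names vary between installs):

```python
import os


def cuda_vars_with_spaces(environ=None):
    """Return CUDA-related environment variables whose values contain
    spaces, which is what tripped up CMake's toolset detection here."""
    if environ is None:
        environ = os.environ
    return {
        name: value
        for name, value in environ.items()
        if "CUDA" in name.upper() and " " in value
    }


# for name, value in cuda_vars_with_spaces().items():
#     print(f"{name} contains spaces: {value}")
```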

@c469591
Author

c469591 commented Jan 28, 2024

Thank you everyone, my issue has been resolved.
Could this project be modified to offer an interactive chat mode for inference?
Although inference runs smoothly now, I have to re-enter the complete command each time, and the output seems incomplete and even includes some extra evaluation output.
Are there any other projects that build on this one to provide a chat tool that general users can use directly?
Could the program keep running instead of exiting immediately after inference completes?
Thanks!

@hodlen
Collaborator

hodlen commented Jan 29, 2024

There are various ways to chat with these models interactively, and the simplest one is to start a server (see examples/server). It provides a simple web UI for chatting and should meet your needs. Please refer to #126.
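
To illustrate, the llama.cpp-style server in examples/server exposes an HTTP completion endpoint. A hedged sketch of building a request for it (the /completion path, port 8080, and the prompt/n_predict fields follow upstream llama.cpp's examples/server and may differ in this fork):

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=128):
    """Build the JSON body for a POST to the server's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}


# With the server running locally, one could send (untested here):
# req = urllib.request.Request(
#     "http://127.0.0.1:8080/completion",
#     data=json.dumps(build_completion_request("Once upon a time")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```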
