CUDA cannot be found on an A100-80G #182
Comments
Sorry, the log I provided looks messy. The following may be clearer:
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ ./build/bin/main -m /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --ignore-eos
Log start
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ cmake -S . -B build -DLLAMA_CUBLAS=ON
-- The C compiler identification is GNU 11.4.0
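The CMake output shows that your environment has the CUDA build toolchain, but at runtime the program reports "CUDA is not loaded", which may mean the driver did not load properly. You can try running […] in the same environment. If your GPU hardware is working and the driver is installed correctly, a transient problem like this can usually be resolved by rebooting the machine or restarting the container.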
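One way to check whether the driver is usable at runtime, independently of PowerInfer, is to load the CUDA driver library directly and ask it to initialise. The following is only a minimal sketch, assuming a Linux machine where the driver library is named `libcuda.so.1`. The helper name `cuda_driver_status` is hypothetical, and this is not necessarily the check the maintainer had in mind. `cuInit` and `cuDeviceGetCount` are CUDA Driver API functions.

```python
import ctypes


def cuda_driver_status(lib_name="libcuda.so.1"):
    """Return (ok, message) describing whether the CUDA driver initialises.

    lib_name is an assumption: libcuda.so.1 is the usual name on Linux.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        # The driver's user-space library is missing or cannot be loaded.
        return False, f"cannot load {lib_name}: {exc}"

    # cuInit(0) must succeed before any other Driver API call.
    rc = libcuda.cuInit(0)
    if rc != 0:
        return False, f"cuInit failed with CUDA error code {rc}"

    count = ctypes.c_int(0)
    rc = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return False, f"cuDeviceGetCount failed with CUDA error code {rc}"
    if count.value == 0:
        return False, "driver initialised but reports 0 devices"
    return True, f"driver initialised, {count.value} device(s) visible"


if __name__ == "__main__":
    ok, message = cuda_driver_status()
    print("OK" if ok else "PROBLEM", "-", message)
```

If this reports a problem in the same shell or container where `./build/bin/main` fails, the cause likely lies with the driver or the container's GPU access rather than with the PowerInfer build.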
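Hello, I am trying to reproduce PowerInfer on an A100-80G machine, but I ran into the error below. It looks as though the GPU on the machine is not being detected?
CUDA version on the machine: 12.4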
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ ./build/bin/main -m /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --ignore-eos
Log start
main: build = 1578 (906830b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1713950721
llama_model_loader: loaded meta data with 23 key-value pairs and 883 tensors from /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q4_0 [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q4_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q4_0 [ 8192, 1024, 1, 1 ]
...
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
...
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q4_0: 722 tensors
llama_model_load: PowerInfer model loaded. Sparse inference will be used.
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 74.98 B
llm_load_print_meta: model size = 39.28 GiB (4.50 BPW)
llm_load_print_meta: general.name = nvme
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: sparse_pred_threshold = 0.00
error loading model: CUDA is not loaded
llama_load_model_from_file_with_context: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf'
main: error: unable to load model
The output from my build stage is:
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ cmake -S . -B build -DLLAMA_CUBLAS=ON
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.99")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
GNU ld (GNU Binutils for Ubuntu) 2.38
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/turbo/projects/PowerInfer/build
and
``
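The configure output above reports two different CUDA versions: the toolkit found under `/usr/local/cuda/include` is 12.4.99, while the compiler at `/usr/bin/nvcc` identifies as NVIDIA 11.5.119. It also lists the target architectures as 52;61;70. Whether any of this is related to the "CUDA is not loaded" error is not established in this thread. As a hedged sketch, the relevant fields can be pulled out of a configure log so that such differences are easier to spot. The function name `cuda_versions_from_cmake` is hypothetical, and the regular expressions assume the message wording shown in the log above.

```python
import re


def cuda_versions_from_cmake(log):
    """Extract toolkit version, nvcc version and CUDA arch list from CMake output."""
    toolkit = re.search(r'Found CUDAToolkit: \S+ \(found version "([\d.]+)"\)', log)
    nvcc = re.search(r"The CUDA compiler identification is NVIDIA ([\d.]+)", log)
    archs = re.search(r"Using CUDA architectures: ([\d;]+)", log)
    return {
        "toolkit": toolkit.group(1) if toolkit else None,
        "nvcc": nvcc.group(1) if nvcc else None,
        "archs": archs.group(1).split(";") if archs else [],
    }


# Lines copied from the configure output in this issue.
log = "\n".join([
    '-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.99")',
    "-- cuBLAS found",
    "-- The CUDA compiler identification is NVIDIA 11.5.119",
    "-- Using CUDA architectures: 52;61;70",
])

info = cuda_versions_from_cmake(log)
# Compare only the major version numbers of toolkit and compiler.
major_mismatch = (
    info["toolkit"] is not None
    and info["nvcc"] is not None
    and info["toolkit"].split(".")[0] != info["nvcc"].split(".")[0]
)
print(info)
print("toolkit/nvcc major version mismatch:", major_mismatch)
```

A mismatch flagged here only means the headers and the compiler come from different CUDA installations. It does not by itself explain a runtime failure.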