CUDA cannot be found on an A100-80G #182

Open
bulaikexiansheng opened this issue Apr 24, 2024 · 2 comments
Labels
question Further information is requested

Comments

@bulaikexiansheng

Hi, I'm trying to reproduce PowerInfer on an A100-80G machine, but model loading fails with `error loading model: CUDA is not loaded`. It looks as if the GPU on the machine is not being detected? The full run log and build log are in my comment below.

CUDA version on the machine: 12.4
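
For reference, these are the standard commands to check the toolkit and driver versions:

```bash
nvcc --version   # CUDA compiler / toolkit version
nvidia-smi       # driver version and visible GPUs
```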


@bulaikexiansheng bulaikexiansheng added the bug-unconfirmed Unconfirmed bugs label Apr 24, 2024
@bulaikexiansheng
Author

Apologies, the logs I provided were messy; the version below should be clearer:

```
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ ./build/bin/main -m /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --ignore-eos

Log start
main: build = 1578 (906830b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1713951199
llama_model_loader: loaded meta data with 23 key-value pairs and 883 tensors from /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q4_0 [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q4_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q4_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight q4_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight q4_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q4_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_up.weight q4_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_down_t.weight q4_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
......
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: llama.rope.freq_base f32
llama_model_loader: - kv 11: general.file_type u32
llama_model_loader: - kv 12: tokenizer.ggml.model str
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr
llama_model_loader: - kv 14: tokenizer.ggml.scores arr
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool
llama_model_loader: - kv 22: general.quantization_version u32
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q4_0: 722 tensors
llama_model_load: PowerInfer model loaded. Sparse inference will be used.
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 74.98 B
llm_load_print_meta: model size = 39.28 GiB (4.50 BPW)
llm_load_print_meta: general.name = nvme
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: sparse_pred_threshold = 0.00
error loading model: CUDA is not loaded
llama_load_model_from_file_with_context: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf'
main: error: unable to load model
```

```
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ cmake -S . -B build -DLLAMA_CUBLAS=ON

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.99")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
GNU ld (GNU Binutils for Ubuntu) 2.38
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/turbo/projects/PowerInfer/build
```

@hodlen
Collaborator

hodlen commented Apr 25, 2024

The CMake output shows that a CUDA build toolchain is present in your environment, but the runtime error "CUDA is not loaded" suggests the driver may not have loaded properly. Try running nvidia-smi in the same environment to see whether the GPU is detected; if that also fails, you can be sure it is a driver problem.
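
For example:

```bash
# If the driver is loaded, this lists the A100 along with driver/CUDA versions;
# if it fails to initialize, the driver is the likely culprit.
nvidia-smi

# Optionally, confirm that the NVIDIA kernel modules are present:
lsmod | grep nvidia
```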

If the GPU hardware is healthy and the driver is installed correctly, this kind of transient issue can usually be resolved by rebooting the machine or restarting the container.
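
If the driver turns out to be fine, the configure log above may also be worth a second look: CMake picked up the CUDA 12.4 toolkit headers from /usr/local/cuda but compiled with /usr/bin/nvcc, which identifies as CUDA 11.5.119, and the selected architectures (52;61;70) do not include the A100's compute capability 8.0. A hedged sketch of a clean reconfigure that pins the matching nvcc and the A100 architecture (assuming CUDA 12.4 is installed under /usr/local/cuda):

```bash
rm -rf build
cmake -S . -B build -DLLAMA_CUBLAS=ON \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCMAKE_CUDA_ARCHITECTURES=80
cmake --build build --config Release
```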

@hodlen hodlen added question Further information is requested and removed bug-unconfirmed Unconfirmed bugs labels Apr 25, 2024