
crash loading llama-3-chinese-8b-instruct model #4080

Closed
jiangweiatgithub opened this issue May 1, 2024 · 9 comments
Labels: bug (Something isn't working), model request (Model requests)

Comments

@jiangweiatgithub

What is the issue?

When trying to run a model created from a GGUF file, the captioned error occurs. The model can be downloaded from: https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct/summary
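For context, importing a local GGUF file into Ollama is presumably done along these lines; the filename and model name below are placeholders, not taken from the report:

```
# Modelfile pointing at the downloaded GGUF file (hypothetical filename)
FROM ./llama-3-chinese-8b-instruct-q4_0.gguf
```

```
ollama create llama-3-chinese -f Modelfile
ollama run llama-3-chinese
```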

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.32

@jiangweiatgithub jiangweiatgithub added the bug Something isn't working label May 1, 2024
@jiangweiatgithub
Author

Here is part of the server log that is directly related to this issue:
[GIN] 2024/05/01 - 20:44:53 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/05/01 - 20:44:53 | 200 | 565.5µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/05/01 - 20:44:53 | 200 | 211.1µs | 127.0.0.1 | POST "/api/show"
time=2024-05-01T20:44:56.483+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-01T20:44:56.483+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-01T20:44:56.512+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
time=2024-05-01T20:44:56.513+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-01T20:44:56.513+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-01T20:44:56.622+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
time=2024-05-01T20:44:56.649+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-01T20:44:56.650+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-01T20:44:56.679+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
time=2024-05-01T20:44:56.680+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-01T20:44:56.680+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-01T20:44:56.785+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
time=2024-05-01T20:44:56.815+08:00 level=INFO source=server.go:127 msg="offload to gpu" reallayers=33 layers=33 required="5033.0 MiB" used="5033.0 MiB" available="15279.0 MiB" kv="256.0 MiB" fulloffload="164.0 MiB" partialoffload="677.5 MiB"
time=2024-05-01T20:44:56.815+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-01T20:44:56.824+08:00 level=INFO source=server.go:264 msg="starting llama server" cmd="C:\Users\polyt\AppData\Local\Temp\ollama1755869092\runners\cuda_v11.3\ollama_llama_server.exe --model C:\Users\polyt\.ollama\models\blobs\sha256-57a2dd8b97f3231f002b0013b027fd06102c82847bceb6cd6629cfb1e012cd82 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --port 13501"
time=2024-05-01T20:44:56.849+08:00 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"37608","timestamp":1714567497}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"37608","timestamp":1714567497}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":8,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"37608","timestamp":1714567497,"total_threads":16}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\polyt\.ollama\models\blobs\sha256-57a2dd8b97f3231f002b0013b027fd06102c82847bceb6cd6629cfb1e012cd82 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = llama-3-chinese
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = llama-3-chinese
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: Quadro RTX 5000, compute capability 7.5, VMM: yes
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 281.81 MiB
llm_load_tensors: CUDA0 buffer size = 4155.99 MiB
.time=2024-05-01T20:44:58.904+08:00 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
[GIN] 2024/05/01 - 20:44:58 | 500 | 5.1448234s | 127.0.0.1 | POST "/api/chat"
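A note on the exit status in the error line above: 3221225477 is the decimal form of the Windows NTSTATUS code 0xC0000005 (STATUS_ACCESS_VIOLATION), i.e. the runner process crashed with an access violation rather than exiting cleanly. A quick check:

```python
# Decode the llama runner's exit status from the Ollama log.
STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS for an access violation

exit_code = 3221225477  # value reported in "llama runner process no longer running"
print(hex(exit_code))   # -> 0xc0000005

assert exit_code == STATUS_ACCESS_VIOLATION
```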

@dhiltgen dhiltgen self-assigned this May 1, 2024
@dhiltgen dhiltgen added nvidia Issues relating to Nvidia GPUs and CUDA windows labels May 1, 2024
@dhiltgen
Collaborator

dhiltgen commented May 1, 2024

Can you try running the server with $env:OLLAMA_DEBUG="1" set, to see if we can get a little more detail around the time of the crash? It may also help to force it to run on the CPU and see whether that changes the behavior ($env:OLLAMA_LLM_LIBRARY="cpu_avx2").
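A minimal sketch of applying those suggestions in PowerShell, assuming the Ollama tray app has been quit first so that `ollama serve` can bind the port; the model name is a placeholder:

```
# Enable verbose logging for this session
$env:OLLAMA_DEBUG = "1"
# Optionally force the CPU runner to rule out the GPU code path
$env:OLLAMA_LLM_LIBRARY = "cpu_avx2"
ollama serve
# ...then, from a second PowerShell window, trigger the crash:
ollama run llama-3-chinese
```

Environment variables set with `$env:` apply only to that PowerShell session, so `ollama serve` must be started from the same window.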

@dhiltgen dhiltgen added the model request Model requests label May 1, 2024
@jiangweiatgithub
Author

To be frank, I have yet to find out how to use those two environment variables!
FYI, when I run the same model on a CPU-only machine, I get the same error!

@jiangweiatgithub
Author

Here is the server log from the CPU-only machine:
[GIN] 2024/05/02 - 13:31:01 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/05/02 - 13:31:01 | 200 | 15.0003ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2024/05/02 - 13:31:23 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/05/02 - 13:31:23 | 200 | 998.4µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/05/02 - 13:31:23 | 200 | 1.0009ms | 127.0.0.1 | POST "/api/show"
time=2024-05-02T13:31:26.077+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T13:31:26.077+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-02T13:31:26.097+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\jiangw\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-05-02T13:31:26.098+08:00 level=INFO source=gpu.go:343 msg="Unable to load cudart CUDA management library C:\Users\jiangw\AppData\Local\Programs\Ollama\cudart64_110.dll: your nvidia driver is too old or missing, please upgrade to run ollama"
time=2024-05-02T13:31:26.098+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-05-02T13:31:26.116+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-05-02T13:31:26.116+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-02T13:31:26.117+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T13:31:26.117+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-02T13:31:26.136+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\jiangw\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-05-02T13:31:26.137+08:00 level=INFO source=gpu.go:343 msg="Unable to load cudart CUDA management library C:\Users\jiangw\AppData\Local\Programs\Ollama\cudart64_110.dll: your nvidia driver is too old or missing, please upgrade to run ollama"
time=2024-05-02T13:31:26.137+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-05-02T13:31:26.155+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-05-02T13:31:26.155+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-02T13:31:26.157+08:00 level=INFO source=server.go:127 msg="offload to gpu" reallayers=0 layers=0 required="8025.3 MiB" used="677.5 MiB" available="0 B" kv="256.0 MiB" fulloffload="164.0 MiB" partialoffload="677.5 MiB"
time=2024-05-02T13:31:26.163+08:00 level=INFO source=server.go:264 msg="starting llama server" cmd="F:\TEMP2\2\ollama515740053\runners\cpu_avx2\ollama_llama_server.exe --model C:\Users\jiangw\.ollama\models\blobs\sha256-a6dbf5ed89ddec7bb5e74591d9e18e07e8fc3b4916b09451bff75c955cc2ed75 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 0 --port 50921"
time=2024-05-02T13:31:26.167+08:00 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"32680","timestamp":1714627886}
{"function":"server_params_parse","level":"WARN","line":2382,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"32680","timestamp":1714627886}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"32680","timestamp":1714627886}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":12,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"32680","timestamp":1714627886,"total_threads":24}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\jiangw\.ollama\models\blobs\sha256-a6dbf5ed89ddec7bb5e74591d9e18e07e8fc3b4916b09451bff75c955cc2ed75 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = llama-3-chinese
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 7
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 7.95 GiB (8.50 BPW)
llm_load_print_meta: general.name = llama-3-chinese
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 8137.64 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
time=2024-05-02T13:31:28.628+08:00 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
[GIN] 2024/05/02 - 13:31:28 | 500 | 5.3040015s | 127.0.0.1 | POST "/api/chat"

@jiangweiatgithub
Author

jiangweiatgithub commented May 2, 2024

Now I guess I know how to restart the server. Here is what I got from the server console after I set the first environment variable, restarted the server, and re-ran the model:
(base) PS C:\Users\polyt> $env:OLLAMA_DEBUG="1"
(base) PS C:\Users\polyt> ollama serve
time=2024-05-02T14:02:17.241+08:00 level=INFO source=images.go:817 msg="total blobs: 81"
time=2024-05-02T14:02:17.246+08:00 level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-05-02T14:02:17.253+08:00 level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-05-02T14:02:17.256+08:00 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu file=build/windows/amd64/cpu/bin/llama.dll.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu file=build/windows/amd64/cpu/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx file=build/windows/amd64/cpu_avx/bin/llama.dll.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx file=build/windows/amd64/cpu_avx/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx2 file=build/windows/amd64/cpu_avx2/bin/llama.dll.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx2 file=build/windows/amd64/cpu_avx2/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cuda_v11.3 file=build/windows/amd64/cuda_v11.3/bin/llama.dll.gz
time=2024-05-02T14:02:17.257+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cuda_v11.3 file=build/windows/amd64/cuda_v11.3/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:02:17.258+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=rocm_v5.7 file=build/windows/amd64/rocm_v5.7/bin/llama.dll.gz
time=2024-05-02T14:02:17.258+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=rocm_v5.7 file=build/windows/amd64/rocm_v5.7/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx2
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cuda_v11.3
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\rocm_v5.7
time=2024-05-02T14:02:17.563+08:00 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7]"
time=2024-05-02T14:02:17.563+08:00 level=DEBUG source=payload.go:42 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
[GIN] 2024/05/02 - 14:02:27 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/05/02 - 14:02:27 | 200 | 576µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/05/02 - 14:02:27 | 200 | 584.2µs | 127.0.0.1 | POST "/api/show"
time=2024-05-02T14:02:27.341+08:00 level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(llm.containerGGUF)(0xc000698380), kv:llm.KV{}, tensors:[]llm.Tensor(nil), parameters:0x0}"
time=2024-05-02T14:02:30.051+08:00 level=DEBUG source=gguf.go:193 msg="general.architecture = llama"
time=2024-05-02T14:02:30.058+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T14:02:30.059+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-02T14:02:30.059+08:00 level=DEBUG source=gpu.go:286 msg="gpu management search paths: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_
.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v*\bin\cudart64_.dll D:\ProgramData\Anaconda3\cudart64_.dll* D:\ProgramData\Anaconda3\Library\mingw-w64\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\usr\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll D:\ProgramData\Anaconda3\Scripts\cudart64_.dll D:\ProgramData\Anaconda3\bin\cudart64_.dll D:\ProgramData\Anaconda3\condabin\cudart64_.dll C:\Program Files\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\Microsoft MPI\Bin\cudart64_.dll C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp\cudart64_.dll C:\Program Files\Python38\Scripts\cudart64_.dll C:\Program Files\Python38\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll 
C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll D:\Program Files (x86)\SDL International\SDLX\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\Program Files\PuTTY\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll C:\Program Files (x86)\NetSarang\Xshell 7\cudart64_.dll C:\Program Files (x86)\NetSarang\Xftp 7\cudart64_.dll C:\Program Files (x86)\PDFtk\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\lib\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\Program Files\Calibre2\cudart64_.dll C:\Program Files\Docker\Docker\resources\bin\cudart64_.dll C:\Program Files\Pandoc\cudart64_.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1\cudart64_.dll C:\Users\polyt\.cargo\bin\cudart64_.dll C:\Program Files (x86)\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program 
Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll X:\Program Files (x86)\BaseX\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System3\cudart64_.dll C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_.dll C:\Users\polyt\AppData\Local\GitHubDesktop\bin\cudart64_.dll]"
time=2024-05-02T14:02:30.089+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
wiring cudart library functions in C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll
dlsym: cudaSetDevice
dlsym: cudaDeviceSynchronize
dlsym: cudaDeviceReset
dlsym: cudaMemGetInfo
dlsym: cudaGetDeviceCount
dlsym: cudaDeviceGetAttribute
dlsym: cudaDriverGetVersion
CUDA driver version: 12-2
time=2024-05-02T14:02:30.126+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-02T14:02:30.126+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[0] CUDA totalMem 4294639616
[0] CUDA freeMem 3136290816
time=2024-05-02T14:02:30.242+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
releasing cudart library
time=2024-05-02T14:02:30.272+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T14:02:30.272+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-05-02T14:02:30.272+08:00 level=DEBUG source=gpu.go:286 msg="gpu management search paths: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_
.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v*\bin\cudart64_.dll D:\ProgramData\Anaconda3\cudart64_.dll* D:\ProgramData\Anaconda3\Library\mingw-w64\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\usr\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll D:\ProgramData\Anaconda3\Scripts\cudart64_.dll D:\ProgramData\Anaconda3\bin\cudart64_.dll D:\ProgramData\Anaconda3\condabin\cudart64_.dll C:\Program Files\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\Microsoft MPI\Bin\cudart64_.dll C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp\cudart64_.dll C:\Program Files\Python38\Scripts\cudart64_.dll C:\Program Files\Python38\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll 
C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll D:\Program Files (x86)\SDL International\SDLX\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\Program Files\PuTTY\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll C:\Program Files (x86)\NetSarang\Xshell 7\cudart64_.dll C:\Program Files (x86)\NetSarang\Xftp 7\cudart64_.dll C:\Program Files (x86)\PDFtk\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\lib\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\Program Files\Calibre2\cudart64_.dll C:\Program Files\Docker\Docker\resources\bin\cudart64_.dll C:\Program Files\Pandoc\cudart64_.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1\cudart64_.dll C:\Users\polyt\.cargo\bin\cudart64_.dll C:\Program Files (x86)\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program 
Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll X:\Program Files (x86)\BaseX\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System3\cudart64_.dll C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_.dll C:\Users\polyt\AppData\Local\GitHubDesktop\bin\cudart64_.dll]"
time=2024-05-02T14:02:30.306+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
wiring cudart library functions in C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll
dlsym: cudaSetDevice
dlsym: cudaDeviceSynchronize
dlsym: cudaDeviceReset
dlsym: cudaMemGetInfo
dlsym: cudaGetDeviceCount
dlsym: cudaDeviceGetAttribute
dlsym: cudaDriverGetVersion
CUDA driver version: 12-2
time=2024-05-02T14:02:30.308+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-02T14:02:30.308+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[0] CUDA totalMem 4294639616
[0] CUDA freeMem 3136290816
time=2024-05-02T14:02:30.392+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
releasing cudart library
time=2024-05-02T14:02:30.420+08:00 level=INFO source=server.go:127 msg="offload to gpu" reallayers=33 layers=33 required="8482.3 MiB" used="8482.3 MiB" available="15279.0 MiB" kv="256.0 MiB" fulloffload="164.0 MiB" partialoffload="677.5 MiB"
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx2
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cuda_v11.3
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\rocm_v5.7
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu
time=2024-05-02T14:02:30.421+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx
time=2024-05-02T14:02:30.422+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cpu_avx2
time=2024-05-02T14:02:30.422+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cuda_v11.3
time=2024-05-02T14:02:30.422+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\rocm_v5.7
time=2024-05-02T14:02:30.422+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-02T14:02:30.432+08:00 level=DEBUG source=server.go:259 msg="PATH=D:\ProgramData\Anaconda3;D:\ProgramData\Anaconda3\Library\mingw-w64\bin;D:\ProgramData\Anaconda3\Library\usr\bin;D:\ProgramData\Anaconda3\Library\bin;D:\ProgramData\Anaconda3\Scripts;D:\ProgramData\Anaconda3\bin;D:\ProgramData\Anaconda3\condabin;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\Microsoft MPI\Bin;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp;C:\Program Files\Python38\Scripts;C:\Program Files\Python38;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin;C:\Program Files\Git\cmd;C:\ProgramData\chocolatey\bin;C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin;C:\Program Files\Git\usr\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;D:\Program Files (x86)\SDL International\SDLX;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files\dotnet;C:\Program Files\PuTTY;C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn;C:\Program Files\Microsoft SQL Server\150\Tools\Binn;C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn;C:\Program 
Files\Microsoft SQL Server\150\DTS\Binn;D:\ProgramData\Anaconda3\Library\bin;C:\Program Files (x86)\NetSarang\Xshell 7;C:\Program Files (x86)\NetSarang\Xftp 7;C:\Program Files (x86)\PDFtk\bin;D:\mindopt\0.15.1\win64-x86\bin;D:\mindopt\0.15.1\win64-x86\lib;C:\Program Files\nodejs;C:\Program Files\Calibre2;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\Pandoc;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1;C:\Users\polyt\.cargo\bin;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin;C:\Program Files\Git\cmd;C:\Program Files\nodejs;C:\ProgramData\chocolatey\bin;X:\Program Files (x86)\BaseX\bin;C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin;C:\Program Files\Git\usr\bin;C:\Program Files\dotnet;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System3;C:\Users\polyt\AppData\Local\Programs\Ollama;C:\Users\polyt\AppData\Local\GitHubDesktop\bin;C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cuda_v11.3"
time=2024-05-02T14:02:30.432+08:00 level=INFO source=server.go:264 msg="starting llama server" cmd="C:\Users\polyt\AppData\Local\Temp\ollama2406829336\runners\cuda_v11.3\ollama_llama_server.exe --model C:\Users\polyt\.ollama\models\blobs\sha256-09a73f0b7eb0ea0cc700f6e0eb9e06f23d25600232f09c13d77e402cc8f9667e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --port 11348"
time=2024-05-02T14:02:30.583+08:00 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
time=2024-05-02T14:02:30.636+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get "http://127.0.0.1:11348/health\": dial tcp 127.0.0.1:11348: connectex: No connection could be made because the target machine actively refused it."
{"function":"server_params_parse","level":"WARN","line":2494,"msg":"server.cpp is not built with verbose logging.","tid":"22908","timestamp":1714629751}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"22908","timestamp":1714629751}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":8,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"22908","timestamp":1714629751,"total_threads":16}
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from C:\Users\polyt\.ollama\models\blobs\sha256-09a73f0b7eb0ea0cc700f6e0eb9e06f23d25600232f09c13d77e402cc8f9667e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = llama-3-chinese-instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 7
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 128001
llama_model_loader: - kv 21: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
time=2024-05-02T14:02:31.386+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-05-02T14:02:32.198+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get "http://127.0.0.1:11348/health": dial tcp 127.0.0.1:11348: connectex: No connection could be made because the target machine actively refused it."
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 7.95 GiB (8.50 BPW)
llm_load_print_meta: general.name = llama-3-chinese-instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: PAD token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: Quadro RTX 5000, compute capability 7.5, VMM: yes
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 532.31 MiB
llm_load_tensors: CUDA0 buffer size = 7605.33 MiB
.............................................
time=2024-05-02T14:02:41.896+08:00 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
time=2024-05-02T14:02:41.896+08:00 level=DEBUG source=server.go:832 msg="stopping llama server"
[GIN] 2024/05/02 - 14:02:41 | 500 | 14.5559664s | 127.0.0.1 | POST "/api/chat"
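For reference, the exit status in the "llama runner process no longer running: 3221225477" line is a Windows NTSTATUS value; in hex it is 0xC0000005, i.e. STATUS_ACCESS_VIOLATION, so the runner process died in a native crash rather than failing an Ollama-level check. A minimal sketch of the conversion:

```python
# Decode the exit status Ollama reports for the crashed runner on Windows.
# 3221225477 decimal == 0xC0000005, the NTSTATUS code STATUS_ACCESS_VIOLATION.
exit_status = 3221225477
print(hex(exit_status))  # 0xc0000005
```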

@jiangweiatgithub
Author

And here is the server console output after I set the second environment variable, the one that forces the CPU runner:
(base) PS C:\Users\polyt> $env:OLLAMA_LLM_LIBRARY="cpu_avx2"
(base) PS C:\Users\polyt> ollama serve
time=2024-05-02T14:05:04.289+08:00 level=INFO source=images.go:817 msg="total blobs: 81"
time=2024-05-02T14:05:04.298+08:00 level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-05-02T14:05:04.302+08:00 level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-05-02T14:05:04.304+08:00 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners
time=2024-05-02T14:05:04.304+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu file=build/windows/amd64/cpu/bin/llama.dll.gz
time=2024-05-02T14:05:04.304+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu file=build/windows/amd64/cpu/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:05:04.304+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx file=build/windows/amd64/cpu_avx/bin/llama.dll.gz
time=2024-05-02T14:05:04.305+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx file=build/windows/amd64/cpu_avx/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:05:04.305+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx2 file=build/windows/amd64/cpu_avx2/bin/llama.dll.gz
time=2024-05-02T14:05:04.305+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cpu_avx2 file=build/windows/amd64/cpu_avx2/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:05:04.305+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cuda_v11.3 file=build/windows/amd64/cuda_v11.3/bin/llama.dll.gz
time=2024-05-02T14:05:04.305+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=cuda_v11.3 file=build/windows/amd64/cuda_v11.3/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:05:04.306+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=rocm_v5.7 file=build/windows/amd64/rocm_v5.7/bin/llama.dll.gz
time=2024-05-02T14:05:04.307+08:00 level=DEBUG source=payload.go:160 msg=extracting variant=rocm_v5.7 file=build/windows/amd64/rocm_v5.7/bin/ollama_llama_server.exe.gz
time=2024-05-02T14:05:04.615+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu
time=2024-05-02T14:05:04.615+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx
time=2024-05-02T14:05:04.616+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2
time=2024-05-02T14:05:04.616+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cuda_v11.3
time=2024-05-02T14:05:04.616+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\rocm_v5.7
time=2024-05-02T14:05:04.616+08:00 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7]"
time=2024-05-02T14:05:04.616+08:00 level=DEBUG source=payload.go:42 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
[GIN] 2024/05/02 - 14:05:13 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/05/02 - 14:05:13 | 200 | 603.5µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/05/02 - 14:05:13 | 200 | 1.7258ms | 127.0.0.1 | POST "/api/show"
time=2024-05-02T14:05:13.583+08:00 level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(llm.containerGGUF)(0xc00013e440), kv:llm.KV{}, tensors:[]llm.Tensor(nil), parameters:0x0}"
time=2024-05-02T14:05:16.344+08:00 level=DEBUG source=gguf.go:193 msg="general.architecture = llama"
time=2024-05-02T14:05:16.349+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T14:05:16.349+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_*.dll"
time=2024-05-02T14:05:16.349+08:00 level=DEBUG source=gpu.go:286 msg="gpu management search paths: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_
.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v*\bin\cudart64_.dll D:\ProgramData\Anaconda3\cudart64_.dll* D:\ProgramData\Anaconda3\Library\mingw-w64\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\usr\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll D:\ProgramData\Anaconda3\Scripts\cudart64_.dll D:\ProgramData\Anaconda3\bin\cudart64_.dll D:\ProgramData\Anaconda3\condabin\cudart64_.dll C:\Program Files\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\Microsoft MPI\Bin\cudart64_.dll C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp\cudart64_.dll C:\Program Files\Python38\Scripts\cudart64_.dll C:\Program Files\Python38\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll 
C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll D:\Program Files (x86)\SDL International\SDLX\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\Program Files\PuTTY\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll C:\Program Files (x86)\NetSarang\Xshell 7\cudart64_.dll C:\Program Files (x86)\NetSarang\Xftp 7\cudart64_.dll C:\Program Files (x86)\PDFtk\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\lib\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\Program Files\Calibre2\cudart64_.dll C:\Program Files\Docker\Docker\resources\bin\cudart64_.dll C:\Program Files\Pandoc\cudart64_.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1\cudart64_.dll C:\Users\polyt\.cargo\bin\cudart64_.dll C:\Program Files (x86)\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program 
Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll X:\Program Files (x86)\BaseX\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System3\cudart64_.dll C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_.dll C:\Users\polyt\AppData\Local\GitHubDesktop\bin\cudart64_.dll]"
time=2024-05-02T14:05:16.384+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
wiring cudart library functions in C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll
dlsym: cudaSetDevice
dlsym: cudaDeviceSynchronize
dlsym: cudaDeviceReset
dlsym: cudaMemGetInfo
dlsym: cudaGetDeviceCount
dlsym: cudaDeviceGetAttribute
dlsym: cudaDriverGetVersion
CUDA driver version: 12-2
time=2024-05-02T14:05:16.426+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-02T14:05:16.426+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[0] CUDA totalMem 4294639616
[0] CUDA freeMem 3136290816
time=2024-05-02T14:05:16.539+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
releasing cudart library
time=2024-05-02T14:05:16.569+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-05-02T14:05:16.570+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_*.dll"
time=2024-05-02T14:05:16.570+08:00 level=DEBUG source=gpu.go:286 msg="gpu management search paths: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_
.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v*\bin\cudart64_.dll D:\ProgramData\Anaconda3\cudart64_.dll* D:\ProgramData\Anaconda3\Library\mingw-w64\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\usr\bin\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll D:\ProgramData\Anaconda3\Scripts\cudart64_.dll D:\ProgramData\Anaconda3\bin\cudart64_.dll D:\ProgramData\Anaconda3\condabin\cudart64_.dll C:\Program Files\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\Microsoft MPI\Bin\cudart64_.dll C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp\cudart64_.dll C:\Program Files\Python38\Scripts\cudart64_.dll C:\Program Files\Python38\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll 
C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll D:\Program Files (x86)\SDL International\SDLX\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\WINDOWS\System32\OpenSSH\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\Program Files\PuTTY\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\Tools\Binn\cudart64_.dll C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\150\DTS\Binn\cudart64_.dll D:\ProgramData\Anaconda3\Library\bin\cudart64_.dll C:\Program Files (x86)\NetSarang\Xshell 7\cudart64_.dll C:\Program Files (x86)\NetSarang\Xftp 7\cudart64_.dll C:\Program Files (x86)\PDFtk\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\bin\cudart64_.dll D:\mindopt\0.15.1\win64-x86\lib\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\Program Files\Calibre2\cudart64_.dll C:\Program Files\Docker\Docker\resources\bin\cudart64_.dll C:\Program Files\Pandoc\cudart64_.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1\cudart64_.dll C:\Users\polyt\.cargo\bin\cudart64_.dll C:\Program Files (x86)\Common Files\Oracle\Java\javapath\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp\cudart64_.dll C:\Windows\system32\cudart64_.dll C:\Windows\cudart64_.dll C:\Windows\System32\Wbem\cudart64_.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_.dll C:\Windows\System32\OpenSSH\cudart64_.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_.dll C:\Program 
Files\Microsoft SQL Server\130\Tools\Binn\cudart64_.dll C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\cudart64_.dll C:\Strawberry\c\bin\cudart64_.dll C:\Strawberry\perl\site\bin\cudart64_.dll C:\Strawberry\perl\bin\cudart64_.dll C:\Program Files\Git\cmd\cudart64_.dll C:\Program Files\nodejs\cudart64_.dll C:\ProgramData\chocolatey\bin\cudart64_.dll X:\Program Files (x86)\BaseX\bin\cudart64_.dll C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin\cudart64_.dll C:\Program Files\Git\usr\bin\cudart64_.dll C:\Program Files\dotnet\cudart64_.dll C:\WINDOWS\system32\cudart64_.dll C:\WINDOWS\cudart64_.dll C:\WINDOWS\System32\Wbem\cudart64_.dll C:\WINDOWS\System3\cudart64_.dll C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_.dll C:\Users\polyt\AppData\Local\GitHubDesktop\bin\cudart64_.dll]"
time=2024-05-02T14:05:16.599+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudart64_110.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\cudart64_12.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\cudart64_110.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart64_101.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll]"
wiring cudart library functions in C:\Users\polyt\AppData\Local\Programs\Ollama\cudart64_110.dll
dlsym: cudaSetDevice
dlsym: cudaDeviceSynchronize
dlsym: cudaDeviceReset
dlsym: cudaMemGetInfo
dlsym: cudaGetDeviceCount
dlsym: cudaDeviceGetAttribute
dlsym: cudaDriverGetVersion
CUDA driver version: 12-2
time=2024-05-02T14:05:16.605+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-05-02T14:05:16.605+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[0] CUDA totalMem 4294639616
[0] CUDA freeMem 3136290816
time=2024-05-02T14:05:16.689+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 7.5"
releasing cudart library
time=2024-05-02T14:05:16.724+08:00 level=INFO source=server.go:127 msg="offload to gpu" reallayers=33 layers=33 required="8482.3 MiB" used="8482.3 MiB" available="15279.0 MiB" kv="256.0 MiB" fulloffload="164.0 MiB" partialoffload="677.5 MiB"
time=2024-05-02T14:05:16.724+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu
time=2024-05-02T14:05:16.724+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cuda_v11.3
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\rocm_v5.7
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx
time=2024-05-02T14:05:16.725+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2
time=2024-05-02T14:05:16.726+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cuda_v11.3
time=2024-05-02T14:05:16.726+08:00 level=DEBUG source=payload.go:68 msg="availableServers : found" file=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\rocm_v5.7
time=2024-05-02T14:05:16.726+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-02T14:05:16.726+08:00 level=INFO source=server.go:152 msg="user override" OLLAMA_LLM_LIBRARY=cpu_avx2 path=C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2
time=2024-05-02T14:05:16.736+08:00 level=DEBUG source=server.go:259 msg="PATH=D:\ProgramData\Anaconda3;D:\ProgramData\Anaconda3\Library\mingw-w64\bin;D:\ProgramData\Anaconda3\Library\usr\bin;D:\ProgramData\Anaconda3\Library\bin;D:\ProgramData\Anaconda3\Scripts;D:\ProgramData\Anaconda3\bin;D:\ProgramData\Anaconda3\condabin;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\Microsoft MPI\Bin;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp;C:\Program Files\Python38\Scripts;C:\Program Files\Python38;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin;C:\Program Files\Git\cmd;C:\ProgramData\chocolatey\bin;C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin;C:\Program Files\Git\usr\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;D:\Program Files (x86)\SDL International\SDLX;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files\dotnet;C:\Program Files\PuTTY;C:\Program Files (x86)\Microsoft SQL Server\150\Tools\Binn;C:\Program Files\Microsoft SQL Server\150\Tools\Binn;C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn;C:\Program 
Files\Microsoft SQL Server\150\DTS\Binn;D:\ProgramData\Anaconda3\Library\bin;C:\Program Files (x86)\NetSarang\Xshell 7;C:\Program Files (x86)\NetSarang\Xftp 7;C:\Program Files (x86)\PDFtk\bin;D:\mindopt\0.15.1\win64-x86\bin;D:\mindopt\0.15.1\win64-x86\lib;C:\Program Files\nodejs;C:\Program Files\Calibre2;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\Pandoc;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.3.1;C:\Users\polyt\.cargo\bin;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin;C:\Program Files\Git\cmd;C:\Program Files\nodejs;C:\ProgramData\chocolatey\bin;X:\Program Files (x86)\BaseX\bin;C:\Users\polyt\Downloads\apache-ant-1.10.8-bin\apache-ant-1.10.8\bin;C:\Program Files\Git\usr\bin;C:\Program Files\dotnet;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System3;C:\Users\polyt\AppData\Local\Programs\Ollama;C:\Users\polyt\AppData\Local\GitHubDesktop\bin;C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2"
time=2024-05-02T14:05:16.736+08:00 level=INFO source=server.go:264 msg="starting llama server" cmd="C:\Users\polyt\AppData\Local\Temp\ollama1171806004\runners\cpu_avx2\ollama_llama_server.exe --model C:\Users\polyt\.ollama\models\blobs\sha256-09a73f0b7eb0ea0cc700f6e0eb9e06f23d25600232f09c13d77e402cc8f9667e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --port 11869"
time=2024-05-02T14:05:20.171+08:00 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
time=2024-05-02T14:05:20.233+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get "http://127.0.0.1:11869/health\": dial tcp 127.0.0.1:11869: connectex: No connection could be made because the target machine actively refused it."
{"function":"server_params_parse","level":"WARN","line":2382,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"23116","timestamp":1714629920}
{"function":"server_params_parse","level":"WARN","line":2494,"msg":"server.cpp is not built with verbose logging.","tid":"23116","timestamp":1714629920}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"23116","timestamp":1714629920}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":8,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"23116","timestamp":1714629920,"total_threads":16}
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from C:\Users\polyt\.ollama\models\blobs\sha256-09a73f0b7eb0ea0cc700f6e0eb9e06f23d25600232f09c13d77e402cc8f9667e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = llama-3-chinese-instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 7
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 128001
llama_model_loader: - kv 21: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
time=2024-05-02T14:05:20.480+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-05-02T14:05:21.290+08:00 level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get "http://127.0.0.1:11869/health": dial tcp 127.0.0.1:11869: connectex: No connection could be made because the target machine actively refused it."
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 7.95 GiB (8.50 BPW)
llm_load_print_meta: general.name = llama-3-chinese-instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: PAD token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 8137.64 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
time=2024-05-02T14:05:22.730+08:00 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
time=2024-05-02T14:05:22.730+08:00 level=DEBUG source=server.go:832 msg="stopping llama server"
[GIN] 2024/05/02 - 14:05:22 | 500 | 9.1479345s | 127.0.0.1 | POST "/api/chat"

@dhiltgen
Collaborator

dhiltgen commented May 2, 2024

Thanks for the logs. It looks like the model is triggering a crash inside llama.cpp during load, regardless of GPU type.

@dhiltgen dhiltgen removed windows nvidia Issues relating to Nvidia GPUs and CUDA labels May 2, 2024
@dhiltgen dhiltgen changed the title llama runner process no longer running crash loading llama-3-chinese-8b-instruct model May 2, 2024
@dhiltgen dhiltgen removed their assignment May 2, 2024
@jiangweiatgithub
Author

jiangweiatgithub commented May 3, 2024

Just FYI: after updating to 0.1.33, the message is now:
Error: timed out waiting for llama runner to start:

And on 0.1.35, the message is:
Error: llama runner process has terminated: exit status 0xc0000409

@jiangweiatgithub
Author

Never mind. It turns out to be incomplete GGUF files downloaded by the vanilla git command line. Sorry for the false alarm!
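For anyone hitting the same symptom: plain `git clone` without Git LFS stores a small text pointer file in place of the actual weights, and loading that (or a truncated download) can crash the runner. A quick local sanity check is to verify the GGUF magic bytes and a plausible file size before pointing Ollama at the blob. This is a minimal sketch; the helper name and the `min_bytes` threshold are illustrative assumptions, not part of Ollama or the GGUF spec:

```python
import os

def looks_like_gguf(path: str, min_bytes: int = 1024) -> bool:
    """Heuristic check that `path` is a real GGUF file rather than a
    git-lfs pointer or a truncated download.

    - Every GGUF file starts with the 4-byte magic b"GGUF".
    - A git-lfs pointer file is plain text and only ~130 bytes long.
    The min_bytes threshold is an illustrative assumption.
    """
    if os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

For this model the log reports a size of 7.95 GiB, so a blob of only a few hundred bytes is an immediate giveaway that LFS objects were never fetched; running `git lfs pull` in the clone retrieves the real files.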
