Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW #144

Open · guidoveritone opened this issue on Apr 15, 2024 · 1 comment
Hey guys, I am trying to run the Mistral 7B model using the guide on the page.
I am running:
```
docker run --gpus all \
  -e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
  ghcr.io/mistralai/mistral-src/vllm:latest \
  --host 0.0.0.0 \
  --model mistralai/Mistral-7B-Instruct-v0.2
```
and I am getting the following error:
```
└─$ docker run --gpus '"device=0"' -e HF_TOKEN=$HF_TOKEN -p 8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2
The HF_TOKEN environment variable set, logging to Hugging Face.
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
INFO 04-15 15:25:32 api_server.py:719] args: Namespace(host='0.0.0.0', port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name=None, chat_template=None, response_role='assistant', model='mistralai/Mistral-7B-Instruct-v0.2', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
config.json: 100%|██████████| 596/596 [00:00<00:00, 6.74MB/s]
INFO 04-15 15:25:33 llm_engine.py:73] Initializing an LLM engine with config: model='mistralai/Mistral-7B-Instruct-v0.2', tokenizer='mistralai/Mistral-7B-Instruct-v0.2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
tokenizer_config.json: 100%|██████████| 1.46k/1.46k [00:00<00:00, 19.4MB/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 9.14MB/s]
tokenizer.json: 100%|██████████| 1.80M/1.80M [00:00<00:00, 3.16MB/s]
special_tokens_map.json: 100%|██████████| 72.0/72.0 [00:00<00:00, 953kB/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 729, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 495, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 314, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 109, in __init__
    self._init_workers(distributed_init_method)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 141, in _init_workers
    self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 724, in _run_workers_in_batch
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 59, in init_model
    torch.cuda.set_device(self.device)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
```
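For context: CUDA error 804 usually means the host NVIDIA driver is older than the CUDA runtime shipped inside the container, and the driver's forward-compatibility path is only supported on data-center GPUs, so on other hardware CUDA initialization fails exactly like this. A minimal way to compare the two versions (assuming the image's default entrypoint is the vLLM API server and therefore has to be overridden, and that PyTorch is on the image's default Python path):

```bash
# Host side: the top of nvidia-smi shows the driver version and the highest
# CUDA version that driver supports natively.
nvidia-smi | head -n 4

# Container side: print the CUDA runtime version the image's PyTorch was
# built against. torch.version.cuda is just a string, so this does not
# trigger CUDA initialization and should work even on a mismatched host.
docker run --rm --entrypoint python3 \
  ghcr.io/mistralai/mistral-src/vllm:latest \
  -c "import torch; print(torch.version.cuda)"
```

If the container's CUDA version is higher than what the host driver supports, upgrading the host driver (or using an image built against an older CUDA) is the usual fix.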
I tried several things to fix this, following suggestions I found online,
and nothing worked! I also tried some NVIDIA default containers to check whether CUDA is working, and everything seems to work.
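A typical version of that sanity check (the image tag below is illustrative, not necessarily the one I used) looks like:

```bash
# End-to-end check that the NVIDIA container runtime can expose the GPU:
# a CUDA base image whose runtime is at or below the driver's supported
# version should print the normal nvidia-smi table.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```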
My `nvidia-smi` output: [omitted]

My `/etc/nvidia-container-runtime/config.toml` file: [omitted]

Note: if I change the `no-cgroups` flag to `true`, I get a "No CUDA GPUs available" error.

OS: [omitted]
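For reference, `no-cgroups` lives under the `[nvidia-container-cli]` section of that file and defaults to `false`; a quick way to inspect it:

```bash
# Locate the no-cgroups setting. With no-cgroups = true the runtime skips
# cgroup-based device setup, so the /dev/nvidia* devices have to be passed
# to the container explicitly, which is why flipping it on tends to surface
# as "No CUDA GPUs available".
grep -n "no-cgroups" /etc/nvidia-container-runtime/config.toml
```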