2023-12-21T08:47:57.269073390Z using_decoupled True
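The `using_decoupled True` line indicates the Triton Python backend model is running in decoupled mode, which is enabled in the model's `config.pbtxt`. A minimal sketch of that fragment (the `model_transaction_policy` field is from Triton's model configuration schema; the rest of the config is omitted here):

```
# config.pbtxt (fragment) -- decoupled mode lets the backend send
# zero, one, or many responses per request (e.g. token streaming)
model_transaction_policy {
  decoupled: true
}
```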
2023-12-21T08:48:00.018871598Z INFO 12-21 16:48:00 llm_engine.py:72] Initializing an LLM engine with config: model='./models/model_repository/Qwen-7B-Chat', tokenizer='./models/model_repository/Qwen-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
2023-12-21T08:48:00.558397300Z WARNING 12-21 16:48:00 tokenizer.py:66] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
2023-12-21T08:48:16.925095174Z INFO 12-21 16:48:16 llm_engine.py:207] # GPU blocks: 394, # CPU blocks: 512
2023-12-21T08:48:42.067733569Z INFO 12-21 16:48:42 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%
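The `# GPU blocks: 394` line bounds how many tokens the vLLM KV cache can hold on the GPU. A rough back-of-the-envelope check, assuming vLLM's default block size of 16 tokens per block (an assumption; the actual value depends on the engine configuration):

```python
# KV cache capacity implied by the log line "# GPU blocks: 394"
gpu_blocks = 394          # from the vLLM log above
block_size = 16           # vLLM's default tokens-per-block (assumed here)

max_cached_tokens = gpu_blocks * block_size
print(max_cached_tokens)  # 6304
```

Note that under this assumption the cache holds about 6304 tokens, which is less than `max_seq_len=8192` from the engine config, so a single maximum-length sequence would not fit in the GPU KV cache without swapping or preemption.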
2023-12-21T03:56:58Z I 1 metrics.cc:870] Collecting metrics for GPU 0: NVIDIA GeForce RTX 4090
2023-12-21T03:56:58Z I 1 metrics.cc:761] Collecting CPU metrics
2023-12-21T03:56:58Z I 1 grpc_server.cc:4822] Started GRPCInferenceService at 0.0.0.0:9000
2023-12-21T03:56:58Z I 1 http_server.cc:4446] Started HTTPService at 0.0.0.0:9001
2023-12-21T03:56:58Z I 1 http_server.cc:190] Started Metrics Service at 0.0.0.0:9002
2023-12-21T04:00:18Z I 1 model_lifecycle.cc:459] loading: Qwen-7B-Chat:1
2023-12-21T04:00:21Z I 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: Qwen-7B-Chat_0 (GPU device 0)
2023-12-21T04:00:52Z I 1 model_lifecycle.cc:693] successfully loaded 'Qwen-7B-Chat' version 1
2023-12-21T04:15:15Z E 1 http_server.cc:3554] [INTERNAL] received a response without FINAL flag
2023-12-21T04:15:29Z E 1 http_server.cc:3554] [INTERNAL] received a response without FINAL flag
RT: 0.0.5
model config
Error response returned to the client:
b'{"error":"expected a single response, got 2"}'
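The two errors fit together: in decoupled mode the backend streams multiple responses per request, but Triton's HTTP/REST frontend only supports the one-request/one-response pattern, hence `"expected a single response, got 2"` and `[INTERNAL] received a response without FINAL flag`. Decoupled models are documented as requiring the gRPC streaming API on the client side, and the backend must mark the end of each response stream with the FINAL flag. A minimal sketch of that send-then-close pattern, using an illustrative stand-in for Triton's response sender (the real one comes from the Python backend's `pb_utils`; the class and function names here are hypothetical):

```python
# Stand-in for Triton's decoupled response protocol: each request gets a
# response sender; the backend may call send() any number of times, but
# the last call must carry the FINAL flag, otherwise the frontend logs
# "received a response without FINAL flag".

RESPONSE_COMPLETE_FINAL = 1  # stand-in for the real flag constant


class FakeResponseSender:
    """Records what a decoupled backend sends (illustrative only)."""

    def __init__(self):
        self.sent = []       # (payload, flags) tuples, in send order
        self.closed = False  # becomes True once FINAL is sent

    def send(self, response=None, flags=0):
        self.sent.append((response, flags))
        if flags & RESPONSE_COMPLETE_FINAL:
            self.closed = True


def stream_tokens(sender, tokens):
    """Send each generated chunk, then close the stream with FINAL."""
    for tok in tokens:
        sender.send(response=tok)  # intermediate responses, no flag
    # An empty final message closes the stream; alternatively the last
    # real response could carry the FINAL flag itself.
    sender.send(flags=RESPONSE_COMPLETE_FINAL)


sender = FakeResponseSender()
stream_tokens(sender, ["Hello", ", ", "world"])
print(len(sender.sent), sender.closed)  # 4 True
```

If the backend returns without ever sending the FINAL flag, or the client calls the plain HTTP infer endpoint instead of a gRPC stream, errors like the ones in this log are the expected symptom.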
RT log: identical to the Triton log above.