
Many problems encountered while reproducing the source code #187

Open
lxnlxnlxnlxnlxn opened this issue Nov 1, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@lxnlxnlxnlxnlxn

lxnlxnlxnlxnlxn commented Nov 1, 2023

Running LightLLM

Reproducing the kvoff branch

Step 1: create the Docker container

Pull the image: docker pull ghcr.io/modeltc/lightllm:main

The llama-7b model is too large, and cloning it directly inside the server's Docker container kept failing because of network interruptions. I therefore downloaded the model locally, transferred it to the server via Xftp, and mapped the model folder into the models folder of the LightLLM source tree when creating the container.

Model repository: [huggyllama/llama-7b · Hugging Face](https://huggingface.co/huggyllama/llama-7b)

docker run -itd --ipc=host --net=host  --name lxn_lightllm --gpus all -p 8080:8080 -v /hdd/lxn/llama-7b:/lightllm/lightllm/models/llama-7b ghcr.io/modeltc/lightllm:main /bin/bash
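
Before installing anything, it is worth confirming that the -v mapping above actually exposed the checkpoint inside the container. The small check below is only a sketch; it assumes the file names that a huggyllama/llama-7b download normally contains.

# Run inside the container: verify that the mounted model directory is visible
# and contains the usual Hugging Face llama-7b checkpoint files.
import os

model_dir = "/lightllm/lightllm/models/llama-7b"
for name in ("config.json", "tokenizer.model"):
    path = os.path.join(model_dir, name)
    print(path, "exists" if os.path.exists(path) else "MISSING")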
Step 2: run

Install from source:

python setup.py install

Start the model server:

python -m lightllm.server.api_server --model_dir models/llama-7b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 120000
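
For context, once the server does come up, the standard LightLLM HTTP interface can be exercised with a minimal request like the sketch below (the /generate endpoint and JSON fields follow the project README; the prompt and token count are just examples). In this run, however, the launch never got that far and failed with the OOM below.

# Minimal client-side sanity check against the api_server started above.
import json
import urllib.request

payload = {"inputs": "What is AI?", "parameters": {"max_new_tokens": 17}}
req = urllib.request.Request(
    "http://127.0.0.1:8080/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))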
Error message: OOM
load model error: CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF <class 'torch.cuda.OutOfMemoryError'>
Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 257, in start_router_process
    asyncio.run(router.wait_to_model_ready())
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 62, in wait_to_model_ready
    await asyncio.gather(*init_model_ret)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 229, in init_model
    ans : rpyc.AsyncResult = self._init_model(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 97, in exposed_init_model
    raise e
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 68, in exposed_init_model
    self.model = LlamaTpPartModel(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/models/llama/model.py", line 35, in __init__
    super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode, weight_dict, finetune_config)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/basemodel/basemodel.py", line 40, in __init__
    self._init_mem_manager()
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/models/llama/model.py", line 56, in _init_mem_manager
    self.mem_manager = self.memory_manager_class(self.max_total_token_num,
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 10, in __init__
    self._init_buffers(size, dtype, head_num, head_dim, layer_num)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 14, in _init_buffers
    self.key_buffer = [torch.empty((size, head_num, head_dim), dtype=dtype, device="cuda") for _ in range(layer_num)]
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 14, in <listcomp>
    self.key_buffer = [torch.empty((size, head_num, head_dim), dtype=dtype, device="cuda") for _ in range(layer_num)]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 260, in start_router_process
    err_str = '\n'.join(traceback.format_exception(e))
TypeError: format_exception() missing 2 required positional arguments: 'value' and 'tb'
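
The OOM itself is easy to account for: the traceback shows the memory manager pre-allocating per-layer key and value buffers of shape (max_total_token_num, head_num, head_dim). A rough estimate, assuming llama-7b's usual configuration (32 layers, 32 heads, head_dim 128, fp16 weights and cache), shows that 120000 tokens cannot fit on a 32 GiB card:

# Back-of-the-envelope KV-cache size for the buffers allocated in mem_manager.py.
# The llama-7b configuration values are assumptions: 32 layers, 32 heads, head_dim 128, fp16.
layer_num, head_num, head_dim, dtype_bytes = 32, 32, 128, 2

def kv_cache_gib(max_total_token_num):
    # key_buffer + value_buffer: one (size, head_num, head_dim) tensor per layer each
    return 2 * layer_num * max_total_token_num * head_num * head_dim * dtype_bytes / 2**30

print(kv_cache_gib(120000))  # ~58.6 GiB on top of ~13 GiB of fp16 weights -> OOM on a 32 GiB GPU
print(kv_cache_gib(6000))    # ~2.9 GiB, consistent with the OOM disappearing below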

After changing max_total_token_num from 120000 to 6000, the OOM error disappeared, but one of the three errors below appeared instead (each run randomly hits one of them). I searched for similar errors on Google but did not find a solution.

Error 1:
Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
    loop.run_until_complete(router.loop_for_netio_req())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
    recv_req = await self.recv_from_httpserver.recv_pyobj()
  File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
    loaded = load(buf)
_pickle.UnpicklingError: could not find MARK
Error 2:
Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
    loop.run_until_complete(router.loop_for_netio_req())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
    recv_req = await self.recv_from_httpserver.recv_pyobj()
  File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
    loaded = load(buf)
_pickle.UnpicklingError: invalid load key, 'n'.
Error 3:
Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
    loop.run_until_complete(router.loop_for_netio_req())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
    recv_req = await self.recv_from_httpserver.recv_pyobj()
  File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
    loaded = load(buf)
_pickle.UnpicklingError: invalid load key, '"'

The api_server fails to start:

(screenshot: api_server fails to start)

@lxnlxnlxnlxnlxn lxnlxnlxnlxnlxn added the bug Something isn't working label Nov 1, 2023
@PannenetsF
Contributor

This branch has not been tested with the server; you could check whether the tests run without problems.

@lxnlxnlxnlxnlxn
Author

So at the moment, is it only possible to run the Static inference performance part of the README (on the kvoff branch)?

@PannenetsF
Contributor

PannenetsF commented Nov 2, 2023 via email

@lxnlxnlxnlxnlxn
Author

lxnlxnlxnlxnlxn commented Nov 5, 2023

I downloaded the Chinese-LLaMA-2-1.3B model from the Hugging Face website and then ran test/model/test_llama2.py, which produced the following error:
root@gpu0:/lightllm/test/model# python test_llama2.py
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
F
======================================================================
FAIL: test_llama2_infer (__main__.TestLlama2Infer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/lightllm/test/model/test_llama2.py", line 11, in test_llama2_infer
    test_model_inference(world_size=1,
  File "/lightllm/test/model/model_infer.py", line 16, in test_model_inference
    assert not ans_queue.empty()
AssertionError

----------------------------------------------------------------------
Ran 1 test in 9.372s

FAILED (failures=1)

This problem looks similar to this issue, but that issue does not contain a detailed solution either.
