[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[0] warnings.warn(
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[1] warnings.warn(
[1] 2024-04-15 13:41:37,444 - INFO - intel_extension_for_pytorch auto imported
[0] 2024-04-15 13:41:37,457 - INFO - intel_extension_for_pytorch auto imported
[1] [2024-04-15 13:41:37,865] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-04-15 13:41:37,881] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[1] [WARNING] async_io: please install the libaio-dev package with apt
[1] [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[1] model_path: /opt/WD/900-Model-Data/Model/Qwen-7B-Chat
[1] [2024-04-15 13:41:38,252] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[0] [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[0] [WARNING] async_io: please install the libaio-dev package with apt
[0] [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[0] model_path: /opt/WD/900-Model-Data/Model/Qwen-7B-Chat
[0] [2024-04-15 13:41:38,265] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
Loading checkpoint shards: 100%|██████████| 8/8 [10:07<00:00, 75.90s/it][0]
Loading checkpoint shards: 100%|██████████| 8/8 [10:07<00:00, 75.91s/it][1]
[1] [2024-04-15 13:51:46,932] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-04-15 13:51:46,932] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-04-15 13:51:46,932] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-04-15 13:51:46,932] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-04-15 13:51:46,933] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] [2024-04-15 13:51:46,933] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-04-15 13:51:46,933] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-04-15 13:51:46,933] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /home/ubuntu/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Using /home/ubuntu/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Loading extension module deepspeed_ccl_comm...
[0] Time to load deepspeed_ccl_comm op: 0.2021636962890625 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.1423795223236084 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x796d96a8fb50>
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x75eb9c38d4d0>
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-04-15 13:51:55,709] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.215.160, master_port=29500
[1] [2024-04-15 13:51:55,709] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.215.160, master_port=29500
[0] [2024-04-15 13:51:55,709] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-04-15 13:51:58,306 - INFO - Converting the current model to sym_int4 format......
[1] 2024-04-15 13:51:58,386 - INFO - Converting the current model to sym_int4 format......
[1] AutoTP: [(<class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenBlock'>, ['mlp.c_proj', 'attn.c_proj'])]
[1] >> loading of model costs 620.1323246199172s
[1] [2024-04-15 13:52:09,164] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[0] AutoTP: [(<class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenBlock'>, ['attn.c_proj', 'mlp.c_proj'])]
[0] >> loading of model costs 620.0398639050545s
[0] [2024-04-15 13:52:09,836] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[1] [2024-04-15 13:52:10,415] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-04-15 13:52:10,416] [INFO] [comm.py:637:init_distributed] cdb=None
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: do_sample is set to False. However, top_p is set to 0.8 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
[1] warnings.warn(
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:407: UserWarning: do_sample is set to False. However, top_k is set to 0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
[1] warnings.warn(
[0] [2024-04-15 13:52:11,125] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-04-15 13:52:11,125] [INFO] [comm.py:637:init_distributed] cdb=None
[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: do_sample is set to False. However, top_p is set to 0.8 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
[0] warnings.warn(
[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:407: UserWarning: do_sample is set to False. However, top_k is set to 0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
[0] warnings.warn(
[1] <class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenLMHeadModel'>
[1] Traceback (most recent call last):
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1721, in
[1] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 111, in run_model
[1] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1525, in run_deepspeed_optimize_model_gpu
[1] output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[1] return func(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 1563, in generate
[1] return self.greedy_search(
[1] ^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 2385, in greedy_search
[1] outputs = self(
[1] ^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 533, in call
[1] return self.model(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1043, in forward
[1] transformer_outputs = self.transformer(
[1] ^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 715, in qwen_model_forward
[1] outputs = block(
[1] ^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 610, in forward
[1] attn_outputs = self.attn(
[1] ^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 109, in qwen_attention_forward
[1] return forward_function(
[1] ^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 369, in qwen_attention_forward_quantized
[1] query = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: shape '[1, 1024, 16, 128]' is invalid for input of size 4194304
[0] <class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenLMHeadModel'>
[0] Traceback (most recent call last):
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1721, in
[0] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 111, in run_model
[0] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1525, in run_deepspeed_optimize_model_gpu
[0] output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[0] return func(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 1563, in generate
[0] return self.greedy_search(
[0] ^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 2385, in greedy_search
[0] outputs = self(
[0] ^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 533, in call
[0] return self.model(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1043, in forward
[0] transformer_outputs = self.transformer(
[0] ^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 715, in qwen_model_forward
[0] outputs = block(
[0] ^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 610, in forward
[0] attn_outputs = self.attn(
[0] ^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 109, in qwen_attention_forward
[0] return forward_function(
[0] ^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 369, in qwen_attention_forward_quantized
[0] query = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: shape '[1, 1024, 16, 128]' is invalid for input of size 4194304
Hi @kevin-t-tang, I could reproduce this error with both Qwen-7B-Chat and Qwen-14B-Chat using AutoTP, and PR #10766 should fix it. Please try again after that PR is merged into the nightly version.
Running Qwen-7B-Chat with ipex-llm + DeepSpeed fails with the following error:
[0] RuntimeError: shape '[1, 1024, 16, 128]' is invalid for input of size 4194304
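For reference, the element counts in the error line up with an unsharded attention projection. A minimal arithmetic sketch (the head counts below are assumptions: Qwen-7B's 32 attention heads of dimension 128, split across the 2 AutoTP ranks):

```python
# Sketch of the shape arithmetic behind the RuntimeError (assumed values:
# Qwen-7B has 32 attention heads of head_dim 128; AutoTP runs with 2 ranks).
bsz, q_len, head_dim = 1, 1024, 128
full_num_heads = 32
tp_size = 2
heads_per_rank = full_num_heads // tp_size  # 16 -- what the .view() expects

# The q_proj output in the traceback holds all 32 heads' worth of elements...
actual_elems = bsz * q_len * full_num_heads * head_dim      # 4194304
# ...but .view(bsz, q_len, 16, 128) only accounts for half of them.
expected_elems = bsz * q_len * heads_per_rank * head_dim    # 2097152

print(actual_elems, expected_elems)  # 4194304 2097152
```

The `4194304` in the traceback equals 1 × 1024 × 32 × 128, i.e. the projection output was not split per rank while `num_heads` was halved, which is consistent with the AutoTP fix referenced in the comment above.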
accelerate 0.29.2
mpi4py 3.1.6
bigdl-core-xe-21 2.5.0b20240411
bigdl-core-xe-esimd-21 2.5.0b20240411
ipex-llm 2.1.0b20240411
transformers 4.37.2
:: initializing oneAPI environment ...
run-deepspeed-arc_dg2.sh: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for oneapi-vars.sh arguments: --force
:: advisor -- processing etc/advisor/vars.sh
:: ccl -- processing etc/ccl/vars.sh
:: compiler -- processing etc/compiler/vars.sh
:: dal -- processing etc/dal/vars.sh
:: debugger -- processing etc/debugger/vars.sh
:: dnnl -- processing etc/dnnl/vars.sh
:: dpct -- processing etc/dpct/vars.sh
:: dpl -- processing etc/dpl/vars.sh
:: ipp -- processing etc/ipp/vars.sh
:: ippcp -- processing etc/ippcp/vars.sh
:: mkl -- processing etc/mkl/vars.sh
:: mpi -- processing etc/mpi/vars.sh
:: tbb -- processing etc/tbb/vars.sh
:: vtune -- processing etc/vtune/vars.sh
:: oneAPI environment initialized ::
[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[0] warnings.warn(
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[1] warnings.warn(
[1] 2024-04-15 13:41:37,444 - INFO - intel_extension_for_pytorch auto imported
[0] 2024-04-15 13:41:37,457 - INFO - intel_extension_for_pytorch auto imported
[1] [2024-04-15 13:41:37,865] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-04-15 13:41:37,881] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[1] [WARNING] async_io: please install the libaio-dev package with apt
[1] [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[1] model_path: /opt/WD/900-Model-Data/Model/Qwen-7B-Chat
[1] [2024-04-15 13:41:38,252] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[0] [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[0] [WARNING] async_io: please install the libaio-dev package with apt
[0] [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[0] model_path: /opt/WD/900-Model-Data/Model/Qwen-7B-Chat
[0] [2024-04-15 13:41:38,265] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
Loading checkpoint shards: 100%|██████████| 8/8 [10:07<00:00, 75.90s/it][0]
Loading checkpoint shards: 100%|██████████| 8/8 [10:07<00:00, 75.91s/it][1]
[1] [2024-04-15 13:51:46,932] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-04-15 13:51:46,932] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-04-15 13:51:46,932] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-04-15 13:51:46,932] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-04-15 13:51:46,933] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] [2024-04-15 13:51:46,933] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-04-15 13:51:46,933] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-04-15 13:51:46,933] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /home/ubuntu/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Using /home/ubuntu/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Loading extension module deepspeed_ccl_comm...
[0] Time to load deepspeed_ccl_comm op: 0.2021636962890625 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.1423795223236084 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x796d96a8fb50>
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x75eb9c38d4d0>
[1] [2024-04-15 13:51:55,218] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-04-15 13:51:55,218] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-04-15 13:51:55,709] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.215.160, master_port=29500
[1] [2024-04-15 13:51:55,709] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.215.160, master_port=29500
[0] [2024-04-15 13:51:55,709] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-04-15 13:51:58,306 - INFO - Converting the current model to sym_int4 format......
[1] 2024-04-15 13:51:58,386 - INFO - Converting the current model to sym_int4 format......
[1] AutoTP: [(<class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenBlock'>, ['mlp.c_proj', 'attn.c_proj'])]
[1] >> loading of model costs 620.1323246199172s
[1] [2024-04-15 13:52:09,164] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[0] AutoTP: [(<class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenBlock'>, ['attn.c_proj', 'mlp.c_proj'])]
[0] >> loading of model costs 620.0398639050545s
[0] [2024-04-15 13:52:09,836] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[1] [2024-04-15 13:52:10,415] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-04-15 13:52:10,416] [INFO] [comm.py:637:init_distributed] cdb=None
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:397: UserWarning:
do_sample
is set toFalse
. However,top_p
is set to0.8
-- this flag is only used in sample-based generation modes. You should setdo_sample=True
or unsettop_p
.[1] warnings.warn(
[1] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:407: UserWarning:
do_sample
is set toFalse
. However,top_k
is set to0
-- this flag is only used in sample-based generation modes. You should setdo_sample=True
or unsettop_k
.[1] warnings.warn(
[0] [2024-04-15 13:52:11,125] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-04-15 13:52:11,125] [INFO] [comm.py:637:init_distributed] cdb=None
[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:397: UserWarning:
do_sample
is set toFalse
. However,top_p
is set to0.8
-- this flag is only used in sample-based generation modes. You should setdo_sample=True
or unsettop_p
.[0] warnings.warn(
[0] /home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:407: UserWarning:
do_sample
is set toFalse
. However,top_k
is set to0
-- this flag is only used in sample-based generation modes. You should setdo_sample=True
or unsettop_k
.[0] warnings.warn(
[1] <class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenLMHeadModel'>
[1] Traceback (most recent call last):
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1721, in
[1] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 111, in run_model
[1] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1525, in run_deepspeed_optimize_model_gpu
[1] output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[1] return func(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 1563, in generate
[1] return self.greedy_search(
[1] ^^^^^^^^^^^^^^^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 2385, in greedy_search
[1] outputs = self(
[1] ^^^^^
[1] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 533, in call
[1] return self.model(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1043, in forward
[1] transformer_outputs = self.transformer(
[1] ^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 715, in qwen_model_forward
[1] outputs = block(
[1] ^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 610, in forward
[1] attn_outputs = self.attn(
[1] ^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self._call_impl(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return forward_call(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 109, in qwen_attention_forward
[1] return forward_function(
[1] ^^^^^^^^^^^^^^^^^
[1] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 369, in qwen_attention_forward_quantized
[1] query = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: shape '[1, 1024, 16, 128]' is invalid for input of size 4194304
[0] <class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenLMHeadModel'>
[0] Traceback (most recent call last):
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1721, in
[0] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 111, in run_model
[0] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/run.py", line 1525, in run_deepspeed_optimize_model_gpu
[0] output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[0] return func(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 1563, in generate
[0] return self.greedy_search(
[0] ^^^^^^^^^^^^^^^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 2385, in greedy_search
[0] outputs = self(
[0] ^^^^^
[0] File "/opt/WD/091-GFX-Benchmark/BigDL/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 533, in __call__
[0] return self.model(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1043, in forward
[0] transformer_outputs = self.transformer(
[0] ^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 715, in qwen_model_forward
[0] outputs = block(
[0] ^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 610, in forward
[0] attn_outputs = self.attn(
[0] ^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 109, in qwen_attention_forward
[0] return forward_function(
[0] ^^^^^^^^^^^^^^^^^
[0] File "/home/ubuntu/miniconda3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen.py", line 369, in qwen_attention_forward_quantized
[0] query = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: shape '[1, 1024, 16, 128]' is invalid for input of size 4194304
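A note on the arithmetic in the error, as a minimal sketch (the diagnosis is an assumption, not confirmed by the log): a view of shape `[1, 1024, 16, 128]` needs 1 × 1024 × 16 × 128 = 2,097,152 elements, but the `q_proj` output has 4,194,304 = 1024 × 4096 — exactly 2×. Since this is a two-rank run (ranks `[0]` and `[1]` above), that factor of 2 is consistent with `num_heads` having been halved per rank while the projection layer still produced the full, unsharded hidden size.

```python
# Sketch of the shape mismatch reported by both ranks.
# Assumed values: bsz=1, q_len=1024, per-rank num_heads=16, head_dim=128,
# world_size=2 (ranks [0] and [1] in the log).
expected = 1 * 1024 * 16 * 128   # elements required by view [1, 1024, 16, 128]
actual = 4_194_304               # input size reported in the RuntimeError

print(expected)                  # 2097152
print(actual // expected)        # 2 -- matches the 2-rank DeepSpeed world size
```

If this reading is right, the fix direction would be to ensure the attention projections are actually sliced per rank (or that per-rank head counts match the unsliced weights), but that should be verified against the ipex_llm/DeepSpeed versions in use.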