
LLM: update split tensor conditions. #10872

Merged

Conversation

lalalapotter
Contributor

@lalalapotter lalalapotter commented Apr 24, 2024

Description

This PR updates the split tensor conditions:

  1. enable splitting when `IPEX_LLM_SPLIT_QKV` is enabled.
  2. enable splitting when the intermediate attn_weights size exceeds the memory block limitation.
  3. update the quantized KV cache conditions.

@lalalapotter lalalapotter self-assigned this Apr 24, 2024
# split tensor for memory block limitation
# support fp16 and set input length threshold at 5000 for now
if query_layer.element_size()*bsz*n_head*seq_len*seq_len >= 4*1024*1024*1024:
    return True
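For context, the gating logic described in the PR could be sketched roughly as below. This is a minimal illustration, not ipex-llm's actual implementation: the function name, the `block_limit` parameter, and the env-var handling are assumptions; only the size check mirrors the line above.

```python
import os

def should_split_qkv_tensor(query_layer, bsz, n_head, seq_len,
                            block_limit=4 * 1024 * 1024 * 1024):
    # Condition 1 (assumed behavior): user forces splitting via env var.
    if os.environ.get("IPEX_LLM_SPLIT_QKV", "0") == "1":
        return True
    # Condition 2: split when the intermediate attn_weights tensor of shape
    # (bsz, n_head, seq_len, seq_len) would exceed the memory block limit.
    # element_size() is 2 for fp16, so this is just a few integer multiplies.
    if query_layer.element_size() * bsz * n_head * seq_len * seq_len >= block_limit:
        return True
    return False
```

With fp16 (`element_size() == 2`), a batch of 1, and 32 heads, the size condition trips once `seq_len` reaches 8192, since 2 * 1 * 32 * 8192 * 8192 equals exactly 4 GiB.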
Contributor

Will this check hurt performance? Calling `query_layer.element_size()` should be negligible, no?

Contributor Author

Seems not; the benchmark numbers are essentially identical.
Using the condition with element_size():
0,meta-llama/Llama-2-7b-chat-hf,4839.14,27.14,0.0,8192-512,1,8192-512,1,fp8,False,4.49,11.966796875,N/A,N/A
Using the condition with shape:
0,meta-llama/Llama-2-7b-chat-hf,4837.71,27.13,0.0,8192-512,1,8192-512,1,fp8,False,4.48,11.966796875,N/A,N/A

@lalalapotter lalalapotter merged commit 75dbf24 into intel-analytics:main Apr 30, 2024
16 of 18 checks passed