stream_infer input_embeddings #889
-
How can I use the input_embeddings argument in stream_infer? Was it designed for mixed embeddings? |
Replies: 2 comments 2 replies
-
@irexyc please help clarify this question |
-
For models like qwen-vl or internlm-xcomposer, the decode process is the same as for a normal LLM. The only difference is the embedding layer: a normal LLM uses the embedding layer to encode token_ids into input_embs, whereas these multimodal models concatenate image features with input_embs to form the final input. To keep the code simple, we add dummy ids to token_ids, and after the embedding layer we replace those dummy embeddings with the real image features. This is a web demo #874 |
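The dummy-id trick described above can be sketched as follows. This is a minimal numpy stand-in, not the lmdeploy implementation: the function name, the choice of dummy id, and the toy shapes are all made up for illustration.

```python
import numpy as np

def merge_image_features(token_ids, embedding_table, image_features, dummy_id):
    """Encode token_ids via the embedding table, then overwrite the
    positions that hold dummy_id with precomputed image features."""
    # Fancy indexing returns a copy, so mutating input_embs is safe.
    input_embs = embedding_table[token_ids]            # (seq_len, hidden)
    dummy_pos = np.where(token_ids == dummy_id)[0]     # slots reserved for the image
    assert len(dummy_pos) == len(image_features), "one dummy id per image feature"
    input_embs[dummy_pos] = image_features             # splice in real image features
    return input_embs

# Toy example: vocab of 10, hidden size 4, two dummy slots (id 9) for image patches.
rng = np.random.default_rng(0)
table = rng.normal(size=(10, 4))
ids = np.array([3, 9, 9, 5])
img = np.ones((2, 4))                                  # pretend image-encoder output
embs = merge_image_features(ids, table, img, dummy_id=9)
```

In the real pipeline, `embs` would then be passed to the decoder as the final input embeddings, so everything downstream of the embedding layer stays unchanged.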