stream_infer input_embeddings #889

JidongZhang-THU · 2023-12-26T08:33:51Z

JidongZhang-THU
Dec 26, 2023

How can I use input_embeddings in stream_infer? Was it designed for mixed embeddings?


lvhan028 · 2023-12-26T12:27:10Z

lvhan028
Dec 26, 2023
Maintainer

@irexyc please help clarify this question.

0 replies

irexyc · 2023-12-26T12:50:35Z

irexyc
Dec 26, 2023
Collaborator

For some multimodal models like qwen-vl or internlm-xcomposer, the decoding process is the same as for a normal LLM. The only difference is the embedding layer. A normal LLM uses the embedding layer to encode token_ids into input_embs. These multimodal models concatenate the image features with input_embs to form the final input.

To keep the code simple, we add dummy ids to token_ids and, after the embedding layer, replace the dummy embeddings with the real image features.

There is a web demo in #874.
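
The dummy-id replacement described above can be illustrated with a small, framework-free sketch. It is not the library's implementation: `merge_embeddings`, the toy embedding table, and the half-open `(begin, end)` range convention are all assumptions made for illustration.

```python
def merge_embeddings(token_ids, embedding_table, features, ranges):
    """Hypothetical helper: embed every token id, then overwrite dummy spans.

    token_ids       -- list of ints; dummy ids mark where features will go
    embedding_table -- list of rows, one embedding vector per vocabulary id
    features        -- list of feature blocks, each a list of rows
    ranges          -- list of (begin, end) pairs, assumed half-open
    """
    # Ordinary embedding lookup: one row per token id, dummy ids included.
    embs = [list(embedding_table[t]) for t in token_ids]
    for feat, (begin, end) in zip(features, ranges):
        if len(feat) != end - begin:
            raise ValueError("feature length must match its range")
        # Replace the dummy embeddings with the real feature rows.
        embs[begin:end] = [list(row) for row in feat]
    return embs


# Toy data: vocabulary of 10 ids, hidden size 4.
table = [[float(i)] * 4 for i in range(10)]
image_feat = [[-1.0] * 4 for _ in range(3)]

# One text token, three dummy ids reserved for the image, one text token.
token_ids = [5, 0, 0, 0, 7]
merged = merge_embeddings(token_ids, table, [image_feat], [(1, 4)])
```

The sequence length is unchanged by the replacement, which is why the rest of the decoding path can stay identical to a text-only model.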

2 replies

JidongZhang-THU Dec 27, 2023
Author

Thanks for the reply.
I am using an adaptor model to combine text and image inputs into inputs_embeds. A model loaded with AutoModelForCausalLM.from_pretrained can take inputs_embeds through the generate function, which is defined in GenerationMixin.
I want to know how to use stream_infer to perform the same inference as that generate call.

I have already converted the text and image into a tensor, so I don't need input_ids.
I don't know how to use input_embeddings and input_embedding_ranges.

JidongZhang-THU Dec 27, 2023
Author

Following #874, I wrote a test:
stream_infer(input_ids=[0] * input_feature.shape[0], session_id=0,
             input_embeddings=[input_feature],
             input_embedding_ranges=[(0, input_feature.shape[0])])
# input_feature.shape is (N, 4096)
That seems to work for me.
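
For inputs that mix ordinary text tokens with precomputed features, the same three arguments could be assembled along these lines. This is only a sketch: `build_inputs` and the segment tagging are hypothetical, the dummy id 0 mirrors the test above, and the half-open `(begin, end)` convention is inferred from that test rather than confirmed by the library.

```python
def build_inputs(segments, dummy_id=0):
    """Hypothetical helper: turn mixed segments into stream_infer-style inputs.

    segments -- list of ("text", [token ids]) or ("feat", rows) pairs, where
                rows is any sequence whose len() is the number of positions
                (for example a tensor of shape (N, hidden)).
    """
    input_ids, input_embeddings, input_embedding_ranges = [], [], []
    for kind, payload in segments:
        if kind == "text":
            input_ids.extend(payload)
        elif kind == "feat":
            begin = len(input_ids)
            n = len(payload)
            # Reserve one dummy id per feature row.
            input_ids.extend([dummy_id] * n)
            input_embeddings.append(payload)
            input_embedding_ranges.append((begin, begin + n))
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return input_ids, input_embeddings, input_embedding_ranges


# Toy example: three text tokens, a 2-row feature block, one text token.
feat = [[0.5] * 4, [0.25] * 4]
ids, embs, ranges = build_inputs(
    [("text", [1, 15, 27]), ("feat", feat), ("text", [42])]
)
# The results would then be passed as in the test above (not verified here):
# stream_infer(session_id=0, input_ids=ids,
#              input_embeddings=embs, input_embedding_ranges=ranges)
```

With a single feature segment and no text, this reduces to the call in the test above: N dummy ids and one range `(0, N)`.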
