In the current docs, there are a few examples of how to query a VLM model, for example:
from huggingface_hub import InferenceClient
client = InferenceClient("http://127.0.0.1:3000")
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
prompt = f"![]({image})What is this a picture of?\n\n"
for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
print(token)
Expected behavior
However, there is no example of how to deal with the default chat template. For example, the chat template of llava-hf/llava-v1.6-34b-hf is as follows:
"<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\n<your_text_prompt_here><|im_end|><|im_start|>assistant\n"
Should we ignore it and use the TGI format as shown above? And how should multi-turn queries be handled? Any examples would be appreciated.
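For reference, if one did want to apply that template by hand before hitting the raw generate endpoint, it would amount to string formatting like the sketch below. This is an assumption about how the pieces combine (the `![](...)` image placeholder is the markdown form TGI's generate endpoint expects, substituted where the template has `<image>`), not an officially documented pattern:

```python
# Sketch: manually filling in the llava-v1.6-34b-hf chat template shown above.
# Assumption: the <image> slot is replaced with TGI's markdown image syntax.
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
question = "What is this a picture of?"

prompt = (
    "<|im_start|>system\nAnswer the questions.<|im_end|>"
    f"<|im_start|>user\n![]({image})\n{question}<|im_end|>"
    "<|im_start|>assistant\n"
)
# This prompt string could then be passed to client.text_generation(...)
```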
Hi @paulcx, thanks for pointing this out, we should be clearer about generation and templates in the docs.
In TGI the chat_template is applied when the chat endpoint is used; in the example above the generate endpoint is used and no template is applied.
Chat can be used with the chat_completion method as shown below.
from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")
chat=client.chat_completion(
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Whats in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
},
},
],
},
],
seed=42,
max_tokens=100,
)
print(chat)
# ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='length', index=0, message=ChatCompletionOutputMessage(role='assistant', content=" The image you've provided features an anthropomorphic rabbit in spacesuit attire. This rabbit is depicted with human-like posture and movement, standing on a rocky terrain with a vast, reddish-brown landscape in the background. The spacesuit is detailed with mission patches, circuitry, and a helmet that covers the rabbit's face and ear, with an illuminated red light on the chest area.\n\nThe artwork style is that of a", name=None, tool_calls=None), logprobs=None)], created=1714589614, id='', model='llava-hf/llava-v1.6-mistral-7b-hf', object='text_completion', system_fingerprint='2.0.2-native', usage=ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=2943, total_tokens=3043))
Note that when using the chat endpoint images are sent as typed messages rather than markdown format.
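For the multi-turn part of the question: with the chat endpoint, a follow-up query is just the same `messages` list with the assistant's earlier reply appended, then resent in full. A minimal sketch (the assistant text and follow-up question here are illustrative, not real model output):

```python
# Sketch of a multi-turn follow-up with the chat endpoint: prior turns,
# including the assistant's reply, are appended and the whole history resent.
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Whats in this image?"},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
    # Assistant reply from the first call, added back into the history
    # (illustrative placeholder text).
    {"role": "assistant", "content": "An anthropomorphic rabbit in a spacesuit."},
    # The follow-up turn.
    {"role": "user", "content": [{"type": "text", "text": "What is it standing on?"}]},
]

# chat = client.chat_completion(messages=messages, max_tokens=100)
```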
I hope this helps clarify! Please let me know if you have any questions.
System Info
docs[main]: https://huggingface.co/docs/text-generation-inference/basic_tutorials/visual_language_models
vlm: https://huggingface.co/llava-hf/llava-v1.6-34b-hf