For example, I load three images into the model like this:
pixel_values_0 = load_image("./test_video/clip10/clip1000.png", max_num=6).to(torch.bfloat16).cuda()
pixel_values_1 = load_image("./test_video/clip10/clip1020.png", max_num=6).to(torch.bfloat16).cuda()
pixel_values_2 = load_image("./test_video/clip10/clip1040.png", max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values_0, pixel_values_1, pixel_values_2), dim=0)
question = "how many pictures did you see?"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)
The model responded: "I saw one picture."
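One thing worth noting about the snippet above: `torch.cat(..., dim=0)` merges the tiles of all three images into one batch, so the image boundaries are no longer visible in `pixel_values` itself. The following is a minimal sketch of how those boundaries could be recorded, assuming `load_image` returns a tensor whose first dimension is the number of tiles for that image. `tile_spans` is a hypothetical helper, not part of InternVL, and the tile counts in the example are made up.

```python
from itertools import accumulate


def tile_spans(tile_counts):
    """Return the (start, end) row range of each image in the concatenated tensor.

    tile_counts: number of tiles per image, e.g. [t.size(0) for t in tensors]
    (assumes dim 0 of each per-image tensor is the tile dimension).
    """
    ends = list(accumulate(tile_counts))   # running total of tiles
    starts = [0] + ends[:-1]               # each image starts where the previous ended
    return list(zip(starts, ends))


# Example with made-up tile counts for three images.
spans = tile_spans([7, 7, 5])
print(spans)  # [(0, 7), (7, 14), (14, 19)]
```

Slicing `pixel_values[start:end]` with one of these spans would recover the tiles of a single image from the concatenated tensor.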
Then I tested the official code:
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
question = "详细描述这两张图片" # Describe the two pictures in detail
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)
question = "这两张图片的相同点和区别分别是什么" # What are the similarities and differences between these two pictures
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(question, response)
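I got a response like this (translated from Chinese): "This image shows a giant panda, China's national treasure. The panda is sitting on the ground, surrounded by green vegetation and bamboo. Its fur is mainly black and white, with very distinct black eye patches and ears. The panda looks calm and seems to be enjoying its surroundings."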
Hi, this is because the model only encountered single-image samples during training. Multi-image capability relies mainly on zero-shot generalization, so its performance is unstable. We plan to include interleaved multi-image data for training in the June version, which is expected to improve multi-image dialogue performance.
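Until multi-image training data is included, one possible workaround is to ask about each image in its own single-image call and combine the answers afterwards. The sketch below is only an illustration under stated assumptions, not an official recipe: `ask_each_image` is a hypothetical helper, and `chat_fn` stands for any callable that wraps the `model.chat(...)` call shown in this issue.

```python
def ask_each_image(chat_fn, images, question):
    """Ask the same question about each image in a separate single-image call.

    chat_fn:  callable (pixel_values, question) -> response string; a
              hypothetical wrapper around the single-image model.chat call.
    images:   iterable of per-image tensors (NOT concatenated with torch.cat).
    question: prompt sent with every image.
    """
    answers = []
    for index, pixel_values in enumerate(images, start=1):
        response = chat_fn(pixel_values, question)
        # Label each answer so the per-image results can be told apart.
        answers.append(f"Image {index}: {response}")
    return answers


# Possible usage with the objects from this issue (assumed to be in scope):
# chat_fn = lambda pv, q: model.chat(
#     tokenizer, pv, q, generation_config, history=None, return_history=True
# )[0]
# answers = ask_each_image(chat_fn, [pixel_values1, pixel_values2], question)
```

This gives up cross-image reasoning (the model never sees two images at once), so it fits description tasks better than comparison questions.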
Thanks for your great work!
I followed your tutorial at https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-Int8 and found that the model only supports single-image conversation. I am using the Int8 model.
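The response continued (translated from Chinese): "In the background you can see some wooden structures and rocks, which may be part of a zoo or wildlife reserve. Overall, the image conveys a sense of tranquility and nature, while also showing how this rare animal lives in a natural environment."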
I don't know how to fix this bug.
Here is my full test code: