
inference examples #1

Open
SkalskiP opened this issue Apr 27, 2024 · 12 comments

Comments
@SkalskiP

Hi 👋🏻 Do you have any inference examples that I could use?

mmaaz60 (Member) commented Apr 27, 2024

Hi @SkalskiP,

Thank you for your interest in our work. The run_llava.py script from the official LLaVA repository, along with the edited files we provide in our repository, can be used for inference.

However, we are planning to release some straightforward inference scripts soon. Stay tuned!
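
For reference, a minimal sketch of that route, following the upstream LLaVA quick-start pattern around eval_model in llava/eval/run_llava.py (the checkpoint ID and image URL below are placeholders, not fixed choices of this repository):

from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

model_path = "MBZUAI/LLaVA-Phi-3-mini-4k-instruct"  # placeholder checkpoint ID
image_file = "https://llava-vl.github.io/static/images/view.jpg"  # any image path or URL
prompt = "Describe the image in detail."

# eval_model expects an argparse-like namespace; build one inline.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)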

mmaaz60 (Member) commented Apr 27, 2024

Hi @SkalskiP,

Thank you for your patience. The Google Colab demo is now available, check it out:

Google Colab

mmaaz60 (Member) commented Apr 27, 2024

Results from the Google Colab demo for the image below:

[Image: MicrosoftTeams-image (7)]

Question: Describe the image in detail.

Model Response: The image features a group of five women standing on a street, each holding a colorful umbrella. They are all dressed in different outfits, showcasing a variety of styles. The women are positioned in a line, with one woman on the left side, another in the middle, and the remaining three on the right side of the street. In the background, there are a few cars parked on the left side of the street, and a stop sign is visible near the left edge of the image. The scene captures a fun and lively atmosphere as the women enjoy their time together under their umbrellas.

mmaaz60 (Member) commented Apr 28, 2024

Hi @SkalskiP,

We have just released the online demo of both Phi-3-V and LLaMA-3-V. Check it out at

Demo

At1a8 commented Apr 29, 2024

Hi @mmaaz60, thanks for your excellent work. Can you provide an inference script based on the checkpoints or weights trained by LLaMA3-V_finetune_lora.sh?

mmaaz60 (Member) commented Apr 29, 2024

Hi @At1a8,

We appreciate your interest in our work. Please note that we also provide the merged weights obtained by merging the LoRA weights with the base LLM. For example, for LLaMA-3 the merged LoRA weights are available at LLaVA-Meta-Llama-3-8B-Instruct.

Further, the weights obtained using full fine-tuning are available at LLaVA-Meta-Llama-3-8B-Instruct-FT.

We notice that, for LLaMA-3-V, the fully fine-tuned model works better than the LoRA fine-tuned model.

The same inference pipeline as in the Google Colab can be used for the LLaMA-3-V models as well. However, here you have to copy the LLaMA-3-V files instead of the Phi-3-V files and download the LLaMA-3-V model.

We hope this helps. Please let us know if you have any questions. Thank you!
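
For example, a minimal sketch of loading the fully fine-tuned LLaMA-3-V checkpoint once the LLaMA-3-V files are in place (the Hugging Face model ID below is an assumption; substitute the path you actually downloaded):

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Assumed Hugging Face ID for the fully fine-tuned LLaMA-3-V weights.
model_path = "MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct-FT"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # merged / fully fine-tuned checkpoints need no separate base
    model_name=get_model_name_from_path(model_path),
)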

At1a8 commented Apr 29, 2024


Thanks for your reply. We trained our models on a custom dataset, and we want to merge the LLaMA-3 base weights with the LoRA weights trained by your code.

How can we do that? Could you please give a code example?

Thanks so much.

mmaaz60 (Member) commented Apr 29, 2024

Hi @At1a8,

Thanks for the clarification. You can use the following script to merge the LoRA weights after training.

import argparse

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path


def merge_lora(args):
    # load_pretrained_model detects the LoRA checkpoint (the model name contains
    # "lora"), loads the base LLM, applies the LoRA adapters, and merges them.
    model_name = get_model_name_from_path(args.model_path)
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')

    # Save the merged model and tokenizer as a standalone checkpoint.
    model.save_pretrained(args.save_model_path)
    tokenizer.save_pretrained(args.save_model_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, required=True)
    parser.add_argument("--save-model-path", type=str, required=True)

    args = parser.parse_args()

    merge_lora(args)

Here, --model-path is the path to the LoRA weights, --model-base is the base model path (in your case, Meta LLaMA-3-8B), and --save-model-path is the path where the merged checkpoint will be saved.

I hope this helps. Please let me know if you face any issues. Good luck!

mmaaz60 (Member) commented Apr 29, 2024

Hi @At1a8,

We have just added the merge_lora_weights.py script, which should help with merging the LoRA weights. Please let us know if you have any questions. Good luck!

@SkalskiP (Author)

@mmaaz60 thanks a lot! I'll make sure to play with it ;)

At1a8 commented Apr 30, 2024

We trained with this script to get the checkpoints:

#!/bin/bash

deepspeed --include localhost:4,5,6,7 llava/train/train_mem4Drive.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path Undi95/Meta-Llama-3-8B-Instruct-hf \
    --version llama3 \
    --data_path ./../v1_full_llama.json \
    --image_folder ./../vlm_dataset \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/LLaVA-Meta-Llama-3-8B-Instruct-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-llama3-8b-task-lora1 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 8192 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to none #wandb

and used the merge script mentioned above, with the following logs:

(llava) fangyuan@xcdloss220176:/group/ossdphi_algo_scratch_02/fangyuan/LLaVA/LLaVA$ python3 ./scripts/merge_lora_weights.py --model-base Undi95/Meta-Llama-3-8B-Instruct-hf --model-path ./checkpoints/llava-v1.5-llama3-8b-task-lora1 --save-model-path ./../runs/llava_llama3_test1
[2024-04-30 10:44:24,172] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
Loading checkpoint shards: 100%|████████████████████████████████| 4/4 [00:51<00:00, 12.95s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Adding pad token as '<pad>'
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
/group/ossdphi_algo_scratch_02/fangyuan/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()

We still cannot access meta-llama/Meta-Llama-3-8B-Instruct, so we use Undi95/Meta-Llama-3-8B-Instruct-hf instead, and we encountered this warning:

Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Why does this warning appear, and is there a way to resolve it?
Looking forward to your suggestions.

mmaaz60 (Member) commented Apr 30, 2024

Hi @At1a8,

This warning is normal. During merging, we first load the base LLM checkpoint into our Visual-LLM class; the base checkpoint does not contain projector weights, which is what triggers the warning. Later we load the LoRA and additional weights, which do contain the projector weights.

In summary, this warning is normal and you can ignore it and proceed. Good Luck!
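
To make the sequence concrete, here is a condensed, paraphrased sketch of what load_pretrained_model does for a LoRA checkpoint (simplified from llava/model/builder.py; the function name load_lora_llava and the prefix handling are simplifications, not the exact upstream code):

import os
import torch
from peft import PeftModel
from llava.model import LlavaLlamaForCausalLM


def load_lora_llava(lora_path, base_path):
    # 1) The base LLM is loaded into the LLaVA class. The base checkpoint has no
    #    mm_projector weights, so Transformers reports them as "newly initialized"
    #    -- this is the warning in question.
    model = LlavaLlamaForCausalLM.from_pretrained(base_path, low_cpu_mem_usage=True)

    # 2) The trained projector (and other non-LoRA) weights are then restored from
    #    the LoRA run's non_lora_trainables.bin, overwriting the random init.
    non_lora = torch.load(os.path.join(lora_path, "non_lora_trainables.bin"),
                          map_location="cpu")
    non_lora = {k.replace("base_model.model.", ""): v for k, v in non_lora.items()}
    model.load_state_dict(non_lora, strict=False)

    # 3) Finally the LoRA adapters are attached and merged into the base weights.
    model = PeftModel.from_pretrained(model, lora_path)
    return model.merge_and_unload()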

pythonlearner1025 pushed a commit to pythonlearner1025/LLaVA-pp that referenced this issue May 8, 2024