Add instructions on running llava-v1.6-mistral-7b #1115

Open · wants to merge 1 commit into main
Conversation

@aliencaocao commented Feb 10, 2024

After many hours of debugging, I finally got llava-v1.6-mistral-7b to work fully on SGLang inference backend.

This PR adds the relevant instructions to README.md, which references a PR I made on Hugging Face containing all the patches needed to make loading work.

Closes #1114
Closes #1112
Closes #1179
Also closes sgl-project/sglang#128 (from the SGLang repo)

Summary of patches:

  1. Create added_tokens.json containing:
{
  "<image>": 32000,
  "<pad>": 32001
}

this was from https://huggingface.co/SurfaceData/llava-v1.6-vicuna-7b-processor/blob/main/added_tokens.json which is linked by sgl-project/sglang#127 (comment)
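This step can be sketched in Python (the checkpoint directory name is an assumption; adjust it to wherever the weights were downloaded):

```python
import json
import os

# Assumed local path to the downloaded llava-v1.6-mistral-7b checkpoint.
model_dir = "llava-v1.6-mistral-7b"
os.makedirs(model_dir, exist_ok=True)

# Token ids taken from the SurfaceData processor repo linked above.
added_tokens = {"<image>": 32000, "<pad>": 32001}

# Write added_tokens.json next to the other tokenizer files.
with open(os.path.join(model_dir, "added_tokens.json"), "w") as f:
    json.dump(added_tokens, f, indent=2)
```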

  2. In config.json, change "LlavaMistralForCausalLM" to "LlavaLlamaForCausalLM", and "model_type": "llava_mistral" to "model_type": "llava".
    This was from [Bug] liuhaotian/llava-v1.6-mistral-7b doesn't load sgl-project/sglang#128 (comment)
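A minimal Python sketch of the same edit (the stand-in dict mirrors only the two affected keys; load the real config.json in practice):

```python
# Stand-in for the relevant keys of config.json; in practice:
#   config = json.load(open("config.json"))
config = {
    "architectures": ["LlavaMistralForCausalLM"],
    "model_type": "llava_mistral",
}

# Rename the architecture and model type so the llava loader picks the model up.
config["architectures"] = ["LlavaLlamaForCausalLM"]
config["model_type"] = "llava"
```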

  3. In generation_config.json, add a line before "transformers_version":
    "pad_token_id": 32001,

  4. Add preprocessor_config.json from https://huggingface.co/SurfaceData/llava-v1.6-vicuna-7b-processor/blob/main/preprocessor_config.json

{
	"crop_size": {
	  "height": 336,
	  "width": 336
	},
	"do_center_crop": true,
	"do_convert_rgb": true,
	"do_normalize": true,
	"do_rescale": true,
	"do_resize": true,
	"image_mean": [
	  0.48145466,
	  0.4578275,
	  0.40821073
	],
	"image_processor_type": "CLIPImageProcessor",
	"image_std": [
	  0.26862954,
	  0.26130258,
	  0.27577711
	],
	"processor_class": "LlavaProcessor",
	"resample": 3,
	"rescale_factor": 0.00392156862745098,
	"size": {
	  "shortest_edge": 336
	}
  }
  5. In special_tokens_map.json, add:
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },

this was from https://huggingface.co/SurfaceData/llava-v1.6-vicuna-7b-processor/blob/main/special_tokens_map.json

  6. Replace tokenizer_config.json with https://huggingface.co/SurfaceData/llava-v1.6-vicuna-7b-processor/blob/main/tokenizer_config.json
    Diffs from the original:
    "32000": {
      "content": "<image>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32001": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },

and

  "legacy": false,
  "model_max_length": 4096,
  "pad_token": "<pad>",
  "padding_side": "right",
  "processor_class": "LlavaProcessor",

But you need to keep the "chat_template" entry from the original file (the vicuna one does not have it).
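The carry-over can be sketched as follows (both dicts are illustrative stand-ins; load the real mistral and vicuna-processor files in practice):

```python
# Stand-ins: "original" is the mistral tokenizer_config.json, "replacement"
# is the vicuna-processor one, which lacks a chat_template.
original = {
    "chat_template": "{% for message in messages %}{{ message['content'] }}{% endfor %}"
}
replacement = {
    "legacy": False,
    "model_max_length": 4096,
    "pad_token": "<pad>",
    "padding_side": "right",
    "processor_class": "LlavaProcessor",
}

# Keep the original chat_template, since the vicuna file does not have one.
replacement["chat_template"] = original["chat_template"]
```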

  7. For tokenizer.json, use the original one, but append the following to "added_tokens":
    {
      "id": 32000,
      "content": "<image>",
      "single_word": false,
      "lstrip": false,
      "rstrip": false,
      "normalized": false,
      "special": true
    },
    {
      "id": 32001,
      "content": "<pad>",
      "single_word": false,
      "lstrip": false,
      "rstrip": false,
      "normalized": false,
      "special": true
    }
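The append step can be sketched as (the stand-in list holds a single placeholder entry; load the real tokenizer.json in practice):

```python
# Stand-in for tokenizer.json; in practice:
#   tok = json.load(open("tokenizer.json"))
tok = {"added_tokens": [{"id": 0, "content": "<unk>", "special": True}]}

# Flags shared by both new entries.
common = {"single_word": False, "lstrip": False, "rstrip": False,
          "normalized": False, "special": True}

tok["added_tokens"].append({"id": 32000, "content": "<image>", **common})
tok["added_tokens"].append({"id": 32001, "content": "<pad>", **common})
```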

@RonanKMcGovern commented Feb 14, 2024

FWIW I've pushed what I think are these patches to huggingface here

@RonanKMcGovern

Perhaps I'm doing something wrong, but these patches result in <pad> being in the response:

Prompt: [INST] <image>
What do you see in this picture? [/INST]
<s> The image shows a wooden chess set on a wooden table. There are three chess pieces visible: a rook, a knight, and a pawn. The rook and knight are standing upright, while the pawn is lying on its side. The pieces appear to be made of a dark wood, and the table has a light wood finish. The shadow of the chess pieces is<pad><pad><pad> the table,<pad><pad><pad><pad><pad><pad>, indicating that the light source is coming from the direction the shadow is cast. </s>

@aliencaocao (Author)

I didn't observe this using the chair example. Try deleting the pad-related additions? I don't actually have concrete evidence that the pad token is even necessary.

@aliencaocao (Author)

Yeah, the pad token seems to be extra, since they use unk as pad, so I suggest deleting the pad-related entries and setting the pad token id in the various files to 0 (unk).
I've been running this with no issues so far.

@RylanSchaeffer

@RonanKMcGovern thanks for posting the patched version on huggingface! Quick question: did you update to include @aliencaocao 's recent pad solution?

@RonanKMcGovern commented Mar 3, 2024 via email

@RylanSchaeffer

@RonanKMcGovern can you link the video?

@RonanKMcGovern commented Mar 3, 2024 via email

@ppx-hub commented Mar 6, 2024

> [quotes the full PR description above]
Thanks, the same fix also resolves "Cannot launch SGLang demo on llava-v1.5-13b".
