
Issue with saving and loading low bit BLIP-2 model #10892

Open
wayfeng opened this issue Apr 26, 2024 · 1 comment
wayfeng commented Apr 26, 2024

The original BLIP2-OPT-6.7B model takes more than 30 GB of RAM to load and convert, so I want to save the compressed model and then load it directly on another PC with limited RAM. Saving succeeded, but loading failed.

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model

model_id = "Salesforce/blip2-opt-6.7b"  # or "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

device = 'xpu'
optimized_model = optimize_model(model, device=device)

model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
$ ls -la optimized-blip2
total 4.7G
drwxrwxr-x 2 wayne wayne 4.0K Apr 25 16:55 .
drwxrwxr-x 6 wayne wayne 4.0K Apr 26 08:40 ..
-rw-rw-r-- 1 wayne wayne   42 Apr 25 16:54 bigdl_config.json
-rw-rw-r-- 1 wayne wayne  942 Apr 25 16:53 config.json
-rw-rw-r-- 1 wayne wayne  136 Apr 25 16:53 generation_config.json
-rw-rw-r-- 1 wayne wayne 446K Apr 25 16:55 merges.txt
-rw-rw-r-- 1 wayne wayne 4.7G Apr 25 16:54 model.safetensors
-rw-rw-r-- 1 wayne wayne  432 Apr 25 16:55 preprocessor_config.json
-rw-rw-r-- 1 wayne wayne  548 Apr 25 16:55 special_tokens_map.json
-rw-rw-r-- 1 wayne wayne  708 Apr 25 16:55 tokenizer_config.json
-rw-rw-r-- 1 wayne wayne 2.1M Apr 25 16:55 tokenizer.json
-rw-rw-r-- 1 wayne wayne 780K Apr 25 16:55 vocab.json
from ipex_llm.optimize import load_low_bit
copied_model = load_low_bit(copied_model, model_path)

2024-04-26 08:39:58,752 - INFO - Converting the current model to sym_int4 format......
2024-04-26 08:39:59,115 - ERROR - 

****************************Usage Error************************
Error no file named pytorch_model.bin found in directory optimized-blip2.
2024-04-26 08:39:59,116 - ERROR - 

****************************Call Stack*************************
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[19], line 1
----> 1 copied_model = load_low_bit(copied_model, 'optimized-blip2')

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/optimize.py:178, in load_low_bit(model, model_path)
    175     qtype = ggml_tensor_qtype[low_bit]
    176     model = ggml_convert_low_bit(model, qtype=qtype, convert_shape_only=True)
--> 178 resolved_archive_file, is_sharded = extract_local_archive_file(model_path, subfolder="")
    179 if is_sharded:
    180     # For now only shards transformers models
    181     # can run in this branch.
    182     resolved_archive_file, _ = \
    183         get_local_shard_files(model_path,
    184                               resolved_archive_file,
    185                               subfolder="")

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/transformers/utils.py:83, in extract_local_archive_file(pretrained_model_name_or_path, subfolder, variant)
     81     return archive_file, is_sharded
     82 else:
---> 83     invalidInputError(False,
     84                       f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}"
     85                       " found in directory"
     86                       f" {pretrained_model_name_or_path}.")

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/utils/common/log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     30 if not condition:
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: Error no file named pytorch_model.bin found in directory optimized-blip2.
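For what it's worth, the directory listing above shows model.safetensors but no pytorch_model.bin, which matches the error message. Below is a minimal diagnostic sketch; the safetensors-vs-bin mismatch is my reading of the traceback, and `weight_format` is a hypothetical helper, not an ipex-llm API:

```python
import os

def weight_format(model_dir: str) -> str:
    """Report which serialized weight file a saved model directory contains.

    load_low_bit (per the traceback) looks for pytorch_model.bin, while newer
    transformers versions save model.safetensors by default.
    """
    if os.path.exists(os.path.join(model_dir, "pytorch_model.bin")):
        return "bin"
    if os.path.exists(os.path.join(model_dir, "model.safetensors")):
        return "safetensors"
    return "none"
```

If this reports "safetensors", one possible workaround is to re-save the weights as pytorch_model.bin (e.g. load the state dict with safetensors.torch.load_file and write it back out with torch.save).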
pengyb2001 commented Apr 29, 2024

Hi there, I tried this on my Arc A770 machine; my environment is:

transformers==4.31.0
-----------------------------------------------------------------
Name: ipex-llm
Version: 2.1.0b20240421

I first downloaded the Salesforce/blip2-opt-6.7b to my machine

arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/blip2
config.json                       pytorch_model-00003-of-00004.bin  tokenizer_config.json
merges.txt                        pytorch_model-00004-of-00004.bin  tokenizer.json
preprocessor_config.json          pytorch_model.bin.index.json      vocab.json
pytorch_model-00001-of-00004.bin  README.md
pytorch_model-00002-of-00004.bin  special_tokens_map.json

and then used an absolute path to load and convert the model. I did not encounter the issue you mentioned.

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model

model_id = "/mnt/disk1/models/blip2"  
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

device = 'xpu'
optimized_model = optimize_model(model, device=device)

model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/optimized-blip2
bigdl_config.json  preprocessor_config.json  tokenizer_config.json
config.json        pytorch_model.bin         tokenizer.json
merges.txt         special_tokens_map.json   vocab.json

You might want to verify that the model you downloaded is complete, and note that you should download the original pytorch_model.bin weights.
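One quick way to check completeness is to compare the shard files named in pytorch_model.bin.index.json against what is actually on disk. This is a hypothetical helper sketch, assuming the standard Hugging Face weight_map layout of that index file:

```python
import json
import os

def missing_shards(model_dir: str) -> list:
    """List shard files named in pytorch_model.bin.index.json that are absent on disk."""
    with open(os.path.join(model_dir, "pytorch_model.bin.index.json")) as f:
        index = json.load(f)
    shards = sorted(set(index["weight_map"].values()))
    return [s for s in shards if not os.path.exists(os.path.join(model_dir, s))]
```

An empty result means every shard the index expects is present; anything it returns should be re-downloaded.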
To load the converted model, I use the following code:

from ipex_llm.optimize import low_memory_init, load_low_bit
from transformers import Blip2Processor, Blip2ForConditionalGeneration
model_id = "/mnt/disk1/models/optimized-blip2"
with low_memory_init():
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)
model = load_low_bit(model, model_id)
print("Model loaded successfully!")

And no error occurred.

arda@arda-arc05:/mnt/disk1/models$ python blip2.py
/opt/anaconda3/envs/mingyu-llm-gpu/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-29 10:47:15,896 - INFO - intel_extension_for_pytorch auto imported
2024-04-29 10:47:16,064 - INFO - Converting the current model to sym_int4 format......
Model loaded successfully!

You can refer to the relevant API in ipex-llm/python/llm/src/ipex_llm/optimize.py (main branch of intel-analytics/ipex-llm on GitHub) when writing your loading code.
