gemma-2b statically quantized, generated text makes no sense #1853

Open
2 of 4 tasks
CHNtentes opened this issue May 10, 2024 · 0 comments
Labels
bug Something isn't working

Comments


CHNtentes commented May 10, 2024

System Info

optimum 1.19.1
python 3.8.10
ubuntu 20.04

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Hi. I tried to convert gemma-2b to ONNX format, then quantize it to 8-bit. However, the quantized model doesn't generate any useful text, just random characters. I'm not sure what's causing this issue.

The procedure is as follows:

  1. Use this command to convert gemma-2b to ONNX, without KV cache:
    optimum-cli export onnx -m ./gemma-2b --task text-generation --opset 14 --device cpu --trust-remote-code --legacy gemma-2b_onnx_without_past
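
As a sanity check before quantizing, the FP32 export can be loaded and run with the same API used in step 3 below (a minimal sketch; the directory name matches the export command above):

# Sanity check: the unquantized ONNX export should already produce coherent text.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gemma-2b_onnx_without_past")
model = ORTModelForCausalLM.from_pretrained("gemma-2b_onnx_without_past", use_cache=False, use_io_binding=False)

inputs = tokenizer("Introduce yourself.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs.tolist()[0]))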

  2. Quantize the ONNX model:

from functools import partial
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTQuantizer, ORTModelForCausalLM
from optimum.onnxruntime.configuration import AutoQuantizationConfig, AutoCalibrationConfig

# Load the exported ONNX model (no KV cache) and its tokenizer
onnx_model = ORTModelForCausalLM.from_pretrained("gemma-2b_onnx_without_past", use_cache=False, use_io_binding=False)
tokenizer = AutoTokenizer.from_pretrained("gemma-2b_onnx_without_past")
decoder_quantizer = ORTQuantizer.from_pretrained(onnx_model)

# Static int8 quantization config targeting arm64
qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)

def preprocess_fn(ex, tokenizer):
    encoded_inputs = tokenizer(ex["instruction"], return_tensors="pt", padding=True)
    return encoded_inputs

# Build a 100-sample calibration set from alpaca-cleaned
calibration_dataset = decoder_quantizer.get_calibration_dataset(
    "yahma/alpaca-cleaned",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)

calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

# Compute min/max activation ranges over the calibration set
ranges = decoder_quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
    use_external_data_format=True,
    batch_size=1,
)

# Apply static quantization using the computed ranges
model_quantized_path = decoder_quantizer.quantize(
    save_dir="quantized_gemma",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
    use_external_data_format=True,
)
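
For debugging, the calibration ranges returned by fit() can also be inspected before quantizing. This is a minimal sketch, assuming ranges behaves like a plain mapping from tensor name to a (min, max) pair; the exact return type of fit() depends on the optimum/onnxruntime versions:

# Sketch only: `ranges` is assumed to act like a dict of tensor name -> (min, max).
# A layer whose min/max is dominated by outliers leaves few int8 levels for the
# bulk of its activations, which static quantization cannot recover from.
for name in list(ranges)[:10]:
    print(name, ranges[name])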
  3. Run inference using ORTModelForCausalLM:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Load the quantized model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("quantized_gemma")
model = ORTModelForCausalLM.from_pretrained("quantized_gemma", use_cache=False, use_io_binding=False)

query = 'Introduce yourself.'
encoded_inputs = tokenizer(query, return_tensors='pt')

# Generate up to 64 new tokens
outputs = model.generate(**encoded_inputs, max_new_tokens=64)

response = tokenizer.decode(outputs.tolist()[0])
print(response)

However, the printed output is:

Introduce yourself.. Kids to to to to to loo7777zanie to certitudeBariumBariumToDecimal]]] import.

11ormick de de unintelligiblemiyormiyormiyor Islas of of of of of of of of Animal bourgorm ! XXIV metamor metamorToUpperToUpper CARRAYDOCX

Expected behavior

Here is what I got from the gemma-2b ONNX model (not quantized):

Introduce yourself.
I’m a 20-year-old student from the Netherlands. I’m currently studying at the University of Amsterdam. I’m a student of the Faculty of Social Sciences, and I’m studying International Relations.

What is your current job?

I’m a student.
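
In case it helps with debugging, here is a minimal sketch for quantifying the degradation at the logits level rather than eyeballing generations; it compares the first-step logits of the FP32 export and the quantized model on the same prompt, reusing the two directories from the steps above:

# Sketch: compare first-step logits of the FP32 and int8 models on one prompt.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gemma-2b_onnx_without_past")
fp32 = ORTModelForCausalLM.from_pretrained("gemma-2b_onnx_without_past", use_cache=False, use_io_binding=False)
int8 = ORTModelForCausalLM.from_pretrained("quantized_gemma", use_cache=False, use_io_binding=False)

inputs = tokenizer("Introduce yourself.", return_tensors="pt")
logits_fp32 = fp32(**inputs).logits[0, -1]
logits_int8 = int8(**inputs).logits[0, -1]

print("max abs logit diff:", (logits_fp32 - logits_int8).abs().max().item())
print("fp32 top-5 token ids:", logits_fp32.topk(5).indices.tolist())
print("int8 top-5 token ids:", logits_int8.topk(5).indices.tolist())

Disagreeing top-5 tokens already at the first decoding step would point at the quantized weights/activations themselves rather than anything in the generation loop.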
